

data-cleaning

Jupyter notebook and datasets from the pandas video series

Jupyter Notebook
2242 stars
Updated 2 years ago

R
1427 stars
Updated 9 months ago

Easy data preparation with the latest LLM-based operators and pipelines.

Python
1364 stars
Updated 4 days ago

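An open-source educational chat model from ICALK, East China Normal University: an open-source Chinese-English educational dialogue large language model (general-purpose base model, GPU deployment, data cleaning). With thanks to: LLaMA, MOSS, BELLE, Ziya, vLLM.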

Jupyter Notebook
858 stars
Updated 3 months ago

An easy-to-use Python library of custom functions for cleaning and analyzing data.

Python
521 stars
Updated 4 days ago

Schema-Inspector is a simple JavaScript object sanitization and validation module.

JavaScript
503 stars
Updated 10 months ago

A toolkit to test, validate, and evaluate your models, and to surface, curate, and prioritize the most valuable data for labeling.

Python
453 stars
Updated 4 months ago

Professional data validation for the R environment

R
426 stars
Updated 3 months ago

Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It can also run data cleaning scenarios based on these algorithms. Desbordante has a console version and an easy-to-use web application.

C++
424 stars
Updated 4 days ago

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

Python
379 stars
Updated 3 years ago

A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

TSQL
356 stars
Updated 5 months ago