data-cleaning
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Refine high-quality datasets and visual AI models
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
A light-weight, flexible, and expressive statistical data testing library
Jupyter notebook and datasets from the pandas video series
General Assembly's 2015 Data Science course in Washington, DC
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Simple tools for data cleaning in R
Machine learning with dataframes
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
An open-source Chinese-English educational chat model from ICALK, East China Normal University (general-purpose base model, GPU deployment, data cleaning). With thanks to LLaMA, MOSS, BELLE, Ziya, and vLLM.
Easy to use Python library of customized functions for cleaning and analyzing data.
Schema-Inspector is a simple JavaScript object sanitization and validation module.
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Professional data validation for the R environment
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It can also run data cleaning scenarios built on these algorithms. Desbordante has a console version and an easy-to-use web application.
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Data Science Feature Engineering and Selection Tutorials
Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle