#data-curation

fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.

Python
1673
3 months ago

Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.

Python
79
2 days ago

[ICLR 2025] Improving Data Efficiency via Curating LLM-Driven Rating Systems

Python
75
1 month ago

A library for detecting problematic data segments in structured and unstructured data with a few lines of code.

Python
64
1 year ago

Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning

Python
51
2 years ago

Lesson guide and textbook for "Data as a Science" course.

Jupyter Notebook
47
4 years ago

A tool for downloading from public image boards (those that allow scraping), previewing your images and tags, and editing your images and tags. Additional tabs support downloading other code repositories as well as S.O.T.A. diffusion and auto-tag/caption models. Custom datasets can be added!

Python
37
3 months ago

🧼🔎 A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors (NeurIPS'24).

Python
31
1 month ago

Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation (EMNLP 2023)

Python
30
1 year ago

A web service for semi-automated conversion of raw imaging data to BIDS

Vue
30
10 days ago

Curation of BIDS (CuBIDS): A sanity-preserving software package for processing BIDS datasets.

Python
25
3 days ago

Rebalancing chemical reaction

Python
21
1 month ago