data-centric-ai

Resources for Data Centric AI

TeX
1127
2 years ago

Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻

Jupyter Notebook
473
7 months ago

Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in CVPR 2025 (Highlight).

Python
389
11 days ago

[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale

Python
263
3 months ago

[NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.

Python
153
2 years ago

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation

Python
138
1 month ago

Introduction to Data-Centric AI, MIT IAP 2024 🤖

CSS
103
3 months ago

OpenDataVal: a Unified Benchmark for Data Valuation in Python (NeurIPS 2023)

Python
99
8 months ago

Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning

Python
85
2 years ago

nbsynthetic is a simple and robust tabular synthetic data generation library for small and medium-sized datasets

Jupyter Notebook
68
3 years ago