data-centric-ai

Resources for Data Centric AI

TeX
1109 stars
Updated 1 year ago

Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻

Jupyter Notebook
450 stars
Updated 2 months ago

Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in CVPR 2025 (Highlight).

Python
329 stars
Updated 8 days ago

Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"

Python
236 stars
Updated 1 day ago

[NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.

Python
150 stars
Updated 1 year ago

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation.

Python
124 stars
Updated 21 hours ago

Introduction to Data-Centric AI, MIT IAP 2023 🤖

CSS
98 stars
Updated 2 months ago

OpenDataVal: a Unified Benchmark for Data Valuation in Python (NeurIPS 2023)

Python
95 stars
Updated 2 months ago

Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning

Python
79 stars
Updated 1 year ago

Papers about training data quality management for ML models.

65 stars
Updated 2 months ago

nbsynthetic is a simple and robust tabular synthetic data generation library for small and medium-sized datasets.

Jupyter Notebook
65 stars
Updated 2 years ago