data-centric-machine-learning

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets

HTML
320
1 year ago

Contains implementations of data-centric approaches for improving semantic segmentation on satellite imagery.

Python
36
10 days ago

A list of data-efficient and data-centric LLM (Large Language Model) papers. Our survey paper: "Towards Efficient LLM Post Training: A Data-centric Perspective"

29
2 months ago

Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)

Jupyter Notebook
16
2 years ago

TRIAGE: Characterizing and auditing training data for improved regression (NeurIPS 2023)

Jupyter Notebook
11
1 year ago

Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)

Jupyter Notebook
10
2 years ago

You can’t handle the (dirty) truth: Data-centric insights improve pseudo-labeling

Jupyter Notebook
7
10 months ago

Code for our paper "Towards Trustworthy Dataset Distillation" (Pattern Recognition 2025)

Python
3
4 months ago

Collaboratively Learning Federated Models from Noisy Decentralized Data

Python
1
24 days ago

A multi-view panorama of Data-Centric AI: Techniques, Tools, and Applications (ECAI Tutorial 2024)

1
6 months ago