data-centric-machine-learning

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets

HTML
340
2 years ago

Contains implementations of data-centric approaches for improving semantic segmentation on satellite imagery.

Python
42
4 months ago

A list of data-efficient and data-centric LLM (Large Language Model) papers. Our survey paper: Towards Efficient LLM Post Training: A Data-centric Perspective

35
6 months ago

Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)

Jupyter Notebook
16
2 years ago

(Pattern Recognition 2025) Towards Trustworthy Dataset Distillation

Python
13
8 months ago

TRIAGE: Characterizing and auditing training data for improved regression (NeurIPS 2023)

Jupyter Notebook
11
1 year ago

Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)

Jupyter Notebook
9
2 years ago

You can’t handle the (dirty) truth: Data-centric insights improve pseudo-labeling

Jupyter Notebook
7
1 year ago

Collaboratively Learning Federated Models from Noisy Decentralized Data

Python
1
5 months ago

A multi-view panorama of Data-Centric AI: Techniques, Tools, and Applications (ECAI Tutorial 2024)

1
10 months ago