#synthetic-dataset-generation

argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python
2850
1 day ago

A framework for prompt tuning using Intent-based Prompt Calibration

Python
2742
4 months ago

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

Python
787
1 month ago

[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!

Python
753
5 months ago

A curated list of awesome projects which use Machine Learning to generate synthetic content.

584
2 years ago

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Python
451
1 month ago

[NeurIPS D&B Track 2024] Official implementation of HumanVid

Python
328
3 months ago

A novel approach for synthesizing tabular data using pretrained large language models

Python
321
2 months ago

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

Python
306
2 years ago