#synthetic-dataset-generation

argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python
2897
5 days ago

A framework for prompt tuning using Intent-based Prompt Calibration

Python
2795
6 months ago

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

Python
793
3 months ago

[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!

Python
775
7 months ago

A curated list of awesome projects which use Machine Learning to generate synthetic content.

584
3 years ago

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Python
464
2 months ago

[NeurIPS D&B Track 2024] Official implementation of HumanVid

Python
333
5 months ago

A novel approach for synthesizing tabular data using pretrained large language models

Python
322
3 months ago

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

Python
307
2 years ago

✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions, all in one framework

Python
274
1 month ago