#synthetic-dataset-generation

argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python
2640
5 days ago

A framework for prompt tuning using Intent-based Prompt Calibration

Python
2480
9 days ago

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

Python
767
2 months ago

[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!

Python
676
1 month ago

A curated list of awesome projects which use Machine Learning to generate synthetic content.

585
2 years ago

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Python
407
19 days ago

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

Python
305
1 year ago

[NeurIPS D&B Track 2024] Official implementation of HumanVid

Python
287
2 months ago