dpo
Align Anything: Training All-modality Models with Feedback
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
A Deep Learning NLP repository using TensorFlow, covering everything from text preprocessing to downstream tasks for models such as Topic Models, BERT, GPT, and recent LLMs.
Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Notus is a collection of fine-tuned LLMs trained with SFT, DPO, SFT+DPO, and/or other RLHF techniques, always with a data-first approach
CodeUltraFeedback: aligning large language models to coding preferences
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
[NeurIPS 2024] Official code of β-DPO: Direct Preference Optimization with Dynamic β
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
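
Several of the repositories above (HALOs, OAT, Step-DPO, β-DPO) build on the DPO objective. As a point of reference only, here is a minimal sketch of that loss in PyTorch; it is not the API of any listed library, and the function name, argument names, and default β are illustrative assumptions.

```python
# Minimal sketch of the DPO objective (Rafailov et al., 2023).
# Not taken from any repository listed above; names and defaults are illustrative.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each tensor holds the summed per-sequence log-probability of a response
    under the trainable policy or the frozen reference model; beta scales the
    implicit reward."""
    # Implicit rewards: log-ratio of policy to reference for each response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Variants in the list swap out pieces of this objective: β-DPO adapts β per batch, KTO and ORPO replace the pairwise log-sigmoid term with other human-aware losses, and Step-DPO applies the preference comparison at the reasoning-step level rather than over whole responses.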