dpo
Align Anything: Training All-modality Models with Feedback
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
A Deep Learning NLP repository using TensorFlow, covering everything from text preprocessing to downstream tasks for models such as Topic Models, BERT, GPT, and recent LLMs.
Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Notus is a collection of fine-tuned LLMs trained with SFT, DPO, SFT+DPO, and/or other RLHF techniques, always with a data-first approach
CodeUltraFeedback: aligning large language models to coding preferences
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
[NeurIPS 2024] Official code of β-DPO: Direct Preference Optimization with Dynamic β
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
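
Several of the repositories above (HALOs, OAT, Step-DPO, β-DPO) build on the DPO objective. As a point of reference only, here is a minimal sketch of that loss in PyTorch; it is not the API of any listed library, and the function name, argument names, and default β are illustrative assumptions.

```python
# Minimal sketch of the DPO objective (Rafailov et al., 2023).
# Not taken from any repository listed above; names and defaults are illustrative.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each tensor holds the summed per-sequence log-probability of a response
    under the trainable policy or the frozen reference model; beta scales the
    implicit reward."""
    # Implicit rewards: log-ratio of policy to reference for each response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Variants in the list swap out pieces of this objective: β-DPO adapts β per batch, KTO and ORPO replace the pairwise log-sigmoid term with other human-aware losses, and Step-DPO applies the preference comparison at the reasoning-step level rather than over whole responses.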