preference-alignment

princeton-nlp / SimPO

[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward

alignment large-language-models preference-alignment rlhf

Python

923

8 months ago

zjukg / KnowPAT

[Paper][ACL 2024 Findings] Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering

knowledge-graph large-language-models question-answering preference-alignment instruction-tuning

Python

195

1 year ago

Meaquadddd / DPO-Shift

DPO-Shift: Shifting the Distribution of Direct Preference Optimization

alignment large-language-models preference-alignment rlhf

Python

7 months ago

Video-Bench / Video-Bench

Video Generation Benchmark

large-language-models multimodal-large-language-models preference-alignment sora video-generation video-understanding llm text-to-video

Python

4 months ago

junkangwu / beta-DPO

[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

alignment dpo preference-alignment rlhf

Python

1 year ago

GradientSpaces / respace

Code for "ReSpace: Text-Driven 3D Indoor Scene Synthesis and Editing with Preference Alignment"

large-language-models preference-alignment

Python

10 days ago

Shentao-YANG / Dense_Reward_T2I

Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).

preference-alignment text-to-image-generation

Python

1 year ago

junkangwu / Dr_DPO

[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"

alignment dpo preference-alignment rlhf

Python

1 year ago

YJiangcm / BMC

[ICLR 2025] Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

alignment dpo rlhf llm preference-alignment

Python

8 months ago

MingjunPan / PO4COPs

[ICML 25] "Preference Optimization for Combinatorial Optimization Problems"

combinatorial-optimization preference-alignment reinforcement-learning

Python

4 months ago

pspdada / SENTINEL

[ICCV 2025] Official repository of "Mitigating Object Hallucinations via Sentence-Level Early Intervention".

multimodal-datasets multimodal-large-language-models preference-alignment image-captioning

Python

2 months ago

dvlab-research / TGDPO

[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization

alignment large-language-models llm preference-alignment rlhf

Python

3 months ago

BARUDA-AI / Awesome-Preference-Optimization

Survey of preference alignment algorithms

alignment preference-alignment rlhf

2 years ago

thibaud-perrin / synthetic-datasets

Generate synthetic datasets for instruction tuning and preference alignment using tools like `distilabel` for efficient and scalable data creation.

artificial-intelligence instruction-tuning llm preference-alignment synthetic-data

Jupyter Notebook

8 months ago

reshalfahsi / gpt2chat

Creating a GPT-2-Based Chatbot with Human Preferences

chatbot gpt-2 huggingface instruction-tuning langchain preference-alignment PyTorch pytorch-lightning language-model natural-language-processing

Jupyter Notebook

5 months ago