# post-training

A unified inference and post-training framework for accelerated video generation.

Python
2362
3 hours ago

Mental Health LLM (LLM x Mental Health), Pre & Post-training & Dataset & Evaluation & Deploy & RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / LLama / GLM series models

Python
1579
2 months ago

Train a Language Model with GRPO to create a schedule from a list of events and priorities

Jupyter Notebook
237
5 months ago

Revisiting Mid-training in the Era of Reinforcement Learning Scaling

Jupyter Notebook
176
2 months ago

A collection of vision-language-action model post-training methods.

103
1 month ago

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

Python
101
2 months ago

[NeurIPS 2025] Flow x RL. Official Implementation of "ReinFlow: Fine-tuning Flow Policy with Online Reinforcement Learning".

Python
78
8 days ago

[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis

Python
61
5 months ago

Jupyter Notebook
56
4 months ago

A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across training, inference, and post-inference stages.

56
4 months ago

Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)

Python
49
4 days ago

Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".

Python
48
6 months ago

[EMNLP 2022] Continual Training of Language Models for Few-Shot Learning

Python
45
3 years ago

A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.

Python
45
1 month ago

Pure RL to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with Social IQa dataset.

Python
38
7 months ago

Official implementation for "Diffusion Instruction Tuning"

Python
30
4 months ago