post-training
A unified inference and post-training framework for accelerated video generation.
Awesome Reasoning LLM Tutorial/Survey/Guide
Mental Health LLMs (LLM x Mental Health): pre-training & post-training, datasets, evaluation, deployment, and RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / Llama / GLM series models
Explore the Multimodal “Aha Moment” on 2B Model
Train a Language Model with GRPO to create a schedule from a list of events and priorities
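GRPO's distinguishing step is computing group-relative advantages: several completions are sampled per prompt, each is scored by a reward function, and rewards are normalized within the group instead of using a learned value network. A minimal sketch of that normalization (all names here are illustrative, not taken from any of the listed repositories):

```python
def group_relative_advantages(rewards):
    """Normalize a group of rewards to zero mean and unit std (GRPO-style).

    Each completion's advantage is how much better or worse its reward
    is relative to the other completions sampled for the same prompt.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    if std == 0:
        # All completions scored identically: no learning signal.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]


# Example: four sampled schedules scored by a hypothetical task reward.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

These advantages then weight the policy-gradient update for each sampled completion's tokens.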
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
A brief and partial summary of RLHF algorithms.
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
A collection of vision-language-action model post-training methods.
A high-efficiency system for large language model based search agents.
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
A comprehensive collection of work on learning from rewards in LLM post-training and test-time scaling, covering both reward models and learning strategies across the training, inference, and post-inference stages.
Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".
Flow RL. ReinFlow: fine-tuning flow policies with online reinforcement learning.
Exploring Diffusion Transformer Designs via Grafting
[EMNLP 2022] Continual Training of Language Models for Few-Shot Learning
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Pure RL to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with Social IQa dataset.
The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
RFTT: Reasoning with Reinforced Functional Token Tuning