#post-training

A unified inference and post-training framework for accelerated video generation.

Python
2021
4 hours ago

Mental Health LLM (LLM x Mental Health), Pre & Post-training & Dataset & Evaluation & Deploy & RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / Llama / GLM series models

Python
1537
3 months ago

Train a Language Model with GRPO to create a schedule from a list of events and priorities

Jupyter Notebook
221
4 months ago

Revisiting Mid-training in the Era of Reinforcement Learning Scaling

Jupyter Notebook
164
1 month ago

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

Python
88
1 month ago

A collection of vision-language-action model post-training methods.

88
5 days ago

[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis

Python
59
4 months ago

A comprehensive collection of work on learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across training, inference, and post-inference stages.

53
2 months ago

Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".

Python
51
5 months ago

Flow RL. ReinFlow: Fine-tuning Flow Policy with Online RL (Reinforcement Learning).

Python
51
15 hours ago

[EMNLP 2022] Continual Training of Language Models for Few-Shot Learning

Python
45
3 years ago

A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.

Python
44
4 months ago

Pure RL to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with Social IQa dataset.

Python
38
5 months ago

The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)

Python
35
8 days ago

RFTT: Reasoning with Reinforced Functional Token Tuning

Python
29
2 months ago