post-training
A unified inference and post-training framework for accelerated video generation.
Awesome Reasoning LLM Tutorial/Survey/Guide
Mental Health LLMs (LLM x Mental Health): pre-training & post-training, datasets, evaluation, deployment, and RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / Llama / GLM series models
Explore the Multimodal “Aha Moment” on 2B Model
Train a Language Model with GRPO to create a schedule from a list of events and priorities
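GRPO's distinguishing step is computing group-relative advantages: several completions are sampled per prompt, each is scored by a reward function, and rewards are normalized within the group instead of using a learned value network. A minimal sketch of that normalization (all names here are illustrative, not taken from any of the listed repositories):

```python
def group_relative_advantages(rewards):
    """Normalize a group of rewards to zero mean and unit std (GRPO-style).

    Each completion's advantage is how much better or worse its reward
    is relative to the other completions sampled for the same prompt.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    if std == 0:
        # All completions scored identically: no learning signal.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]


# Example: four sampled schedules scored by a hypothetical task reward.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

These advantages then weight the policy-gradient update for each sampled completion's tokens.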
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
A brief and partial summary of RLHF algorithms.
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
A collection of vision-language-action model post-training methods.
A high-efficiency system for large language model based search agents.
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
A comprehensive collection of work on learning from rewards in LLM post-training and test-time scaling, covering both reward models and learning strategies across the training, inference, and post-inference stages.
Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".
Flow RL. ReinFlow: fine-tuning flow policies with online reinforcement learning.
Exploring Diffusion Transformer Designs via Grafting
[EMNLP 2022] Continual Training of Language Models for Few-Shot Learning
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Pure RL to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with Social IQa dataset.
The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
RFTT: Reasoning with Reinforced Functional Token Tuning