grpo
Use PEFT or full-parameter training to run CPT/SFT/DPO/GRPO on 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, InternVL3, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
Solve Visual Understanding with Reinforced VLMs
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.
The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".
Explore the Multimodal “Aha Moment” on 2B Model
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model
An open-source reproduction of DeepSeek-R1.
Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Train a Language Model with GRPO to create a schedule from a list of events and priorities
[Preprint 2025] Thinkless: LLM Learns When to Think
Official implementation of the paper "AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning"
Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents