grpo

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).

Python
10191
1 day ago
OpenPipe/ART

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!

Python
7466
1 hour ago

Solve Visual Understanding with Reinforced VLMs

Python
5591
1 month ago

Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.

Python
2941
2 months ago
JudgmentLabs/judgeval

The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.

Python
1012
2 hours ago

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"

Python
958
3 days ago

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

Python
853
2 days ago

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python
482
3 days ago

Collect every awesome work about r1!

Python
418
5 months ago

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

Jupyter Notebook
310
4 months ago

Agentic RAG R1 Framework via Reinforcement Learning

Python
302
16 days ago

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

Jupyter Notebook
288
4 months ago

Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

Python
282
6 months ago

An open-source reproduction of DeepSeek-R1.

Python
273
7 months ago

[NeurIPS 2025] AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

239
16 days ago

Train a Language Model with GRPO to create a schedule from a list of events and priorities

Jupyter Notebook
237
5 months ago

[NeurIPS 2025] Thinkless: LLM Learns When to Think

Python
230
9 days ago

A Gaussian dense reward framework for GUI grounding training

Python
228
1 month ago

Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"

Python
191
2 months ago