grpo

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen2.5, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, DeepSeek-VL2, Phi4, GOT-OCR2, ...).

Python
7041
3 days ago

Solve Visual Understanding with Reinforced VLMs

Python
4702
2 days ago

Collect every awesome work about r1!

Python
341
20 days ago

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

Python
325
2 days ago

An open-source implementation (reproduction) of DeepSeek-R1.

Python
253
1 month ago

Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

Python
204
25 days ago

OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement

Python
71
24 days ago

🐭 A tiny single-file implementation of Group Relative Policy Optimization (GRPO) as introduced by the DeepSeekMath paper

Python
30
2 months ago

Aims for memory-efficient training (24GB VRAM) on consumer GPUs. Optimizing language models through guidance tokens in reasoning chains, based on DeepSeekRL-Extended.

Python
28
2 months ago

Recreating the minimal training methods of DeepSeek-R1 for small language models.

Python
20
2 months ago

Python
19
9 days ago

Simple repository for training small reasoning models

Python
12
2 months ago

Distributed Reinforcement Learning for LLM Fine-Tuning with multi-GPU utilization

Python
11
1 month ago

A travel agent based on Qwen2.5, fine-tuned with SFT + DPO/PPO/GRPO on a travel question-answer dataset; a mind map can be generated from the response. A RAG system is built on the tuned Qwen2 using Prompt-Template + Tool-Use + Chroma embedding database + LangChain.

Python
9
7 days ago

A reinforcement learning agent that learns to solve mazes using Group Relative Policy Optimization (GRPO).

Python
7
2 months ago

Fine-tune models from Hugging Face using libraries such as trl, peft, and transformers.

Python
5
1 month ago

This repository contains a blog post on the GRPO RL algorithm, using a simple Grid World environment.

Jupyter Notebook
2
5 days ago

LLM finetuning for Sudoku solving

Python
2
2 days ago

Reinforcement Fine-Tuning

Python
1
14 days ago