#self-rewarding

Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI

Python
1398
1 year ago

SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL

Python
190
3 months ago

Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning

Python
15
2 months ago

SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data

Python
14
6 hours ago