proximal-policy-optimization

High-quality single-file implementations of deep reinforcement learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Python
6822
11 days ago

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python
6328
14 hours ago

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

Python
3732
3 years ago

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.

Python
1188
4 years ago

This repository contains PyTorch implementations of most classic deep reinforcement learning algorithms, including DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, and TRPO. (More algorithms are still in progress.)

Python
678
4 years ago

Trading Environment (OpenAI Gym) + PPO (TensorForce)

Python
248
2 years ago
Python
198
3 years ago

PyTorch implementation of some reinforcement learning algorithms: A2C, PPO, Behavioral Cloning from Observation (BCO), GAIL.

Python
145
3 years ago

Proximal Policy Optimization (PPO) with Intrinsic Curiosity Module (ICM)

Python
139
6 years ago