#self-play

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more

Jupyter Notebook
4219
8 months ago

[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)

Python
1422
20 hours ago

An artificial intelligence platform for StarCraft II with large-scale distributed training and grandmaster-level agents.

Python
1285
5 months ago

The official implementation of Self-Play Fine-Tuning (SPIN)

Python
1192
1 year ago

The official implementation of Self-Play Preference Optimization (SPPO)

Python
576
7 months ago

A Massively Parallel Large Scale Self-Play Framework

Python
350
3 years ago

A custom MARL (multi-agent reinforcement learning) environment where multiple agents trade against one another (self-play) in a zero-sum continuous double auction. Ray [RLlib] is used for training.

Jupyter Notebook
149
3 days ago

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Python
136
14 days ago

AlphaZero implementation for Othello, Connect-Four and Tic-Tac-Toe based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.

Python
90
7 years ago

The exact code used by team "liveinparis" in the Kaggle football competition, where it ranked 6th of 1141

Python
57
5 years ago

A very fast implementation of AlphaZero, applied to games like Splendor, Santorini, The Little Prince, … Browser version available

Python
52
2 days ago

A self-play reinforcement learning agent that learns to play TicTacToe using the ML-Agents framework in Unity.

C#
37
3 years ago

AI agents for the Bavarian card game Schafkopf, trained with reinforcement learning

Python
37
15 days ago

An implementation of the paper "Model-Free Episodic Control"

Python
36
6 years ago

Using self-play, MCTS, and a deep neural network to create a Hearthstone AI player

Python
29
7 years ago

A UCI-compatible four-player chess engine powered by deep RL and 256-bit bitboards.

C++
28
1 month ago