.. gomoku_rl documentation master file, created by
   sphinx-quickstart on Thu Feb 29 12:40:14 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to gomoku_rl's documentation!
=====================================

Introduction
------------

*gomoku_rl* is an open-source project that trains agents to play the game of Gomoku through deep reinforcement learning. Previous work often relies on variants of AlphaGo/AlphaZero and uses GPU resources inefficiently. *gomoku_rl* features GPU-parallelized simulation and leverages recent advances in multi-agent reinforcement learning (**MARL**). Starting from random play, a model can achieve human-level performance on a :math:`15\times15` board within hours of training on an RTX 3090.

Installation
------------

Install *gomoku_rl* with the following commands:

.. code-block:: bash

    git clone git@github.com:hesic73/gomoku_rl.git
    cd gomoku_rl
    conda create -n gomoku_rl python=3.11.5
    conda activate gomoku_rl
    pip install -e .

I use Python 3.11.5, torch 2.1.0, and **torchrl 0.2.1**. Earlier versions of Python and torch 1.x should be compatible as well.

Usage
-----

*gomoku_rl* uses `hydra` to configure training hyperparameters. You can modify the settings in `cfg/train_InRL.yaml` or override them on the command line:

.. code-block:: bash

    # override default settings in cfg/train_InRL.yaml
    python scripts/train_InRL.py num_env=1024 device=cuda epochs=3000 wandb.mode=online
    # or simply:
    python scripts/train_InRL.py

By default, checkpoints are saved to `wandb/*/files`, or to `tempfile.gettempdir()` if `wandb.mode=='disabled'`. To change the output directory, set the `run_dir` parameter.

After training, play Gomoku against your model with the `scripts/demo.py` script:

.. code-block:: bash

    # Install PyQt5
    pip install PyQt5
    python scripts/demo.py device=cpu grid_size=56 piece_radius=24 checkpoint=/model/path
    # default checkpoint (only for board_size=15)
    python scripts/demo.py

Pretrained models for a :math:`15\times15` board are available under `pretrained_models/15_15/`. Loading a model trained for a different board size fails because the network architectures do not match.

In PPO, when `share_network=True`, the actor and the critic can share an encoder module. At present, a `PPO` object with a shared encoder cannot load a checkpoint that was saved without sharing.

.. toctree::
   :maxdepth: 4
   :caption: Contents:

   gomoku_rl
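
To illustrate the checkpoint location rule described in the Usage section, here is a minimal Python sketch of how the output directory could be resolved. The helper `resolve_checkpoint_dir` and its parameters are hypothetical and are not part of *gomoku_rl*; only `tempfile.gettempdir()` is a real standard-library call, and the training script may implement this logic differently.

.. code-block:: python

    import tempfile
    from typing import Optional


    def resolve_checkpoint_dir(
        wandb_mode: str,
        wandb_files_dir: Optional[str] = None,
        run_dir: Optional[str] = None,
    ) -> str:
        """Return the directory where checkpoints would be saved (hypothetical helper)."""
        # An explicitly configured run_dir takes precedence over every default.
        if run_dir is not None:
            return run_dir
        # With wandb disabled there is no run folder, so use the system temp dir.
        if wandb_mode == "disabled":
            return tempfile.gettempdir()
        # Otherwise use the files folder of the current wandb run (wandb/*/files).
        if wandb_files_dir is None:
            raise ValueError("wandb_files_dir is required when wandb is enabled")
        return wandb_files_dir


    # Example paths below are made up for illustration.
    print(resolve_checkpoint_dir("disabled"))
    print(resolve_checkpoint_dir("online", wandb_files_dir="wandb/run-example/files"))
    print(resolve_checkpoint_dir("online", run_dir="checkpoints/my_run"))

In this sketch, `run_dir` is checked first so that an explicit setting always wins, whichever `wandb.mode` is in use.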