This repository contains reinforcement learning implementations for creating agents to control a robot in the Harvard PacBot competition.
I implemented three RL algorithms from scratch: DQN, PPO, and AlphaZero. The main training/evaluation scripts are in src/algorithms/.
The rest of the src/ directory contains supporting Python code.
The game environment, as well as some other performance-sensitive code such as the vectorized Monte Carlo tree search implementation for AlphaZero, is implemented in Rust as a Python extension module (using pyo3 and maturin). This code is located in `pacbot_rs/`.
Make sure you have Poetry and Rust installed.
DQN has generally worked better than PPO and AlphaZero for us, so the main script you should run is src/algorithms/train_dqn.py.
Make sure the Poetry environment is activated (run `poetry shell` within the repo) and that you have installed dependencies with `poetry install`.
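Concretely, from the repository root:

```
poetry install
poetry shell
```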
Initially, and after making changes to any of the Rust code in `pacbot_rs/`, you'll need to build the Rust extension module and install it into the Python environment:

```
pacbot_rs/build_and_install.sh
```

First, make sure your working directory is `src/`.
To start a training run, invoke `train_dqn.py`:

```
python3 -m algorithms.train_dqn
```

- To fine-tune (start from an existing checkpoint instead of initializing the model randomly), pass `--finetune path/to/checkpoint`.
- By default, model checkpoints will be saved to `./checkpoints/`. You can change this by passing `--checkpoint-dir path/to/dir`.
- By default, this will use Weights & Biases to log checkpoints and various metrics to aid debugging and tracking training progress. To disable this, pass `--no-wandb`.
- By default, the `cuda` device is used (if available), falling back to `cpu`. To manually set the PyTorch device, pass `--device <device>`.
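For example, a hypothetical invocation that fine-tunes from an existing checkpoint, writes checkpoints to a custom directory, disables Weights & Biases logging, and forces the CPU might look like this (the paths are placeholders):

```
python3 -m algorithms.train_dqn --finetune path/to/checkpoint --checkpoint-dir path/to/dir --no-wandb --device cpu
```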
You can also change hyperparameters with command-line arguments. To see the full list, pass `--help` or take a look at the `hyperparam_defaults` dictionary near the top of `train_dqn.py`. In particular:
- If you're running out of memory, consider reducing `batch_size`.
- `num_iters` controls the total runtime of the script, and also indirectly the value of $\epsilon$ (epsilon), which controls the amount of random exploration ($\epsilon$ starts at `initial_epsilon` and decreases linearly to `final_epsilon` over `num_iters` iterations; see the sketch after this list). Of course, you can always set `num_iters` to a large value (like the default) and just kill the script once it reaches a satisfactory level of performance.
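To make the decay concrete, here is a minimal sketch of a linear epsilon schedule using the hyperparameter names above; it is only an illustration, and the actual logic and default values in `train_dqn.py` may differ.

```python
def epsilon_at(iteration: int, initial_epsilon: float, final_epsilon: float, num_iters: int) -> float:
    """Linearly interpolate epsilon from initial_epsilon down to final_epsilon over num_iters iterations."""
    frac = min(iteration / num_iters, 1.0)  # clamp so epsilon stays at final_epsilon past num_iters
    return initial_epsilon + frac * (final_epsilon - initial_epsilon)

# Illustrative values (not the script's actual defaults):
# epsilon_at(0,       1.0, 0.05, 100_000)  -> 1.0
# epsilon_at(50_000,  1.0, 0.05, 100_000)  -> 0.525
# epsilon_at(100_000, 1.0, 0.05, 100_000)  -> 0.05 (up to floating-point rounding)
```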
To show an animated visualization in the terminal of the agent being controlled by a particular checkpoint, use `--eval`:

```
python3 -m algorithms.train_dqn --eval path/to/checkpoint.pt
```