A reinforcement learning framework for Magic: The Gathering, built on top of managym, using PPO as the core training algorithm.
Manabot is currently trained primarily on Ubuntu machines in AWS and requires Weights & Biases (wandb) credentials.
```bash
# Clone the repo
git clone git@github.com:jacklionheart/manabot.git
cd manabot

# Update Python on AWS Ubuntu Deep Learning AMIs
ops/machine.sh

# Install managym
pip install -e managym

# Install other dependencies
pip install -e .

# Run training
python manabot/ppo/train.py --config-name simple
```

Simulation pulls models from wandb, so it also requires wandb credentials. At small scales it can easily be run locally on CPU-only machines, though model inference will dominate the time spent in the environment.

```bash
# Assumes manabot and managym are installed as above
python sim/sim.py --hero attention --villain simple
```

Run the test suite with:

```bash
pytest tests/
```

manabot is organized into these major components:
- `manabot.env`:
  - `VectorEnv`: `gymnasium.AsyncVectorEnv`-based interface around managym
  - `ObservationSpace`: dataclass describing the observation space
  - `Match`: dataclass describing the game of Magic to be played (decklists, etc.)
  - `Reward`: dataclass describing the reward function
- `manabot.ppo`: PPO implementation for training models
  - `Agent`: shared value/policy network
  - `Trainer`: PPO trainer for learning network weights
- `manabot.sim`: simulate games of Magic: The Gathering using trained models
  - `Player`: an agent for playing Magic (backed either by a learned model or by a random/trivial implementation)
  - `Sim`: a simulation of many games of Magic between two specific players
- `manabot.infra`: software infrastructure
  - `Experiment`: experiment tracking with wandb/TensorBoard
  - `Hypers`: Hydra-compatible hyperparameter management and configuration
  - `Profiler`: performance profiler
  - `log.py`: unified logging management
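The `Trainer` above is described only as a PPO trainer. For readers new to PPO, the following is a minimal, framework-free sketch of the standard clipped surrogate objective that PPO is built around. It is not manabot's implementation (which presumably works on torch tensors); `ppo_clip_loss` and `clip_coef` are illustrative names.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    clip_coef: float = 0.2,
) -> float:
    """Mean PPO clipped policy loss over a batch (lower is better)."""
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_coef, 1 + clip_coef].
        clipped = min(max(ratio, 1.0 - clip_coef), 1.0 + clip_coef)
        # Take the pessimistic (smaller) of the unclipped and clipped surrogates.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the surrogate objective.
    return -total / len(advantages)
```

With identical old and new log-probs the ratio is 1 and the loss is simply the negated mean advantage. When the ratio exceeds `1 + clip_coef` on a positive advantage, the clipped term caps the incentive to push the policy further in a single update.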
- managym vs. manabot split
  - managym (C++): handles low-level game logic. Planned: native C++ inference for faster rollouts.
  - manabot (Python): Torch-backed optimization and, for now, inference for rollouts.
- Dynamic discrete action space
  - A limited number of action slots whose meaning can vary significantly from step to step.
  - The "meaning" of each action slot is included in the observation tensor for context.
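To make the action-slot idea concrete, here is a minimal sketch in plain Python. It assumes, as is common for this kind of design but not stated above, that slots which do not hold a legal action are masked out before sampling. `ActionSlot`, its `kind` labels, and `masked_action_probs` are hypothetical names; in manabot the slot meanings live in the observation tensor rather than in a Python object.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ActionSlot:
    """Hypothetical description of one action slot at the current step."""
    valid: bool  # whether the slot holds a legal action this step
    kind: str  # illustrative label; a real observation would encode this as features


def masked_action_probs(logits: List[float], slots: List[ActionSlot]) -> List[float]:
    """Softmax over a fixed set of action slots, assigning zero probability to invalid ones."""
    if len(logits) != len(slots):
        raise ValueError("expected exactly one logit per action slot")
    valid_logits = [logit for logit, slot in zip(logits, slots) if slot.valid]
    if not valid_logits:
        raise ValueError("at least one action slot must be valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(valid_logits)
    weights = [math.exp(logit - peak) if slot.valid else 0.0 for logit, slot in zip(logits, slots)]
    total = sum(weights)
    return [w / total for w in weights]


# Four slots this step; only the first two are legal, so the high logits
# on the invalid slots have no effect.
slots = [
    ActionSlot(valid=True, kind="pass_priority"),
    ActionSlot(valid=True, kind="play_land"),
    ActionSlot(valid=False, kind="cast_spell"),
    ActionSlot(valid=False, kind="empty"),
]
probs = masked_action_probs([0.0, 0.0, 5.0, 1.0], slots)
print(probs)  # [0.5, 0.5, 0.0, 0.0]
```

Because the number of slots is fixed, the policy head can keep a constant output size while the slot meanings change from step to step.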
Files should follow this template:

```python
"""
filename.py
One-line purpose of file

Instructions for collaborators (both human and LLM) on how to approach understanding and editing the code.
Keep this section focused and impactful.
"""

# Standard library
import os
from typing import Dict, List

# Third-party imports
import torch
from torch import Tensor

# manabot imports
from manabot.env import ObservationSpace

# Local imports
from .sibling import Thing
```

- Public APIs should have docstrings focused on clarifying behavior and resolving ambiguities
- Implementation details and design rationale belong in comments or READMEs
- Use organizational comments liberally:

  ```python
  # -----------------------------------------------------------------------------
  # Organizational Header
  # -----------------------------------------------------------------------------
  ```

- Follow PEP 8
- Use type hints as much as possible
- Use dataclasses where appropriate
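As an illustration only, here is a small hypothetical module that applies these conventions together: file header, grouped imports, an organizational header, type hints, and a dataclass. `RewardConfig` and its fields are invented for this example and are not manabot's `Reward` class.

```python
"""
reward_example.py
Hypothetical example module illustrating the conventions above.

RewardConfig is invented for illustration and is not manabot's Reward class.
"""

# Standard library
from dataclasses import dataclass

# -----------------------------------------------------------------------------
# Reward Configuration
# -----------------------------------------------------------------------------


@dataclass(frozen=True)
class RewardConfig:
    """Terminal rewards for the hero when a game ends."""

    win_reward: float = 1.0
    lose_reward: float = -1.0

    def terminal_reward(self, won: bool) -> float:
        """Return win_reward if the hero won, otherwise lose_reward."""
        return self.win_reward if won else self.lose_reward
```

For example, `RewardConfig(lose_reward=-0.5).terminal_reward(won=False)` returns `-0.5`. Making the dataclass `frozen` keeps configuration objects immutable once an experiment starts, which is one reasonable choice for hyperparameter-style data.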
When working with this codebase, LLMs should:
- Avoid comments that are transient or that describe changes. Assume the code will be copied directly into the codebase and kept indefinitely.
- Pay special attention to file headers and README content for context
- Propose small, iterative changes
- End responses with:
  - Full implementations of changed files that can be copied into the codebase
  - Questions that could clarify intent
  - Notes on what was intentionally left out