This project implements an AlphaGo-inspired game-playing AI system, combining Monte Carlo Tree Search (MCTS), classical adversarial search, and deep learning for the game of Go. It includes supervised learning, value-network training, and search algorithms with heuristic evaluation.
- Monte Carlo Tree Search (MCTS): State-of-the-art game tree search
- Alpha-Beta Pruning: Classical minimax optimization
- Neural Networks: Deep learning for position evaluation
- Supervised Learning: Training models on expert game data
- Heuristic Search: Domain-specific evaluation functions
- PyTorch: Neural network implementation and training
- Value Networks: Board position evaluation
- Policy Learning: Move prediction
- Dataset Processing: Game state serialization (17 MB+ datasets)
- Model Training: Convergence analysis and hyperparameter tuning
- Modular Architecture: Clean separation of concerns
- Object-Oriented Design: Extensible agent framework
- GUI Development: Interactive game visualization
- Testing: Comprehensive agent comparison
LLM-from-Scratch/
├── agents.py # AI agent implementations (MCTS, Minimax, etc.)
├── adversarial_search.py # Alpha-beta and minimax algorithms
├── adversarial_search_problem.py # Search problem interface
├── go_search_problem.py # Go-specific search problem
├── heuristic_adversarial_search_problem.py # Heuristic evaluation interface
├── heuristic_go_problems.py # Go-specific heuristics
├── models.py # Neural network architectures
├── supervised_learning.ipynb # Model training notebook
├── mcts_analysis.py # MCTS performance analysis
├── game_runner.py # Game orchestration
├── go_gui.py # Graphical interface
├── go_utils.py # Go game utilities
├── dataset_5x5.pkl # 5×5 Go game dataset (13MB)
├── dataset_5x5_pygo.pkl # Alternative format (17MB)
├── value_model.pt # Trained neural network
├── pygo/ # Go game engine
│ ├── game.py # Game logic
│ ├── board.py # Board representation
│ ├── group.py # Stone group management
│ ├── utils.py # Utility functions
│ └── exceptions.py # Custom exceptions
└── README.md # Comprehensive documentation
- Random Agent: Baseline performance
- Minimax Agent: Classical adversarial search
- Alpha-Beta Agent: Optimized minimax with pruning
- MCTS Agent: Monte Carlo Tree Search
- Neural Network Agent: Deep learning-based evaluation
- Hybrid Agent: MCTS + Neural networks (AlphaGo-style)
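These agents can share a common interface so the game runner can swap them freely. A minimal sketch of that idea (the class and method names here are illustrative; the actual ones live in `agents.py`):

```python
import random
from abc import ABC, abstractmethod

class GameAgent(ABC):
    """Hypothetical common interface for all agents."""

    @abstractmethod
    def get_move(self, game_state):
        """Return one legal move for the given state."""

class RandomAgent(GameAgent):
    """Baseline agent: choose uniformly among legal moves."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def get_move(self, game_state):
        return self.rng.choice(game_state.legal_moves())
```

Because every agent exposes the same `get_move` call, head-to-head comparisons only need to swap the agent object, not the game loop.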
Implementation of the four phases:
- Selection: Navigate tree using UCB1 formula
- Expansion: Add new nodes to tree
- Simulation: Random playout from node
- Backpropagation: Update statistics up the tree
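The selection step scores each child with the UCB1 formula below and descends into the best one. A runnable sketch, with the node layout simplified to `(wins, visits)` pairs (the full `MCTSNode` class appears later in this README):

```python
import math

def ucb1(wins, visits, parent_visits, c=1.41):
    """UCB1 score: exploitation term plus exploration bonus."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children):
    """children: list of (wins, visits) pairs under one parent node.
    Returns the index of the child maximizing UCB1."""
    parent_visits = sum(v for _, v in children)
    return max(range(len(children)),
               key=lambda i: ucb1(children[i][0], children[i][1], parent_visits))
```

The constant `c` trades off exploration against exploitation; `sqrt(2) ≈ 1.41` is the textbook default.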
UCB1 Formula:
value(node) = wins/visits + C * sqrt(ln(parent_visits) / visits)
- Input: Board state representation (channels for each player)
- Architecture: Convolutional neural network
- Output: Position evaluation score
- Training: Supervised learning on expert games
- Alpha-Beta Pruning: O(b^(d/2)) complexity
- Iterative Deepening: Time-bounded search
- Transposition Tables: Memoization for repeated positions
- Move Ordering: Improving pruning efficiency
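A minimal alpha-beta sketch over a toy game tree, where leaves are static evaluations and internal nodes are lists of children (the real implementation in `adversarial_search.py` operates on search-problem states instead):

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    """Alpha-beta minimax over a nested-list game tree."""
    if isinstance(node, (int, float)):  # leaf: static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # beta cutoff: opponent will avoid this line
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:  # alpha cutoff
                break
        return value
```

Good move ordering makes cutoffs fire earlier, which is what pushes the complexity toward the O(b^(d/2)) best case.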
Complete implementation of Go rules:
- Liberties: Counting free adjacencies
- Captures: Removing captured stones
- Ko Rule: Preventing immediate recapture
- Scoring: Territory and captured stones
- Legal Moves: Validating suicide and ko
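As one example, liberty counting reduces to a flood fill over a connected stone group. A self-contained sketch (the board encoding here is illustrative, not `pygo`'s actual representation):

```python
def liberties(board, row, col):
    """Count liberties (empty adjacent points) of the group at (row, col).
    board: 2D list of 'B', 'W', or '.' for empty."""
    color = board[row][col]
    seen, libs = set(), set()
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in seen:
            continue
        seen.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(board) and 0 <= nc < len(board[0]):
                if board[nr][nc] == ".":
                    libs.add((nr, nc))       # empty neighbor is a liberty
                elif board[nr][nc] == color:
                    stack.append((nr, nc))   # same color: part of the group
    return len(libs)
```

A group with zero liberties is captured; the same routine also drives suicide and ko legality checks.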
Domain-specific evaluation considering:
- Territory control
- Stone connectivity
- Edge and corner control
- Influence and potential
- Life and death status
```python
class MCTSNode:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.wins = 0
        # get_legal_actions is provided by the Go search problem
        self.untried_actions = get_legal_actions(state)
```

```python
import torch.nn as nn

class ValueNetwork(nn.Module):
    def __init__(self, board_size):
        super().__init__()  # required before registering submodules
        self.conv1 = nn.Conv2d(2, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(256 * board_size * board_size, 256)
        self.fc2 = nn.Linear(256, 1)
```

```python
from abc import ABC, abstractmethod

class AdversarialSearchProblem(ABC):
    @abstractmethod
    def get_available_actions(self, state):
        pass

    @abstractmethod
    def transition(self, state, action):
        pass

    @abstractmethod
    def is_terminal_state(self, state):
        pass

    @abstractmethod
    def evaluate_state(self, state):
        pass
```

- Language: Python 3.10
- ML Framework: PyTorch
- Data Handling: Pickle, NumPy
- Visualization: Custom GUI (Pygame/Tkinter)
- Version Control: Git
- Python Programming: Advanced OOP and functional programming
- Deep Learning: Neural network design and training
- Reinforcement Learning: MCTS and self-play
- Algorithm Design: Search algorithms and optimization
- Data Science: Large dataset processing (17 MB+)
- Software Architecture: Modular, extensible design
- Game Theory: Understanding of adversarial games
- Performance Optimization: Efficient tree search
- Time: O(n × d) where n = simulations, d = depth
- Space: O(n) for tree storage (at most one node added per simulation), far below the full O(b^d) tree
- Scalability: Parallelizable simulations
- Best Case: O(b^(d/2))
- Worst Case: O(b^d)
- Space: O(d) with iterative deepening
- Inference: O(L × N) where L = layers, N = neurons
- Training: O(B × E × C) where B = batch size, E = epochs, C = complexity
- Size: 17+ MB of game data
- Format: Serialized Python objects (pickle)
- Content: 5×5 Go game states and outcomes
- Purpose: Supervised learning for value network
- Games: Thousands of expert-level games
- Data Collection: Extract game states from dataset
- Preprocessing: Convert board states to tensor format
- Model Training: Train value network via supervised learning
- Validation: Test on held-out games
- Integration: Use trained model in MCTS/search agents
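The preprocessing step can be sketched as a two-plane encoding that matches the network's 2-channel input (the function name and board encoding here are illustrative):

```python
import numpy as np

def encode_board(board, to_move="B"):
    """Encode a board (list of strings, 'B'/'W'/'.') as a 2-channel array:
    channel 0 = stones of the player to move, channel 1 = opponent stones."""
    size = len(board)
    planes = np.zeros((2, size, size), dtype=np.float32)
    opponent = "W" if to_move == "B" else "B"
    for r in range(size):
        for c in range(size):
            if board[r][c] == to_move:
                planes[0, r, c] = 1.0
            elif board[r][c] == opponent:
                planes[1, r, c] = 1.0
    return planes
```

Encoding relative to the player to move lets one network evaluate positions for both colors without a separate "side to move" input.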
The project includes tools for:
- Agent head-to-head comparison
- Win rate statistics
- Move time analysis
- Search tree depth analysis
- Neural network accuracy metrics
This project demonstrates:
- Deep understanding of game AI algorithms
- Ability to implement research papers (AlphaGo)
- Machine learning engineering skills
- Complex software architecture design
- Integration of classical AI and deep learning
- Large-scale data processing
- Game Development: Creating intelligent opponents
- Decision Making: Tree search for planning
- Robotics: Motion planning and decision trees
- Finance: Trading strategy optimization
- Logistics: Route planning and optimization
- Research: Advancing game AI techniques
| Component | AlphaGo | This Project |
|---|---|---|
| Search | MCTS | ✓ Implemented |
| Value Network | Deep CNN | ✓ Implemented |
| Policy Network | Deep CNN | ○ Heuristics |
| Self-Play | Yes | ○ Dataset-based |
| GPU Training | TPUs | CPU/GPU |
| Scale | 19×19 Go | 5×5 Go |
- Go has ~250 legal moves per position on a 19×19 board
- Full game tree is intractable
- Solution: MCTS with neural network pruning
- Efficient board encoding
- Feature extraction for neural networks
- Handling symmetries
- Large dataset management (17 MB+)
- Balancing game outcomes
- Preventing overfitting
- Combining MCTS with neural networks
- Balancing exploration vs. exploitation
- Tuning hyperparameters
- Policy Network: Add move prediction network
- Self-Play: Implement reinforcement learning
- Larger Boards: Scale to 9×9 or 19×19
- Parallel MCTS: Multi-threaded tree search
- Opening Book: Database of strong openings
- GPU Acceleration: CUDA kernels for search
- Hybrid Architecture: Combining classical search with deep learning
- Modular Design: Easy to swap agents and algorithms
- Comprehensive Evaluation: Multiple heuristics and learned evaluation
- Educational Value: Well-documented implementation of complex AI
- Silver et al., "Mastering the game of Go with deep neural networks and tree search" (2016)
- Silver et al., "Mastering the game of Go without human knowledge" (2017)
- Coulom, "Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search" (2006)
- MCTS Agent: ~1000 simulations per move
- Neural Network: ~100ms inference time
- Alpha-Beta: Depth 4-6 search in reasonable time
- Win Rate: Hybrid agent > MCTS > Alpha-Beta > Random