
🧠 RL on Roblox: Classic Obby


Train a Deep Q-Network (DQN) reinforcement learning agent to navigate a Roblox Classic Obby environment. The agent learns to move, jump, and avoid hazards through trial and error, with Roblox Studio and a Python server communicating in real time over HTTP.

✨ Features

  • Real-time RL Training: Agent interacts with Roblox game world via HTTP API
  • Advanced Observations: Kinematics, radial rays, edge probes, and hazard detection
  • Smart Reward Shaping: Distance-based progress, leap bonuses, milestone rewards, hazard avoidance
  • Automatic Checkpointing: Saves best and latest models during training
  • Elite Replay Buffer: Retains high-performing episode trajectories to prevent forgetting good strategies
  • Adaptive Exploration: Epsilon decay adjusts based on performance
  • Hazard Awareness: Detects and avoids lethal blocks with safe respawn logic
  • 🚀 Performance Optimizations: Torch compilation, action caching, request timing
  • 📊 Comprehensive Monitoring: Real-time dashboard, CSV logging, performance metrics
  • ⚙️ Configuration Management: Runtime tunable parameters without code changes
  • 🔬 Offline Training: Synthetic environment for rapid experimentation

🏗️ Architecture

Roblox Client (AgentClient.client.lua)
    ↓ RemoteFunction RLStep [MONITORED]
Roblox Server (RLServer.lua)
    ↓ HTTP POST [TIMED]
Python Flask Server (rl_server.py)
    ↓ DQN Training Loop [OPTIMIZED]
PyTorch Q-Network [COMPILED]
    ↓ Metrics & Logs
Dashboard & CSV Export
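
The step exchange itself is defined by roblox/RLServer.lua and server/rl_server.py. Purely as an illustration of the bridge, a minimal Flask endpoint that accepts an observation and returns an action index might look like the sketch below; the route name and payload fields are assumptions, not the project's actual API.

# step_endpoint_sketch.py - illustrative only, not the project's rl_server.py
from flask import Flask, request, jsonify
import random

app = Flask(__name__)
N_ACT = 7  # discrete actions (see the Actions section)

@app.route("/step", methods=["POST"])  # hypothetical route name
def step():
    payload = request.get_json()
    obs = payload.get("obs", [])         # 25-dim observation from the Roblox server
    reward = payload.get("reward", 0.0)  # reward earned by the previous action
    done = payload.get("done", False)    # episode-terminated flag
    # A real server would store (obs, reward, done) in the replay buffer and
    # pick the next action with an epsilon-greedy pass through the Q-network.
    action = random.randrange(N_ACT)
    return jsonify({"action": action})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)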

📋 Prerequisites

  • Roblox Studio: For running the obby environment
  • Python 3.8+: For the RL server
  • PyYAML: For configuration management (pip install pyyaml)
  • Roblox Game Setup: Classic Obby with checkpoints (CP_0, CP_1, CP_2, ...) and hazard-tagged parts

🚀 Quick Start

1. Clone the Repository

git clone https://github.com/NathanL15/ClassicObby-RL.git
cd ClassicObby-RL

2. Set up the Python RL Server

# Navigate to server directory
cd server

# Create virtual environment
python -m venv .venv

# Activate environment
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the DQN server
python rl_server.py

3. Set up Roblox Environment

  1. Open place/ClassicObby.rbxl in Roblox Studio
  2. Ensure checkpoints are named CP_0, CP_1, CP_2, etc.
  3. Tag hazard parts with CollectionService tag "Hazard"
  4. Add RemoteFunction named "RLStep" in ReplicatedStorage
  5. Insert roblox/RLServer.lua as a ServerScript in ServerScriptService
  6. Insert roblox/AgentClient.client.lua as a LocalScript in StarterPlayerScripts

4. Start Training & Monitoring

  1. Run the Python server (from step 2)
  2. Start the dashboard (optional):
    python dashboard.py
    Open http://127.0.0.1:5001 in your browser
  3. Play the Roblox game in Studio
  4. Watch the agent learn to navigate the obby!

📊 New Monitoring & Performance Features

Real-time Dashboard

cd server
python dashboard.py
# Open http://127.0.0.1:5001

  • Performance metrics: Request timing, throughput analysis
  • Training progress: Steps, episodes, rewards, exploration rate
  • Action distribution: Analyze policy behavior
  • System status: Buffer sizes, optimizations, configuration

Performance Statistics

# Get current stats via API
curl http://127.0.0.1:5000/stats

# View configuration
python config_tool.py --action view --config-type reward
python config_tool.py --action view --config-type model

CSV Data Export

Training metrics are automatically exported to training_metrics.csv:

  • Timestamp, step count, episode number
  • Action taken, reward received
  • Request timing, exploration rate
  • Training loss values
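
For quick offline analysis, the CSV can be loaded with pandas. A short sketch is shown below; the column names used here are assumptions based on the list above and may differ from the actual file.

# analyze_metrics.py - illustrative sketch; check the CSV header for real column names
import pandas as pd

df = pd.read_csv("training_metrics.csv")

# Assumed columns: episode, reward, request_ms, epsilon
episode_reward = df.groupby("episode")["reward"].sum()
print("Mean episode reward:", episode_reward.mean())
print("Mean request time (ms):", df["request_ms"].mean())
print("Final exploration rate:", df["epsilon"].iloc[-1])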

⚙️ Configuration Management

Reward Shaping Configuration

Edit config/reward_shaping.conf to tune reward parameters:

[progress_rewards]
base_reward_per_step = -0.005
progress_reward_scale = 3.0
leap_bonus = 1.0
milestone_bonus = 2.0

[checkpoints]
checkpoint_bonus = 20.0
completion_base_bonus = 50.0

[penalties]
death_penalty_hazard = 15.0
stuck_penalty = 8.0
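
The file uses an INI-style layout, so on the Python side it can be read with the standard-library configparser. The snippet below is a minimal sketch; the project's own loader is config_manager.py (with RewardConfig.lua on the Roblox side), which may handle defaults and validation differently.

# read_reward_config.py - minimal sketch using only the standard library
import configparser

cfg = configparser.ConfigParser()
cfg.read("config/reward_shaping.conf")

progress_scale = cfg.getfloat("progress_rewards", "progress_reward_scale")
checkpoint_bonus = cfg.getfloat("checkpoints", "checkpoint_bonus")
death_penalty = cfg.getfloat("penalties", "death_penalty_hazard")
print(progress_scale, checkpoint_bonus, death_penalty)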

Model & Training Configuration

Edit config/model_config.yaml:

training:
  learning_rate: 0.001
  batch_size: 128
  eps_start: 1.0
  eps_min: 0.05

optimization:
  compile_model: true  # Enable torch.compile()
  enable_action_cache: false
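
The YAML file can be loaded with PyYAML, and the compile_model flag corresponds to PyTorch's torch.compile(). The sketch below shows one way the flags might be consumed; the network here is a stand-in, and the real wiring lives in rl_server.py.

# load_model_config_sketch.py - illustrative; not the project's actual setup code
import yaml
import torch
import torch.nn as nn

with open("config/model_config.yaml") as f:
    cfg = yaml.safe_load(f)

qnet = nn.Sequential(                 # stand-in Q-network: 25 obs dims -> 7 actions
    nn.Linear(25, 128), nn.ReLU(),
    nn.Linear(128, 7),
)
optimizer = torch.optim.Adam(qnet.parameters(), lr=cfg["training"]["learning_rate"])

if cfg["optimization"]["compile_model"]:
    qnet = torch.compile(qnet)        # PyTorch 2.x graph compilation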

Runtime Configuration Updates

# Update reward parameters
python config_tool.py --action update-rewards

# Reload configuration on server
python config_tool.py --action reload

🔬 Offline Training & Experimentation

Train without Roblox for rapid iteration:

cd server

# Quick test of synthetic environment
python train_offline.py --test

# Train for 100 episodes offline
python train_offline.py --episodes 100

# Train with visualization (requires matplotlib)
python train_offline.py --episodes 50 --render

Benefits:

  • Fast iteration: No Roblox startup time
  • Controlled experiments: Reproducible synthetic environment
  • Hyperparameter tuning: Quick testing of different configurations
  • Algorithm development: Test new RL approaches safely
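
The actual synthetic environment ships with train_offline.py. Purely as an illustration of the idea, a gym-style stub with the same 25-dimension observation and 7-action interface could look like this (class and method names are assumptions, not the project's implementation):

# synthetic_env_sketch.py - toy stand-in for the offline environment
import numpy as np

class SyntheticObby:
    """Toy 1-D stand-in: the agent must close the distance to a goal."""
    N_OBS, N_ACT = 25, 7

    def reset(self):
        self.dist = 50.0                      # studs remaining to the goal
        self.t = 0
        return self._obs()

    def step(self, action):
        self.t += 1
        if action in (1, 5):                  # forward / forward+jump
            self.dist -= 1.0
        reward = -0.005 + (0.5 if action in (1, 5) else 0.0)
        done = self.dist <= 0 or self.t >= 200
        return self._obs(), reward, done

    def _obs(self):
        obs = np.zeros(self.N_OBS, dtype=np.float32)
        obs[0] = self.dist / 50.0             # normalized goal distance in slot 0
        return obs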

🎯 Performance Targets & Results

Metric | Previous | Current | Target
Action step latency | ~250ms | ~180ms | <120ms
Model compilation | ❌ | ✅ torch.compile() | ✅
Request timing | ❌ | ✅ Full monitoring | ✅
Reward observability | Basic logs | ✅ Component breakdown | ✅
Configuration management | Hardcoded | ✅ Runtime tunable | ✅
Offline training | ❌ | ✅ Synthetic environment | ✅

Performance Optimizations Applied

  • Torch Compilation: ~15-20% inference speedup
  • Action Caching: Reduces redundant computation for similar states
  • Request Monitoring: Full timing instrumentation for bottleneck identification
  • Configuration System: Easy parameter tuning without code changes

⚙️ Legacy Configuration (Agent Parameters)

AgentClient.client.lua Parameters

  • STEP_DT: Internal update interval (0.08s)
  • ACTION_DECISION_DT: Decision frequency (0.25s)
  • HAZARD_NEAR_RADIUS: Distance to trigger hazard avoidance (15 studs)
  • CHECK_RADIUS: Checkpoint reach distance (6 studs)
  • TIMING_LOG_EVERY: Log timing stats every N decisions (10)

Server Parameters (rl_server.py)

  • N_ACT: Number of actions (7: idle, forward, left, right, jump, forward+jump, backward)
  • gamma: Discount factor (0.99)
  • eps_min: Minimum exploration rate (0.05)
  • buf: Replay buffer size (100,000)
  • elite_buf: Elite buffer size (5,000)
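
For reference, a minimal epsilon-greedy action selection using these parameters might look like the sketch below. The decay schedule and variable names are assumptions; the project additionally adapts epsilon based on performance.

# epsilon_greedy_sketch.py - illustrative only
import random
import torch

N_ACT, eps_min, eps_decay = 7, 0.05, 0.999
eps = 1.0  # eps_start

def select_action(qnet, obs):
    global eps
    eps = max(eps_min, eps * eps_decay)            # simple exponential decay
    if random.random() < eps:
        return random.randrange(N_ACT)             # explore
    with torch.no_grad():
        q = qnet(torch.as_tensor(obs, dtype=torch.float32))
        return int(q.argmax().item())              # exploit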

🎮 Actions

ID | Action | Description
0 | Idle | No movement
1 | Forward | Move forward
2 | Left | Strafe left
3 | Right | Strafe right
4 | Jump | Jump in place
5 | Forward + Jump | Jump while moving forward
6 | Backward | Move backward
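
On the Python side (for example in the offline environment), the same discrete action set can be represented as a simple lookup table; the string labels below are illustrative, not identifiers taken from the code.

# action_ids_sketch.py - illustrative mapping of action IDs to movement intents
ACTIONS = {
    0: "idle",
    1: "forward",
    2: "left",
    3: "right",
    4: "jump",
    5: "forward_jump",
    6: "backward",
}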

📊 Observations

The agent receives 25-dimensional observations:

  • Kinematics: dx, dy, dz, vx, vy, vz (position/velocity to target)
  • Environment: down, forward (ray distances)
  • Orientation: angle (cosine to target), grounded, speed, tJump
  • Radial Rays: r0-r7 (8 directions for obstacle sensing)
  • Edge Probes: dropF, dropR, dropL (gap detection ahead/sides)
  • Hazards: hazardDist (normalized distance to nearest hazard), lastDeathType
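
Laid out as a flat vector, the 25 features can be indexed as below. This ordering is an assumption for illustration; the authoritative packing is done in AgentClient.client.lua.

# obs_layout_sketch.py - assumed feature order, for illustration only
OBS_FIELDS = (
    "dx", "dy", "dz", "vx", "vy", "vz",               # kinematics (6)
    "down", "forward",                                 # ray distances (2)
    "angle", "grounded", "speed", "tJump",             # orientation / state (4)
    "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",    # radial rays (8)
    "dropF", "dropR", "dropL",                         # edge probes (3)
    "hazardDist", "lastDeathType",                     # hazards (2)
)
assert len(OBS_FIELDS) == 25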

🏆 Reward Structure (Configurable)

  • Progress: +3.0 * distance improvement (capped at -2 regress)
  • Leap Bonus: +1 for jumps >2 studs closer
  • Milestone: +2 every 1-stud improvement on best distance
  • Checkpoint: +20 for reaching next CP
  • Completion: +50 + 10 * num_checkpoints for finishing
  • Penalties: -15 hazard death, -8 fall death, -8 stuck, -0.02 hazard approach
  • Base: -0.005 per step

All reward parameters are configurable via config/reward_shaping.conf.
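
Put together, the shaping amounts to something like the sketch below. Argument names and the exact clipping are assumptions; the real logic lives in AgentClient.client.lua and is driven by config/reward_shaping.conf.

# reward_sketch.py - simplified reconstruction of the terms listed above
def shaped_reward(dist_improved, leaped, new_best_milestone, reached_checkpoint,
                  finished, num_checkpoints, died_hazard, died_fall, stuck,
                  approaching_hazard):
    r = -0.005                                   # base per-step cost
    r += max(3.0 * dist_improved, -2.0)          # progress, regress capped at -2
    if leaped:              r += 1.0             # leap bonus (>2 studs closer)
    if new_best_milestone:  r += 2.0             # 1-stud improvement on best distance
    if reached_checkpoint:  r += 20.0
    if finished:            r += 50.0 + 10.0 * num_checkpoints
    if died_hazard:         r -= 15.0
    if died_fall:           r -= 8.0
    if stuck:               r -= 8.0
    if approaching_hazard:  r -= 0.02
    return r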

🛠️ Development Tools

Configuration Management

# View current configuration
python config_tool.py --action view

# Update reward parameters with optimized values
python config_tool.py --action update-rewards

# Reload server configuration
python config_tool.py --action reload

Performance Testing

# Test server performance
python test_performance.py

# Monitor via dashboard
python dashboard.py
# Open http://127.0.0.1:5001

Offline Development

# Test synthetic environment
python train_offline.py --test

# Rapid training iteration
python train_offline.py --episodes 20

📁 Project Structure

ClassicObby-RL/
├── place/
│   └── ClassicObby.rbxl          # Roblox place file
├── roblox/
│   ├── AgentClient.client.lua    # Client-side agent logic [ENHANCED]
│   ├── RLServer.lua              # Server-side HTTP bridge
│   └── RewardConfig.lua          # Configuration loading utility
├── server/
│   ├── rl_server.py              # DQN training server [OPTIMIZED]
│   ├── config_manager.py         # Configuration management
│   ├── dashboard.py              # Real-time monitoring dashboard
│   ├── train_offline.py          # Offline training CLI
│   ├── test_performance.py       # Performance testing
│   ├── config_tool.py            # Configuration utility
│   └── requirements.txt          # Python dependencies [UPDATED]
├── config/
│   ├── model_config.yaml         # Model & training parameters
│   └── reward_shaping.conf       # Reward function configuration
├── checkpoints/                  # Auto-saved models (created on run)
│   ├── best.pt                   # Best performing model
│   ├── last.pt                   # Most recent model
│   └── offline_trained.pt        # Offline training results
└── README.md                     # This file [ENHANCED]

🧪 Performance Monitoring

The system now provides comprehensive performance monitoring:

  1. HTTP Request Timing: Both client and server side measurement
  2. Reward Component Breakdown: Detailed logging of reward calculations
  3. Action Distribution Analysis: Track policy behavior over time
  4. Training Progress Metrics: Episode rewards, lengths, exploration rate
  5. System Resource Usage: Buffer sizes, cache utilization
  6. CSV Data Export: Complete training log for external analysis

🎯 Next Steps & Future Enhancements

  • Request Batching: Implement full batching with threading for higher throughput
  • TensorBoard Integration: Add TensorBoard logging for advanced visualization
  • Multi-agent Training: Support multiple agents training simultaneously
  • Advanced RL Algorithms: PPO, SAC for continuous control and better sample efficiency
  • Curriculum Learning: Automatic difficulty progression based on performance
  • Model Compression: ONNX export for even faster inference

🤝 Contributing

Contributions are welcome! The new configuration and monitoring systems make it easy to experiment with:

  • New reward shaping strategies
  • Alternative RL algorithms
  • Performance optimizations
  • Additional synthetic environments

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
