Train a Deep Q-Network (DQN) reinforcement learning agent to navigate a Roblox Classic Obby environment. The agent learns to jump, move, and avoid hazards through trial and error, communicating in real-time between Roblox Studio and a Python server.
- Real-time RL Training: Agent interacts with Roblox game world via HTTP API
- Advanced Observations: Kinematics, radial rays, edge probes, and hazard detection
- Smart Reward Shaping: Distance-based progress, leap bonuses, milestone rewards, hazard avoidance
- Automatic Checkpointing: Saves best and latest models during training
- Elite Replay Buffer: Retains high-performing episode trajectories to prevent forgetting good strategies
- Adaptive Exploration: Epsilon decay adjusts based on performance
- Hazard Awareness: Detects and avoids lethal blocks with safe respawn logic
- Performance Optimizations: Torch compilation, action caching, request timing
- Comprehensive Monitoring: Real-time dashboard, CSV logging, performance metrics
- Configuration Management: Runtime-tunable parameters without code changes
- Offline Training: Synthetic environment for rapid experimentation
Roblox Client (AgentClient.client.lua)
    ↓ RemoteFunction RLStep [MONITORED]
Roblox Server (RLServer.lua)
    ↓ HTTP POST [TIMED]
Python Flask Server (rl_server.py)
    ↓ DQN Training Loop [OPTIMIZED]
PyTorch Q-Network [COMPILED]
    ↓ Metrics & Logs
Dashboard & CSV Export
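To make the data flow concrete, here is a minimal sketch of the Python side of the bridge. The route name `/step` and the payload fields are assumptions for illustration; the actual endpoint and message format are defined in `rl_server.py`. The Roblox server POSTs the latest observation, reward, and done flag, and the Flask server replies with the next action ID.

```python
# Minimal sketch of the HTTP bridge, NOT the actual rl_server.py.
# The "/step" route and payload fields are assumptions for illustration.
import random
from flask import Flask, request, jsonify

app = Flask(__name__)
N_ACT = 7  # action IDs 0-6, see the action table below

@app.route("/step", methods=["POST"])
def step():
    payload = request.get_json()          # sent by RLServer.lua via HTTP POST
    obs = payload.get("obs", [])          # 25-dimensional observation vector
    reward = payload.get("reward", 0.0)
    done = payload.get("done", False)
    # A real server would store the transition and query the Q-network here;
    # this sketch just returns a random action.
    action = random.randrange(N_ACT)
    return jsonify({"action": action})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```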
- Roblox Studio: For running the obby environment
- Python 3.8+: For the RL server
- PyYAML: For configuration management (`pip install pyyaml`)
- Roblox Game Setup: Classic Obby with checkpoints (`CP_0`, `CP_1`, `CP_2`, ...) and hazard-tagged parts
git clone https://github.com/NathanL15/ClassicObby-RL.git
cd ClassicObby-RL

# Navigate to server directory
cd server
# Create virtual environment
python -m venv .venv
# Activate environment
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the DQN server
python rl_server.py

- Open `place/ClassicObby.rbxl` in Roblox Studio
- Ensure checkpoints are named `CP_0`, `CP_1`, `CP_2`, etc.
- Tag hazard parts with the `CollectionService` tag "Hazard"
- Add a `RemoteFunction` named "RLStep" in `ReplicatedStorage`
- Insert `roblox/RLServer.lua` as a ServerScript in `ServerScriptService`
- Insert `roblox/AgentClient.client.lua` as a LocalScript in `StarterPlayerScripts`
- Run the Python server (from step 2)
- Start the dashboard (optional): run `python dashboard.py`, then open http://127.0.0.1:5001 in your browser
- Play the Roblox game in Studio
- Watch the agent learn to navigate the obby!
cd server
python dashboard.py
# Open http://127.0.0.1:5001

The dashboard displays:
- Performance metrics: Request timing, throughput analysis
- Training progress: Steps, episodes, rewards, exploration rate
- Action distribution: Analyze policy behavior
- System status: Buffer sizes, optimizations, configuration
# Get current stats via API
curl http://127.0.0.1:5000/stats
# View configuration
python config_tool.py --action view --config-type reward
python config_tool.py --action view --config-type model

Training metrics are automatically exported to training_metrics.csv:
- Timestamp, step count, episode number
- Action taken, reward received
- Request timing, exploration rate
- Training loss values
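Because the log is plain CSV, it can be analyzed with any external tool. Below is a small pandas sketch; the column names (`episode`, `reward`, `request_time_ms`) are assumptions based on the fields listed above, so check the header written by `rl_server.py` for the exact names.

```python
# Sketch: offline analysis of training_metrics.csv with pandas.
# Column names are assumptions; verify against the actual CSV header.
import pandas as pd

df = pd.read_csv("training_metrics.csv")

# Total reward per episode and average request latency
episode_reward = df.groupby("episode")["reward"].sum()
print(episode_reward.tail())
print("mean request time:", df["request_time_ms"].mean())
```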
Edit config/reward_shaping.conf to tune reward parameters:
[progress_rewards]
base_reward_per_step = -0.005
progress_reward_scale = 3.0
leap_bonus = 1.0
milestone_bonus = 2.0
[checkpoints]
checkpoint_bonus = 20.0
completion_base_bonus = 50.0
[penalties]
death_penalty_hazard = 15.0
stuck_penalty = 8.0

Edit config/model_config.yaml:
training:
  learning_rate: 0.001
  batch_size: 128
  eps_start: 1.0
  eps_min: 0.05

optimization:
  compile_model: true        # Enable torch.compile()
  enable_action_cache: false

# Update reward parameters
python config_tool.py --action update-rewards
# Reload configuration on server
python config_tool.py --action reload

Train without Roblox for rapid iteration:
cd server
# Quick test of synthetic environment
python train_offline.py --test
# Train for 100 episodes offline
python train_offline.py --episodes 100
# Train with visualization (requires matplotlib)
python train_offline.py --episodes 50 --render

Benefits:
- Fast iteration: No Roblox startup time
- Controlled experiments: Reproducible synthetic environment
- Hyperparameter tuning: Quick testing of different configurations
- Algorithm development: Test new RL approaches safely
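The actual synthetic environment lives in `train_offline.py`; the following is only a minimal 1-D stand-in illustrating the idea of training without Roblox, with hazard positions and reward values chosen arbitrarily for the example.

```python
# Minimal sketch of a synthetic obby stand-in (the real one is in train_offline.py).
# The agent moves along a 1-D track toward a goal; stepping onto a hazard ends the episode.
import random

class ToyObby:
    def __init__(self, length=20, hazards=(5, 11, 16)):
        self.length, self.hazards = length, set(hazards)
        self.reset()

    def reset(self):
        self.pos = 0
        return self._obs()

    def _obs(self):
        return [self.pos / self.length]

    def step(self, action):
        # 0 = idle, 1 = forward, 6 = backward (a subset of the action table below)
        self.pos += {1: 1, 6: -1}.get(action, 0)
        if self.pos in self.hazards:
            return self._obs(), -15.0, True      # hazard death penalty
        if self.pos >= self.length:
            return self._obs(), 50.0, True       # completion bonus
        return self._obs(), -0.005, False        # base step cost

env = ToyObby()
obs, done = env.reset(), False
while not done:
    obs, reward, done = env.step(random.choice([0, 1, 6]))
```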
| Metric | Previous | Current | Target |
|---|---|---|---|
| Action step latency | ~250ms | ~180ms | <120ms |
| Model compilation | ❌ | ✅ torch.compile() | ✅ |
| Request timing | ❌ | ✅ Full monitoring | ✅ |
| Reward observability | Basic logs | ✅ Component breakdown | ✅ |
| Configuration management | Hardcoded | ✅ Runtime tunable | ✅ |
| Offline training | ❌ | ✅ Synthetic environment | ✅ |
- Torch Compilation: ~15-20% inference speedup
- Action Caching: Reduces redundant computation for similar states
- Request Monitoring: Full timing instrumentation for bottleneck identification
- Configuration System: Easy parameter tuning without code changes
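For orientation, here is a sketch of the two inference-side optimizations, assuming a simple MLP Q-network and a coarse state-rounding scheme for the cache; the real network and caching policy live in `rl_server.py`.

```python
# Sketch of the inference optimizations; the actual network is defined in rl_server.py.
import torch
import torch.nn as nn

OBS_DIM, N_ACT = 25, 7
qnet = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(),
                     nn.Linear(256, 256), nn.ReLU(),
                     nn.Linear(256, N_ACT))

# Torch compilation (compile_model: true) for faster inference on PyTorch 2.x.
if hasattr(torch, "compile"):
    qnet = torch.compile(qnet)

# Action caching: reuse the greedy action for near-identical states.
_action_cache = {}

def greedy_action(obs, decimals=1):
    key = tuple(round(x, decimals) for x in obs)   # coarse discretization of the state
    if key not in _action_cache:
        with torch.no_grad():
            q = qnet(torch.tensor(obs, dtype=torch.float32))
        _action_cache[key] = int(q.argmax())
    return _action_cache[key]
```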
- STEP_DT: Internal update interval (0.08s)
- ACTION_DECISION_DT: Decision frequency (0.25s)
- HAZARD_NEAR_RADIUS: Distance to trigger hazard avoidance (15 studs)
- CHECK_RADIUS: Checkpoint reach distance (6 studs)
- TIMING_LOG_EVERY: Log timing stats every N decisions (10)

- N_ACT: Number of actions (7: idle, forward, left, right, jump, forward+jump, backward)
- gamma: Discount factor (0.99)
- eps_min: Minimum exploration rate (0.05)
- buf: Replay buffer size (100,000)
- elite_buf: Elite buffer size (5,000)
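The exploration schedule interpolates between `eps_start` (1.0) and `eps_min` (0.05). The exact decay rule in `rl_server.py` may differ; the sketch below assumes a simple exponential decay with an illustrative decay constant.

```python
# Sketch of an epsilon-greedy schedule using eps_start/eps_min from the config.
# Exponential decay and EPS_DECAY = 0.999 are assumptions for illustration.
import random

EPS_START, EPS_MIN, EPS_DECAY = 1.0, 0.05, 0.999

def epsilon(step):
    return max(EPS_MIN, EPS_START * EPS_DECAY ** step)

def select_action(q_values, step, n_act=7):
    if random.random() < epsilon(step):
        return random.randrange(n_act)                              # explore
    return int(max(range(n_act), key=lambda a: q_values[a]))        # exploit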
| ID | Action | Description |
|---|---|---|
| 0 | Idle | No movement |
| 1 | Forward | Move forward |
| 2 | Left | Strafe left |
| 3 | Right | Strafe right |
| 4 | Jump | Jump in place |
| 5 | Forward + Jump | Jump while moving forward |
| 6 | Backward | Move backward |
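A convenience mapping like the one below (not taken from `rl_server.py`) is handy for logging and for offline experiments that mirror the table above.

```python
# Hypothetical ID-to-name mapping that mirrors the action table.
ACTION_NAMES = {
    0: "idle", 1: "forward", 2: "left", 3: "right",
    4: "jump", 5: "forward_jump", 6: "backward",
}
```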
The agent receives 25-dimensional observations:
- Kinematics: dx, dy, dz, vx, vy, vz (position/velocity to target)
- Environment: down, forward (ray distances)
- Orientation: angle (cosine to target), grounded, speed, tJump
- Radial Rays: r0-r7 (8 directions for obstacle sensing)
- Edge Probes: dropF, dropR, dropL (gap detection ahead/sides)
- Hazards: hazardDist (normalized distance to nearest hazard), lastDeathType
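The components above sum to 25 values (6 + 2 + 4 + 8 + 3 + 2). The sketch below shows one way they might be packed into a flat vector; the actual field ordering is defined on the Roblox side (`AgentClient.client.lua`).

```python
# Sketch: packing the observation fields above into a flat 25-value vector.
# Field ordering is an assumption; the authoritative layout is in AgentClient.client.lua.
def pack_observation(kin, rays, orient, radial, edges, hazard):
    obs = (
        list(kin)      # dx, dy, dz, vx, vy, vz                  (6)
        + list(rays)   # down, forward                           (2)
        + list(orient) # angle, grounded, speed, tJump           (4)
        + list(radial) # r0-r7                                   (8)
        + list(edges)  # dropF, dropR, dropL                     (3)
        + list(hazard) # hazardDist, lastDeathType               (2)
    )
    assert len(obs) == 25
    return obs
```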
- Progress: +3.0 × distance improvement (regression capped at -2)
- Leap Bonus: +1 for jumps >2 studs closer
- Milestone: +2 every 1-stud improvement on best distance
- Checkpoint: +20 for reaching next CP
- Completion: +50 + 10 * num_checkpoints for finishing
- Penalties: -15 hazard death, -8 fall death, -8 stuck, -0.02 hazard approach
- Base: -0.005 per step
All reward parameters are configurable via config/reward_shaping.conf.
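As a sketch of how those values plug in, the snippet below loads the `[progress_rewards]` section shown earlier with `configparser` and applies the base-cost and progress terms. The full reward logic (leap/milestone/checkpoint bonuses, death and stuck penalties) lives server-side; this only illustrates the config wiring.

```python
# Sketch: load reward parameters from config/reward_shaping.conf and apply
# the base-cost and progress terms. Not the full server-side reward function.
import configparser

cfg = configparser.ConfigParser()
cfg.read("config/reward_shaping.conf")

BASE = cfg.getfloat("progress_rewards", "base_reward_per_step")    # -0.005
SCALE = cfg.getfloat("progress_rewards", "progress_reward_scale")  # 3.0

def progress_reward(prev_dist, new_dist):
    improvement = prev_dist - new_dist            # positive when closer to the target
    return BASE + max(-2.0, SCALE * improvement)  # regression capped at -2
```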
# View current configuration
python config_tool.py --action view
# Update reward parameters with optimized values
python config_tool.py --action update-rewards
# Reload server configuration
python config_tool.py --action reload

# Test server performance
python test_performance.py
# Monitor via dashboard
python dashboard.py
# Open http://127.0.0.1:5001

# Test synthetic environment
python train_offline.py --test
# Rapid training iteration
python train_offline.py --episodes 20

ClassicObby-RL/
├── place/
│   └── ClassicObby.rbxl           # Roblox place file
├── roblox/
│   ├── AgentClient.client.lua     # Client-side agent logic [ENHANCED]
│   ├── RLServer.lua               # Server-side HTTP bridge
│   └── RewardConfig.lua           # Configuration loading utility
├── server/
│   ├── rl_server.py               # DQN training server [OPTIMIZED]
│   ├── config_manager.py          # Configuration management
│   ├── dashboard.py               # Real-time monitoring dashboard
│   ├── train_offline.py           # Offline training CLI
│   ├── test_performance.py        # Performance testing
│   ├── config_tool.py             # Configuration utility
│   └── requirements.txt           # Python dependencies [UPDATED]
├── config/
│   ├── model_config.yaml          # Model & training parameters
│   └── reward_shaping.conf        # Reward function configuration
├── checkpoints/                   # Auto-saved models (created on run)
│   ├── best.pt                    # Best performing model
│   ├── last.pt                    # Most recent model
│   └── offline_trained.pt         # Offline training results
└── README.md                      # This file [ENHANCED]
The system now provides comprehensive performance monitoring:
- HTTP Request Timing: Both client and server side measurement
- Reward Component Breakdown: Detailed logging of reward calculations
- Action Distribution Analysis: Track policy behavior over time
- Training Progress Metrics: Episode rewards, lengths, exploration rate
- System Resource Usage: Buffer sizes, cache utilization
- CSV Data Export: Complete training log for external analysis
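External tools can also tap the confirmed `/stats` endpoint directly. The response fields depend on `rl_server.py`, so the sketch below simply prints whatever JSON the server returns.

```python
# Sketch: periodically poll the /stats endpoint for external monitoring.
import time
import requests

while True:
    try:
        stats = requests.get("http://127.0.0.1:5000/stats", timeout=2).json()
        print(stats)
    except requests.RequestException as exc:
        print("server not reachable:", exc)
    time.sleep(10)
```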
- Request Batching: Implement full batching with threading for higher throughput
- TensorBoard Integration: Add TensorBoard logging for advanced visualization
- Multi-agent Training: Support multiple agents training simultaneously
- Advanced RL Algorithms: PPO, SAC for continuous control and better sample efficiency
- Curriculum Learning: Automatic difficulty progression based on performance
- Model Compression: ONNX export for even faster inference
Contributions are welcome! The new configuration and monitoring systems make it easy to experiment with:
- New reward shaping strategies
- Alternative RL algorithms
- Performance optimizations
- Additional synthetic environments
This project is licensed under the MIT License - see the LICENSE file for details.