This project implements an adaptive load balancing system designed to optimize workload distribution across a multi-server environment through simulation-based traffic scenarios.
The load balancing strategy is learned using Reinforcement Learning, where the problem is modeled as a Markov Decision Process (MDP) to adapt routing decisions based on observed system states and workload patterns.
The core of the project is the interaction between a central RL Agent and a simulated cluster environment developed using the Gymnasium library.
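To make the MDP framing concrete, the sketch below shows the standard Gymnasium interaction loop between the agent and the simulated cluster. The class and method names (`LoadBalancerEnv`, `DQNAgent`, `select_action`, `store_transition`, `learn`) are illustrative assumptions, not the project's exact API.

```python
# Illustrative Gymnasium interaction loop; the environment and agent classes
# below are stand-ins for the project's actual implementations.
from src.environment import LoadBalancerEnv   # hypothetical class name
from src.agents import DQNAgent               # hypothetical class name

env = LoadBalancerEnv()                        # simulated 3-server cluster
agent = DQNAgent(state_dim=env.observation_space.shape[0],
                 n_actions=env.action_space.n)

state, info = env.reset()
done = False
while not done:
    action = agent.select_action(state)        # pick the server to route the request to
    next_state, reward, terminated, truncated, info = env.step(action)
    agent.store_transition(state, action, reward, next_state, terminated)
    agent.learn()                              # one gradient update from replay memory
    state = next_state
    done = terminated or truncated
```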
The project evaluates two primary neural network-based RL architectures:
- **Standard DQN**: Approximates the Q-value function to handle the continuous state space of server loads.
- **Dueling DQN**: Decouples the State Value $V(s)$ from the Action Advantage $A(s,a)$, allowing the agent to identify high-risk states regardless of the specific routing decision (a minimal sketch of the dueling head follows this list).
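The dueling head recombines the two streams as $Q(s,a) = V(s) + A(s,a) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s,a')$, where subtracting the mean advantage keeps the decomposition identifiable. A minimal PyTorch sketch is shown below; layer sizes and names are assumptions, not the exact architecture in `src/agents.py`.

```python
# Illustrative PyTorch sketch of a dueling Q-network head; hidden sizes are assumptions.
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.feature(x)
        v = self.value(h)                 # shape: (batch, 1)
        a = self.advantage(h)             # shape: (batch, n_actions)
        # Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
        return v + a - a.mean(dim=1, keepdim=True)
```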
Following hyperparameter optimization, the Standard DQN and Dueling DQN were compared head-to-head under identical conditions to assess whether the dueling architecture provides a measurable advantage.
Analysis: When system demand matches processing capacity, both agents converge to a stable operating regime with nearly identical performance.
Analysis: The Standard DQN exhibits noticeable instability due to overestimation bias. In contrast, the Dueling DQN maintains a significantly more stable and robust response despite persistent overload.
- `src/environment.py`: A custom Gymnasium environment that simulates a 3-server cluster, managing state transitions based on server processing rates and traffic modes (Low/High); a minimal interface sketch follows this list.
- `src/agents.py`: Implementation of the Reinforcement Learning agents, including the Standard DQN and Dueling DQN neural network architectures, as well as baseline heuristics such as Round Robin and Least Connections.
- `main.py`: The primary script for training the Dueling DQN agent, handling the training loop, model saving, and generating reward history plots.
- `tune.py`: A high-performance multiprocessing script that parallelizes a grid search over learning rates and discount factors to identify optimal hyperparameters.
- `compare.py`: A specialized script for performing head-to-head performance comparisons between the Standard and Dueling architectures under identical high-traffic conditions.
- `ablation.py`: A diagnostic script that performs an ablation study by systematically disabling core components such as the Target Network or Replay Memory to quantify their impact on training stability.
- `test.py`: A comprehensive stress-test script that evaluates trained agents against traditional baselines using metrics such as average load, load standard deviation (fairness), and P99 load.
- `visualize.py`: A simulation utility that produces real-time load distribution GIFs and step-by-step visualizations of server CPU utilization.
- `benchmark.py`: A validation tool that calculates Euclidean distances and similarity percentages to compare simulation telemetry against Mendeley Data industrial benchmark traces.
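As referenced in the `src/environment.py` entry, the sketch below shows what a minimal Gymnasium interface for the simulated cluster could look like. The arrival model, processing rate, and reward shown here are illustrative assumptions; the actual environment implements its own Low/High traffic modes and reward shaping.

```python
# Minimal sketch of a 3-server cluster environment; all numeric values and the
# reward function are illustrative assumptions, not the project's actual logic.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ClusterEnv(gym.Env):
    """Simulated server cluster: each action routes one incoming request to a server."""

    def __init__(self, num_servers: int = 3):
        super().__init__()
        self.num_servers = num_servers
        self.action_space = spaces.Discrete(num_servers)                       # target server index
        self.observation_space = spaces.Box(0.0, 1.0, shape=(num_servers,), dtype=np.float32)
        self.processing_rate = 0.1          # load drained per server per step (assumed)
        self.load = np.zeros(num_servers, dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.load = np.zeros(self.num_servers, dtype=np.float32)
        return self.load.copy(), {}

    def step(self, action):
        arrival = self.np_random.uniform(0.05, 0.20)   # Low/High traffic modes would change this range
        self.load[action] = min(1.0, self.load[action] + arrival)
        self.load = np.maximum(0.0, self.load - self.processing_rate).astype(np.float32)
        # Penalize imbalance and saturation (assumed reward); episode limits handled by the caller.
        reward = -float(self.load.std()) - float(self.load.max())
        return self.load.copy(), reward, False, False, {}
```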
A parallelized grid search was conducted using multiprocessing to identify the most stable RL parameters.
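A minimal sketch of how such a sweep can be parallelized with the standard library's `multiprocessing.Pool` is shown below; `train_and_evaluate` is a placeholder for the project's actual training routine, and the grid values mirror those in the results table.

```python
# Illustrative parallel grid search over learning rate, discount factor, and
# architecture; train_and_evaluate is a placeholder, not the project's real function.
import itertools
from multiprocessing import Pool

LEARNING_RATES = [0.001, 0.0005, 0.0001]
GAMMAS = [0.99, 0.95, 0.90]
ARCHITECTURES = ["standard", "dueling"]

def train_and_evaluate(lr, gamma, architecture):
    """Placeholder: train an agent with these hyperparameters and return its average reward."""
    return 0.0

def run_config(config):
    lr, gamma, arch = config
    avg_reward = train_and_evaluate(lr, gamma, arch)
    return {"lr": lr, "gamma": gamma, "architecture": arch, "avg_reward": avg_reward}

if __name__ == "__main__":
    configs = list(itertools.product(LEARNING_RATES, GAMMAS, ARCHITECTURES))   # 18 combinations
    with Pool() as pool:                       # one worker process per CPU core by default
        results = pool.map(run_config, configs)
    for rank, res in enumerate(sorted(results, key=lambda r: r["avg_reward"], reverse=True), 1):
        print(rank, res)
```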
The results, ranked by average reward, were as follows:
| Rank | Learning Rate | Gamma | Architecture | Average Reward |
|---|---|---|---|---|
| 1 | 0.001 | 0.99 | Dueling DQN | -62.98 |
| 2 | 0.001 | 0.95 | Dueling DQN | -66.02 |
| 3 | 0.0005 | 0.99 | Dueling DQN | -70.18 |
| 4 | 0.001 | 0.99 | Standard DQN | -70.27 |
| 5 | 0.001 | 0.90 | Dueling DQN | -73.71 |
| 6 | 0.0005 | 0.95 | Standard DQN | -74.21 |
| 7 | 0.0001 | 0.99 | Standard DQN | -74.31 |
| 8 | 0.0005 | 0.90 | Standard DQN | -75.92 |
| 9 | 0.001 | 0.90 | Standard DQN | -76.34 |
| 10 | 0.0001 | 0.99 | Dueling DQN | -76.60 |
| 11 | 0.0005 | 0.90 | Dueling DQN | -77.11 |
| 12 | 0.0001 | 0.95 | Standard DQN | -78.42 |
| 13 | 0.0005 | 0.95 | Dueling DQN | -78.54 |
| 14 | 0.0001 | 0.90 | Dueling DQN | -78.60 |
| 15 | 0.0005 | 0.99 | Standard DQN | -79.78 |
| 16 | 0.0001 | 0.90 | Standard DQN | -81.26 |
| 17 | 0.001 | 0.95 | Standard DQN | -82.22 |
| 18 | 0.0001 | 0.95 | Dueling DQN | -86.12 |
The trained RL policy was compared against industry-standard heuristics: Least Connections and Round Robin.
Analysis: Under high traffic, the RL agent maintains superior fairness (0.237 Std Dev) and minimizes P99 latency compared to static baselines.
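For reference, the sketch below shows one way these metrics can be computed from a history of per-step server load vectors; the actual `test.py` may compute them differently.

```python
# Illustrative stress-test metrics over a load history of shape [steps, servers].
import numpy as np

def summarize_loads(loads) -> dict:
    loads = np.asarray(loads, dtype=float)
    return {
        "avg_load": float(loads.mean()),
        "load_std_dev": float(loads.std()),           # fairness: lower means a more even distribution
        "p99_load": float(np.percentile(loads, 99)),  # tail load, a proxy for worst-case behavior
    }
```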
To ensure the simulation's realism, the server load vectors generated by the RL agent were compared against Mendeley Data workload traces.
| Statistic | Similarity (%) |
|---|---|
| Mean | 90.21 |
| Standard Deviation | 5.42 |
| Minimum | 76.60 |
| Maximum | 98.45 |
Result: The agent's learned policy achieved a 90.21% mean similarity with real-world server states.
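The exact normalization used in `benchmark.py` is not reproduced here; the sketch below illustrates one common way a Euclidean distance between normalized load vectors can be mapped to a similarity percentage.

```python
# Illustrative only: one possible distance-to-similarity mapping; the project's
# benchmark.py may use a different normalization.
import numpy as np

def similarity_percent(sim_vector, trace_vector) -> float:
    sim_vector = np.asarray(sim_vector, dtype=float)
    trace_vector = np.asarray(trace_vector, dtype=float)
    distance = np.linalg.norm(sim_vector - trace_vector)
    max_distance = np.linalg.norm(np.ones_like(sim_vector))   # loads assumed normalized to [0, 1]
    return 100.0 * (1.0 - distance / max_distance)
```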
The following visualizations illustrate the agent's routing behavior at the system level during the testing phase.
These were generated using ImageIO to capture real-time load distributions.
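A minimal sketch of how such GIFs can be produced with ImageIO and Matplotlib is shown below; file names, figure layout, and the load-history format are illustrative assumptions.

```python
# Illustrative: render per-step server loads as bar-chart frames and save them as a GIF.
import imageio
import matplotlib
matplotlib.use("Agg")                 # render off-screen
import matplotlib.pyplot as plt
import numpy as np

def save_load_gif(load_history, path="load_distribution.gif"):
    frames = []
    for step, loads in enumerate(load_history):        # load_history: iterable of per-server loads
        fig, ax = plt.subplots(figsize=(4, 3))
        ax.bar(range(len(loads)), loads)
        ax.set_ylim(0, 1)
        ax.set_title(f"Step {step}")
        ax.set_xlabel("Server")
        ax.set_ylabel("CPU load")
        fig.canvas.draw()
        frames.append(np.asarray(fig.canvas.buffer_rgba()))  # RGBA frame as a NumPy array
        plt.close(fig)
    imageio.mimsave(path, frames, duration=0.2)         # seconds per frame (may vary by imageio version)

save_load_gif(np.random.rand(20, 3))                     # demo with random loads
```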






