A comprehensive open-source framework for simulating and optimizing data center cooling systems, combining AI workload simulation with thermal modeling and control strategies.
This repository provides a complete simulation environment for data center cooling optimization, featuring:
- AI Datacenter Simulation Engine: Large-scale AI training workload simulation with GPU-aware thermal analysis
- AlphaDataCenterCooling: Virtual testbed for evaluating data center cooling control strategies
- Telemetry & Monitoring: Prometheus and Grafana integration for real-time metrics
- Dashboard: Web-based visualization and control interface
.
├── ai-datacenter-sim/ # AI workload simulation and telemetry stack
│ ├── SimAI/ # SimAI large-scale training simulator
│ ├── simulation/ # Facility simulation (RDHx + CHW plant dynamics)
│ ├── adapters/ # Integration adapters for cooling systems
│ │ ├── alpha_adapter/ # AlphaDataCenterCooling adapter
│ │ └── simai_adapter/ # SimAI workload adapter
│ ├── monitoring/ # Prometheus and Grafana configuration
│ ├── dashboard/ # Frontend dashboard and API
│ └── telemetry/ # Telemetry ingestion utilities
│
├── AlphaDataCenterCooling/ # Virtual testbed for cooling system optimization
│ ├── AlphaDataCenterCooling_Gym/ # Gymnasium environment interface
│ ├── Resources/ # Model files and initialization data
│ └── docs/ # Documentation and figures
│
└── [Documentation files] # Project documentation and guides
- Docker and Docker Compose
- Python 3.8+ (for local development)
- Git
- Clone the repository:
  git clone <repository-url>
  cd datacenter-cooling-sim
- Configure environment variables:
  # Copy .env.example to ai-datacenter-sim directory
  cp .env.example ai-datacenter-sim/.env
  # Edit ai-datacenter-sim/.env and change Grafana admin credentials for production
- Start the AI Datacenter Simulation:
  cd ai-datacenter-sim
  docker-compose up -d
  Note: Docker Compose automatically reads the .env file in the same directory. The Grafana credentials (GRAFANA_ADMIN_USER and GRAFANA_ADMIN_PASSWORD) are loaded from this file.
  This starts:
- Prometheus (metrics collection) on port 9090
- Grafana (visualization) on port 3000
- AlphaDataCenterCooling service on port 5001
- Alpha adapter on port 8085
- SimAI adapter
- Dashboard API on port 8001
- Dashboard frontend on port 5174
- Access the services:
- Grafana Dashboard: http://localhost:3000 (default: admin/admin)
- Prometheus: http://localhost:9090
- Dashboard: http://localhost:5174
- AlphaDataCenterCooling API: http://localhost:5001
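Once the stack is up, a quick reachability check can confirm that each endpoint listed above is answering. The following is a minimal sketch, not part of the repository: the `SERVICES` names and the `check_services` helper are hypothetical, and it treats any HTTP response (including 4xx/5xx) as "up".

```python
from typing import Callable, Dict
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Endpoints copied from the list above; the dictionary keys are illustrative.
SERVICES: Dict[str, str] = {
    "grafana": "http://localhost:3000",
    "prometheus": "http://localhost:9090",
    "dashboard": "http://localhost:5174",
    "alpha_api": "http://localhost:5001",
}


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL produces any HTTP response at all."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def check_services(
    services: Dict[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> Dict[str, bool]:
    """Probe every service and map its name to a reachable/unreachable flag."""
    return {name: probe(url) for name, url in services.items()}


if __name__ == "__main__":
    for name, ok in check_services(SERVICES).items():
        print(f"{name:12s} {'up' if ok else 'DOWN'}")
```

The `probe` argument is injectable so the check can be exercised without a running stack.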
cd AlphaDataCenterCooling
docker-compose up
See AlphaDataCenterCooling/README.md for detailed usage instructions.
- AI Datacenter Sim README: Detailed guide for the simulation engine
- Telemetry Documentation: Telemetry setup and usage
- AlphaDataCenterCooling README: Cooling system testbed documentation
- SimAI integration for large-scale AI training workload simulation
- GPU-aware thermal modeling
- Network topology simulation
- Workload trace analysis
- AlphaDataCenterCooling virtual testbed
- Gymnasium-compatible environment for RL/control algorithms
- REST API for external integration
- Real-time disturbance updates
- Prometheus metrics collection
- Grafana dashboards
- Custom web dashboard
- Real-time telemetry ingestion
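Metrics collected by Prometheus can also be read programmatically through its standard HTTP API (`/api/v1/query`). The sketch below assumes Prometheus is reachable on port 9090 as configured above. It queries the built-in `up` metric because the metric names exported by this project's adapters are not listed here. The helper names are hypothetical.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # port taken from the setup steps


def build_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def parse_instant_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample carries its value as [timestamp, "value-as-string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def query(expr: str, base_url: str = PROMETHEUS_URL) -> List[Tuple[Dict[str, str], float]]:
    """Run an instant query against a live Prometheus server."""
    with urlopen(build_query_url(expr, base_url), timeout=5) as resp:
        return parse_instant_vector(json.load(resp))


if __name__ == "__main__":
    # `up` is 1 for every scrape target Prometheus can currently reach.
    for labels, value in query("up"):
        print(labels.get("job", "?"), labels.get("instance", "?"), value)
```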
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ SimAI │────▶│ SimAI Adapter │────▶│ Pushgateway │
│ Workloads │ │ │ │ │
└─────────────────┘ └──────────────────┘ └────────┬────────┘
│
┌─────────────────┐ ┌──────────────────┐ │
│ AlphaDataCenter │────▶│ Alpha Adapter │──────────────┼─────┐
│ Cooling │ │ │ │ │
└─────────────────┘ └──────────────────┘ │ │
▼ ▼
┌─────────────────┐
│ Prometheus │
│ (Metrics DB) │
└────────┬────────┘
│
┌──────────────────────┼──────────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Grafana │ │ Dashboard API│ │ Dashboard │
│ │ │ │ │ Frontend │
└──────────────┘ └──────────────┘ └──────────────┘
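The diagram shows the SimAI adapter handing metrics to a Pushgateway, which Prometheus then scrapes. As an illustration of that hop only, the sketch below renders gauges in the Prometheus text exposition format and sends them to a Pushgateway with `PUT /metrics/job/<job>`. The Pushgateway address (port 9091 is the tool's default, not something this README states), the metric name, the job name, and the helper functions are all assumptions; the real adapter may work differently.

```python
from typing import Mapping, Optional
from urllib.request import Request, urlopen

PUSHGATEWAY_URL = "http://localhost:9091"  # assumption: Pushgateway default port


def _escape(value: str) -> str:
    """Escape a label value as the text exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(
    gauges: Mapping[str, float],
    labels: Optional[Mapping[str, str]] = None,
) -> str:
    """Render gauges as Prometheus text exposition lines, newline-terminated."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    lines = []
    for name, value in gauges.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{label_str} {float(value)}")
    return "\n".join(lines) + "\n"


def push(body: str, job: str, base_url: str = PUSHGATEWAY_URL) -> int:
    """PUT the rendered metrics; this replaces the whole group for the job."""
    req = Request(
        f"{base_url}/metrics/job/{job}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urlopen(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical metric and job names, for illustration only.
    text = render_gauges({"demo_gpu_power_watts": 412.0}, {"node": "n0"})
    print(text, end="")
    print("pushgateway status:", push(text, job="demo_simai"))
```

Using `PUT` replaces every metric previously pushed under the same job grouping, while `POST` would replace only metrics with matching names.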
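Each component can be built independently. The paths below are relative to the repository root, so return to the root before running each set of commands: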
# Build Alpha adapter
cd ai-datacenter-sim/adapters/alpha_adapter
docker build -t alpha-adapter .
# Build SimAI adapter
cd ai-datacenter-sim/adapters/simai_adapter
docker build -t simai-adapter .
# Build dashboard
cd ai-datacenter-sim/dashboard/frontend
npm install
npm run build
See individual component READMEs for testing instructions.
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Third-party components have their own licenses:
- SimAI: Apache License 2.0
- Astra-Sim (Alibaba Cloud fork): MIT License
- AlphaDataCenterCooling: See AlphaDataCenterCooling/README.md for citation information
If you use this framework in your research, please cite:
@misc{datacenter-cooling-sim2025,
title={Data Center Cooling Simulation Framework},
author={Kardashev Labs},
year={2025},
url={https://github.com/kardashev-lab/datacenter-cooling-sim}
}
This framework builds upon the following open-source projects and research:
SimAI - Large-scale AI training simulation framework:
- Repository: https://github.com/aliyun/SimAI
- Paper: SimAI: Unifying Architecture Design and Performance Tuning for Large-Scale Large Language Model Training with Scalability and Precision (NSDI'25 Spring)
- License: Apache License 2.0
- Based on: Astra-Sim (https://github.com/astra-sim/astra-sim) extended by Alibaba Cloud
AlphaDataCenterCooling - Virtual testbed for data center cooling optimization:
- Repository: https://github.com/wfzheng/AlphaDataCenterCooling
- Citation:
@article{wu2025alphadatacentercooling,
title={AlphaDataCenterCooling: A virtual testbed for evaluating operational strategies in data center cooling plants},
author={Wu, S. and Zheng, W. and Wang, Z. and Chen, G. and Yang, P. and Yue, S. and Li, D. and Wu, Y.},
journal={Applied Energy},
volume={380},
pages={125100},
year={2025}
}
For questions and issues:
- Open an issue on GitHub
- Check the documentation in each component's README
- Review the telemetry documentation for setup help
- SimAI Team (Alibaba Cloud) for the large-scale AI training simulation framework
  - Repository: https://github.com/aliyun/SimAI
  - Paper: NSDI'25 Spring
- Astra-Sim Team (Georgia Tech & Facebook) for the original simulation framework
  - Repository: https://github.com/astra-sim/astra-sim
- AlphaDataCenterCooling Authors for the cooling system testbed
  - Repository: https://github.com/wfzheng/AlphaDataCenterCooling
  - Paper: Applied Energy, 380, 125100 (2025)
- The open-source community for excellent tools and libraries