ML-powered power outage risk prediction for Karnataka, India
PowerProphet predicts power outage risks using XGBoost trained on historical utility records, combined with live weather data and real-time news heuristics.
Two separate XGBoost models, one for BESCOM and one for GESCOM, are trained on 28,536 district-day records spanning October 2019 to May 2026. The train/test split is time-based: the models are trained on pre-2024 data and tested on data from 2024 onward.
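
The time-based split can be pictured with a minimal sketch. The record layout (`district`, `date`, `outage`) and the exact cutoff of 1 January 2024 are assumptions for illustration; the real logic lives in the training script and may differ.

```python
from datetime import date

# Assumed cutoff: "pre-2024" for training, "2024+" for testing.
CUTOFF = date(2024, 1, 1)


def time_based_split(records, cutoff=CUTOFF):
    """Split district-day records chronologically, without shuffling."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


# Hypothetical records, for illustration only.
records = [
    {"district": "A", "date": date(2023, 12, 31), "outage": 1},
    {"district": "A", "date": date(2024, 1, 1), "outage": 0},
    {"district": "B", "date": date(2021, 6, 15), "outage": 0},
]
train, test = time_based_split(records)
```

Splitting by date rather than at random keeps every test row strictly later than the training rows, so the reported metrics reflect forecasting on unseen future days.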
| Metric | BESCOM Model | GESCOM Model |
|---|---|---|
| AUC-ROC | 0.978 | 0.950 |
| Recall | 0.98 | 0.85 |
| F1-Score | 0.98 | 0.78 |
| Test Rows | 1,704 | 5,112 |
| Threshold | 0.44 (F2-tuned for recall) | 0.62 (PR-curve, recall >= 0.85) |
| Enhancements | - | SMOTE, lagged features, RandomizedSearchCV |
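
The two threshold strategies in the table can be sketched in plain Python. This is an illustrative reimplementation under assumed inputs (a list of 0/1 labels and a list of predicted probabilities), not the project's tuning code; the function names are hypothetical.

```python
def fbeta(precision, recall, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_fbeta_threshold(y_true, y_prob, beta=2.0):
    """Return the candidate threshold with the highest F-beta score."""
    best_t, best_score = None, -1.0
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= t)
        fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < t)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = fbeta(precision, recall, beta)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


def highest_threshold_with_recall(y_true, y_prob, min_recall=0.85):
    """Return the largest threshold whose recall still meets min_recall."""
    positives = sum(y_true)
    for t in sorted(set(y_prob), reverse=True):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        if positives and tp / positives >= min_recall:
            return t
    return None


# Toy data, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.55, 0.90]
f2_threshold, f2_score = best_fbeta_threshold(y_true, y_prob)
recall_threshold = highest_threshold_with_recall(y_true, y_prob)
```

Both strategies trade precision for recall, which fits a use case where a missed outage is costlier than a false alarm.
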
PowerProphet aggregates historical outage schedules from BESCOM and GESCOM reports using PDF and Excel parsing scripts. This data is combined with historical weather data from the Open-Meteo API and user-submitted outage reports stored in MongoDB Atlas to form a comprehensive training dataset. A FastAPI backend evaluates live weather and temporal features through the trained XGBoost models to calculate an outage risk probability. Concurrently, a background process fetches and scores Google News RSS feeds to detect real-time infrastructure emergencies, which are displayed on the Next.js frontend map.
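
As a rough illustration of the news-scoring step, the sketch below parses an RSS document and scores each headline by keyword. The keyword list, the weights, and the function names are hypothetical; the actual heuristics in `fetch_news.py` are not described here and may work differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword weights, not taken from the project.
KEYWORD_WEIGHTS = {"power cut": 3, "outage": 3, "transformer": 2, "maintenance": 1}


def score_headline(title):
    """Sum the weights of every keyword found in the headline."""
    text = title.lower()
    return sum(weight for kw, weight in KEYWORD_WEIGHTS.items() if kw in text)


def score_feed(rss_xml, min_score=1):
    """Return scored alerts from an RSS string, highest score first."""
    root = ET.fromstring(rss_xml)
    alerts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        score = score_headline(title)
        if score >= min_score:
            alerts.append({"title": title, "score": score})
    return sorted(alerts, key=lambda a: a["score"], reverse=True)


# Inline sample feed so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>BESCOM announces power cut after transformer fault</title></item>
<item><title>GESCOM schedules maintenance work</title></item>
<item><title>City marathon route announced</title></item>
</channel></rss>"""

alerts = score_feed(SAMPLE_RSS)
```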
- Predicts daily power outage risk probability per district using XGBoost.
- Factors in historical weather (temp_c, rainfall_mm, wind_kmh) and temporal data (day_of_week, month, is_monsoon, is_weekend).
- Incorporates rolling average lag features (e.g., outage_3day_rolling) for time-series context.
- Fetches and scores real-time news alerts for BESCOM and GESCOM regions using Google News RSS.
- Collects crowdsourced outage reports via the frontend to dynamically update lag features.
- Displays risk scores and live news on an interactive MapLibre GL JS map.
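
The temporal and lag features listed above could be derived roughly as follows. This is a sketch under assumptions: the monsoon window (June to September) and the `history` mapping of date to outage flag are illustrative choices, not confirmed project details.

```python
from datetime import date, timedelta

# Assumption: monsoon season is June through September.
MONSOON_MONTHS = {6, 7, 8, 9}


def temporal_features(day):
    """Calendar features for a single day."""
    dow = day.weekday()  # Monday = 0 ... Sunday = 6
    return {
        "day_of_week": dow,
        "month": day.month,
        "is_monsoon": int(day.month in MONSOON_MONTHS),
        "is_weekend": int(dow >= 5),
    }


def outage_3day_rolling(history, day):
    """Mean outage flag over the 3 days before `day`; 0.0 when no data exists."""
    prior_days = (day - timedelta(days=k) for k in (1, 2, 3))
    values = [history[d] for d in prior_days if d in history]
    return sum(values) / len(values) if values else 0.0


day = date(2024, 7, 6)
# Hypothetical outage history for one district (1 = outage reported).
history = {date(2024, 7, 5): 1, date(2024, 7, 4): 0, date(2024, 7, 3): 1}
features = temporal_features(day)
features["outage_3day_rolling"] = outage_3day_rolling(history, day)
```

Only days strictly before the prediction date feed the rolling average, so the feature never leaks the target day's own outcome.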
+------------------------+ +-------------------------+ +------------------------+
| Data Ingestion | | Backend (FastAPI) | | Frontend (Next.js) |
|------------------------| |-------------------------| |------------------------|
| PDF & Excel Reports | ----> | XGBoost Prediction | | MapLibre GL Rendering |
| Open-Meteo API | | News RSS Fetcher | <---- | Outage Reporting Form |
| MongoDB Atlas | ----> | Feature Engineering | ----> | News Alerts Feed |
+------------------------+ +-------------------------+ +------------------------+
| Category | Technologies |
|---|---|
| Frontend | Next.js 14 (App Router), React 18, Tailwind CSS, MapLibre GL JS, TypeScript |
| Backend | Python 3.11, FastAPI, Uvicorn |
| Machine Learning | XGBoost 2.0, scikit-learn, imbalanced-learn, pandas, numpy |
| Database | MongoDB Atlas |
| Data Pipeline | pdfplumber, openpyxl, Open-Meteo API |
| Deploy | Vercel (frontend), Render (backend) |
PowerProphet/
├── api/ # FastAPI application
├── app/ # Next.js frontend application
├── components/ # React components
├── data/ # Processed datasets and caches
├── lib/ # API clients and utilities
├── models/ # Saved XGBoost model binaries
├── public/ # Static assets
├── scripts/ # Data engineering and ML scripts
├── .env # Environment variables
├── package.json # Node dependencies
└── requirements.txt # Python dependencies
git clone https://github.com/gangadharv444/PowerProphet.git
cd PowerProphet
python -m venv .venv
# Windows:
.venv\Scripts\activate
# Mac/Linux:
source .venv/bin/activate
pip install -r requirements.txt
Create a .env file in the root directory:
MONGODB_URI=your_atlas_connection_string
Run the server:
python -m uvicorn api.main:app --reload --port 8000
Note about model file: models/outage_risk_model_v3.pkl is excluded from repo (size). Regenerate by running python scripts/train_model_v3.py (requires training_dataset_daily.csv in data/processed/).
npm install
npm run dev
Open http://localhost:3000 in your browser.
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Check API status |
| POST | /predict | Get outage prediction for a district |
| POST | /predict/batch | Get batch predictions (max 50) |
| GET | /districts | List supported districts |
| POST | /report-outage | Submit a crowdsourced outage report |
| GET | /news-alerts | Fetch live scored news alerts |
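
Because /predict/batch accepts at most 50 predictions per call, a client with a longer list has to split it first. The sketch below shows one way to do that; the request body shape (`{"items": [...]}`) and the item fields are assumptions, so check the API's actual schema before using it.

```python
import json

MAX_BATCH = 50  # limit stated for /predict/batch


def build_batch_payloads(items, max_batch=MAX_BATCH):
    """Split prediction requests into JSON bodies of at most max_batch items."""
    payloads = []
    for start in range(0, len(items), max_batch):
        chunk = items[start:start + max_batch]
        # Hypothetical schema: a top-level "items" list.
        payloads.append(json.dumps({"items": chunk}))
    return payloads


# 120 hypothetical requests produce batches of 50, 50, and 20.
requests_to_send = [{"district": f"district_{i}"} for i in range(120)]
payloads = build_batch_payloads(requests_to_send)
```

Each string in `payloads` can then be sent as the body of one POST request to /predict/batch.
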
The data processing pipeline is handled by scripts in the /scripts directory:
- parse_outage_pdfs.py: Extracts records from DISCOM PDFs.
- clean_outages_xlsx.py: Normalizes and cleans official Excel schedules.
- fetch_weather_openmeteo.py: Pulls historical weather per district.
- fetch_news.py: Fetches and scores Google News RSS for BESCOM/GESCOM.
- build_training_dataset.py: Merges outage and weather data into a final CSV.
- train_model_v3.py: Trains both XGBoost models with lag features.
- repack_model_v3.py: Verifies model consistency after repackaging.
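
The merge performed by build_training_dataset.py can be pictured as a join on district and date. The sketch below is a simplified stand-in, not the script itself: the `outage` label column, the sample rows, and the rule that a day without an outage record counts as a negative example are all assumptions.

```python
import csv
import io


def merge_outage_weather(outages, weather):
    """Label each weather row with 1 if an outage exists for that district-day."""
    outage_keys = {(o["district"], o["date"]) for o in outages}
    merged = []
    for w in weather:
        row = dict(w)
        # Assumption: no outage record means no outage that day.
        row["outage"] = int((w["district"], w["date"]) in outage_keys)
        merged.append(row)
    return merged


# Hypothetical sample rows using the weather columns named in this README.
weather = [
    {"district": "A", "date": "2023-07-01", "temp_c": 24.5, "rainfall_mm": 12.0, "wind_kmh": 18.0},
    {"district": "A", "date": "2023-07-02", "temp_c": 25.1, "rainfall_mm": 0.0, "wind_kmh": 9.0},
]
outages = [{"district": "A", "date": "2023-07-01"}]

merged = merge_outage_weather(outages, weather)

# Write to an in-memory buffer; a real script would write to a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(merged[0].keys()))
writer.writeheader()
writer.writerows(merged)
csv_text = buffer.getvalue()
```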
- BESCOM training data is stronger; the GESCOM model misses more outages (recall 0.85 versus 0.98 for BESCOM).
- Lag features fall back to 0 if no prior crowdsourced data exists.
- The models are trained on planned outage schedules, so unplanned failures (e.g., a sudden transformer burst) are harder to predict.
- Only 8 of Karnataka's 31 districts are currently supported.
MIT