PowerProphet

ML-powered power outage risk prediction for Karnataka, India

PowerProphet predicts power outage risks using XGBoost trained on historical utility records, combined with live weather data and real-time news heuristics.

Live Demo

Model Performance

Two separate XGBoost models, one for BESCOM and one for GESCOM, are trained on 28,536 district-day records spanning October 2019 to May 2026. The train/test split is time-based: each model is trained on pre-2024 data and tested on data from 2024 onward.
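A time-based split of this kind can be sketched in a few lines of pandas. This is only an illustration, not the project's training script: the column names `date`, `district`, and `outage`, the helper name `time_based_split`, and the sample rows are all assumptions.

```python
import pandas as pd


def time_based_split(df: pd.DataFrame, date_col: str = "date",
                     cutoff: str = "2024-01-01"):
    """Return (train, test): rows before the cutoff and rows on or after it."""
    dates = pd.to_datetime(df[date_col])
    cutoff_ts = pd.Timestamp(cutoff)
    train = df[dates < cutoff_ts]
    test = df[dates >= cutoff_ts]
    return train, test


# Tiny made-up frame, only to show the mechanics of the split.
records = pd.DataFrame({
    "date": ["2023-12-30", "2023-12-31", "2024-01-01", "2024-01-02"],
    "district": ["A", "A", "A", "A"],
    "outage": [0, 1, 0, 1],
})
train_df, test_df = time_based_split(records)
```

Splitting on the date rather than at random keeps every test row strictly later than every training row, which avoids leaking future information into training.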

| Metric | BESCOM Model | GESCOM Model |
|---|---|---|
| AUC-ROC | 0.978 | 0.950 |
| Recall | 0.98 | 0.85 |
| F1-Score | 0.98 | 0.78 |
| Test Rows | 1,704 | 5,112 |
| Threshold | 0.44 (F2-tuned for recall) | 0.62 (PR-curve, recall >= 0.85) |
| Enhancements | - | SMOTE, lagged features, RandomizedSearchCV |
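The BESCOM threshold is described as F2-tuned. One common way to do that is to scan candidate thresholds and keep the one with the highest F-beta score at beta = 2, which weights recall more heavily than precision. The sketch below assumes that approach; the function names, candidate grid, and sample scores are illustrative and not taken from the project.

```python
import numpy as np


def fbeta(y_true: np.ndarray, y_pred: np.ndarray, beta: float = 2.0) -> float:
    """F-beta from confusion counts; beta > 1 favours recall over precision."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return 0.0 if denom == 0 else (1 + beta**2) * tp / denom


def best_f2_threshold(y_true, y_prob, candidates=None) -> float:
    """Return the candidate threshold with the highest F2 (first one on ties)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    if candidates is None:
        candidates = np.round(np.arange(0.05, 0.96, 0.01), 2)
    scores = [fbeta(y_true, (y_prob >= t).astype(int)) for t in candidates]
    return float(candidates[int(np.argmax(scores))])


# Made-up labels and predicted probabilities.
labels = [0, 0, 1, 1, 1, 0]
probs = [0.10, 0.40, 0.35, 0.80, 0.90, 0.20]
threshold = best_f2_threshold(labels, probs)
```

In practice the threshold should be chosen on a validation slice of the training period, not on the 2024+ test rows, so the reported metrics stay unbiased.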

How It Works

PowerProphet aggregates historical outage schedules from BESCOM and GESCOM reports using PDF and Excel parsing scripts. This data is combined with historical weather data from the Open-Meteo API and user-submitted outage reports stored in MongoDB Atlas to form a comprehensive training dataset. A FastAPI backend evaluates live weather and temporal features through the trained XGBoost models to calculate an outage risk probability. Concurrently, a background process fetches and scores Google News RSS feeds to detect real-time infrastructure emergencies, which are displayed on the Next.js frontend map.
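The news-scoring step can be pictured as a keyword heuristic over RSS item titles. The sketch below is an assumption about how such scoring might look, not the project's `fetch_news.py`: the keyword weights, function names, and sample feed are hypothetical, and it parses an in-memory string instead of fetching Google News.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword weights; the project's actual heuristics may differ.
KEYWORD_WEIGHTS = {
    "power cut": 3,
    "outage": 3,
    "transformer": 2,
    "load shedding": 2,
    "maintenance": 1,
}


def score_headline(title: str) -> int:
    """Sum the weights of every keyword found in the headline."""
    text = title.lower()
    return sum(weight for kw, weight in KEYWORD_WEIGHTS.items() if kw in text)


def score_rss(xml_text: str, min_score: int = 1) -> list[dict]:
    """Parse RSS XML and return scored items, highest score first."""
    root = ET.fromstring(xml_text)
    alerts = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        score = score_headline(title)
        if score >= min_score:
            alerts.append({"title": title, "score": score})
    return sorted(alerts, key=lambda a: a["score"], reverse=True)


# Made-up feed used only to exercise the functions.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>BESCOM announces power cut for transformer maintenance</title></item>
<item><title>City marathon route announced</title></item>
<item><title>GESCOM reports outage after heavy rain</title></item>
</channel></rss>"""

alerts = score_rss(SAMPLE_RSS)
```

Headlines that match no keyword score 0 and are dropped, so only items that look infrastructure-related would reach the map.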

Features

  • Predicts daily power outage risk probability per district using XGBoost.
  • Factors in historical weather (temp_c, rainfall_mm, wind_kmh) and temporal data (day_of_week, month, is_monsoon, is_weekend).
  • Incorporates rolling average lag features (e.g., outage_3day_rolling) for time-series context.
  • Fetches and scores real-time news alerts for BESCOM and GESCOM regions using Google News RSS.
  • Collects crowdsourced outage reports via the frontend to dynamically update lag features.
  • Displays risk scores and live news on an interactive MapLibre GL JS map.
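The temporal and lag features listed above can be derived with pandas roughly as follows. This is a sketch under stated assumptions: the label column `outage`, the June to September monsoon window, and the helper name `add_features` are not confirmed by the project; only the output feature names come from the list.

```python
import pandas as pd

MONSOON_MONTHS = {6, 7, 8, 9}  # assumption: June-September monsoon window


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add calendar features and a leakage-safe 3-day rolling outage rate."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out.sort_values(["district", "date"]).reset_index(drop=True)

    out["day_of_week"] = out["date"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = out["date"].dt.month
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)
    out["is_monsoon"] = out["month"].isin(MONSOON_MONTHS).astype(int)

    # shift(1) makes each row see only earlier days; rows with no history
    # fall back to 0.
    out["outage_3day_rolling"] = (
        out.groupby("district")["outage"]
        .transform(lambda s: s.shift(1).rolling(3, min_periods=1).mean())
        .fillna(0.0)
    )
    return out


# Made-up rows: Friday 7 June 2024 through Monday 10 June 2024.
sample = pd.DataFrame({
    "district": ["A"] * 4,
    "date": ["2024-06-07", "2024-06-08", "2024-06-09", "2024-06-10"],
    "outage": [1, 0, 1, 0],
})
features = add_features(sample)
```

Shifting before rolling matters: without it, the rolling mean for a day would include that day's own label.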

Architecture

+------------------------+       +-------------------------+       +------------------------+
| Data Ingestion         |       | Backend (FastAPI)       |       | Frontend (Next.js)     |
|------------------------|       |-------------------------|       |------------------------|
| PDF & Excel Reports    | ----> | XGBoost Prediction      |       | MapLibre GL Rendering  |
| Open-Meteo API         |       | News RSS Fetcher        | <---- | Outage Reporting Form  |
| MongoDB Atlas          | ----> | Feature Engineering     | ----> | News Alerts Feed       |
+------------------------+       +-------------------------+       +------------------------+

Tech Stack

| Category | Technologies |
|---|---|
| Frontend | Next.js 14 (App Router), React 18, Tailwind CSS, MapLibre GL JS, TypeScript |
| Backend | Python 3.11, FastAPI, Uvicorn |
| Machine Learning | XGBoost 2.0, scikit-learn, imbalanced-learn, pandas, numpy |
| Database | MongoDB Atlas |
| Data Pipeline | pdfplumber, openpyxl, Open-Meteo API |
| Deploy | Vercel (frontend), Render (backend) |

Project Structure

PowerProphet/
├── api/                   # FastAPI application
├── app/                   # Next.js frontend application
├── components/            # React components
├── data/                  # Processed datasets and caches
├── lib/                   # API clients and utilities
├── models/                # Saved XGBoost model binaries
├── public/                # Static assets
├── scripts/               # Data engineering and ML scripts
├── .env                   # Environment variables
├── package.json           # Node dependencies
└── requirements.txt       # Python dependencies

Local Setup

Backend

git clone https://github.com/gangadharv444/PowerProphet.git
cd PowerProphet

python -m venv .venv
# Windows:
.venv\Scripts\activate
# Mac/Linux:
source .venv/bin/activate

pip install -r requirements.txt

Create a .env file in the root directory:

MONGODB_URI=your_atlas_connection_string

Run the server:

python -m uvicorn api.main:app --reload --port 8000

Note on the model file: models/outage_risk_model_v3.pkl is excluded from the repository because of its size. Regenerate it by running python scripts/train_model_v3.py, which requires training_dataset_daily.csv in data/processed/.

Frontend

npm install
npm run dev

Open http://localhost:3000 in your browser.

API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Check API status |
| POST | /predict | Get outage prediction for a district |
| POST | /predict/batch | Get batch predictions (max 50) |
| GET | /districts | List supported districts |
| POST | /report-outage | Submit a crowdsourced outage report |
| GET | /news-alerts | Fetch live scored news alerts |
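A client for these endpoints might look like the sketch below, which only builds the requests and does not send them. The request body fields (`district`, `date`, `items`) and the district name are assumptions, since the schemas are not documented here; only the paths, the 50-item batch limit, and port 8000 come from this README.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # port used in the Local Setup section
MAX_BATCH = 50                      # documented limit for /predict/batch


def chunk(items: list, size: int = MAX_BATCH) -> list[list]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_post(path: str, payload: dict) -> request.Request:
    """Build (but do not send) a JSON POST request for the given path."""
    return request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload shape; check the FastAPI docs page for the real schema.
single = build_post("/predict", {"district": "Bengaluru Urban", "date": "2024-07-15"})

# 120 made-up items are split into batches that respect the 50-item limit.
items = [{"district": f"district-{i}"} for i in range(120)]
batches = chunk(items)
batch_requests = [build_post("/predict/batch", {"items": b}) for b in batches]

# With the API running locally, a request could be sent with:
#   with request.urlopen(single) as resp:
#       print(json.load(resp))
```

FastAPI serves interactive documentation at /docs by default, which is the quickest way to confirm the actual request and response fields.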

Data Pipeline

The data processing pipeline is handled by scripts in the /scripts directory:

  • parse_outage_pdfs.py: Extracts records from DISCOM PDFs.
  • clean_outages_xlsx.py: Normalizes and cleans official Excel schedules.
  • fetch_weather_openmeteo.py: Pulls historical weather per district.
  • fetch_news.py: Fetches and scores Google News RSS for BESCOM/GESCOM.
  • build_training_dataset.py: Merges outage and weather data into a final CSV.
  • train_model_v3.py: Trains both XGBoost models with lag features.
  • repack_model_v3.py: Verifies model consistency after repackaging.
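The merge performed by build_training_dataset.py can be pictured as a left join of daily weather rows against outage records, keyed on district and date. The sketch below is an assumption about that step, not the script itself: the function name, the `outage` label column, and the sample data are hypothetical, while `temp_c`, `rainfall_mm`, and `wind_kmh` follow the feature names used elsewhere in this README.

```python
import pandas as pd


def build_training_frame(weather: pd.DataFrame, outages: pd.DataFrame) -> pd.DataFrame:
    """Left-join outage flags onto one weather row per district-day."""
    weather = weather.copy()
    outages = outages.copy()
    weather["date"] = pd.to_datetime(weather["date"])
    outages["date"] = pd.to_datetime(outages["date"])

    # Collapse multiple outage records on the same district-day into one flag.
    daily = (
        outages.drop_duplicates(["district", "date"])
        .assign(outage=1)[["district", "date", "outage"]]
    )

    merged = weather.merge(daily, on=["district", "date"], how="left")
    # District-days with no matching outage record are labelled 0.
    merged["outage"] = merged["outage"].fillna(0).astype(int)
    return merged


# Made-up inputs: three weather days, two outage records on the same day.
weather_df = pd.DataFrame({
    "district": ["A", "A", "A"],
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "temp_c": [24.0, 25.5, 23.1],
    "rainfall_mm": [0.0, 12.4, 3.2],
    "wind_kmh": [8.0, 21.0, 11.5],
})
outage_df = pd.DataFrame({
    "district": ["A", "A"],
    "date": ["2024-01-02", "2024-01-02"],
})
training = build_training_frame(weather_df, outage_df)
# training.to_csv("training_dataset_daily.csv", index=False) would write the CSV.
```

Joining from the weather side keeps one row per district-day even when no outage was recorded, which is what gives the model its negative examples.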

Known Limitations

  • BESCOM training data is stronger; the GESCOM model misses more outages (recall 0.85 vs. 0.98 for BESCOM).
  • Lag features fall back to 0 if no prior crowdsourced data exists.
  • The models are trained on planned outage schedules, so unplanned failures (e.g., a sudden transformer burst) are harder to predict.
  • Only 8 of Karnataka's 31 districts are currently supported.

License

MIT
