A practice repo for a portfolio optimization system that combines GPU-accelerated machine learning (cuML), Bayesian regime detection, and stochastic programming for multi-asset allocation under market uncertainty.
- GPU-Accelerated Feature Engineering: 10x faster processing using RAPIDS cuDF/cuML
- Market Regime Detection: Hidden Markov Models + Bayesian changepoint detection
- Mathematical Optimization: Regime-aware mean-variance optimization with transaction costs
- Comprehensive Backtesting: Walk-forward validation with realistic assumptions
- GPU Computing: RAPIDS cuDF, cuML
- ML/AI: Scikit-learn, HMMLearn
- Optimization: Pyomo
- Bayesian: PyMC
Complete data pipeline for downloading and preprocessing financial market data for portfolio optimization.
Basic usage:
cd /src/data
python download_data.py
This downloads:
- ~50 assets across equities, bonds, commodities, currencies
- ~20 macro indicators (GDP, inflation, unemployment, etc.)
- Sentiment data (VIX, market stress indicators)
Unit tests are provided in tests/data/test_data_download.py.
pytest tests/data/test_data_download.py -v
GPU-accelerated technical indicator computation and feature engineering pipeline for portfolio optimization.
Prerequisites: Make sure you've completed the data ingestion module first:
# You should have these files from data ingestion
data/raw/asset_prices.csv
data/raw/macro_data.csv (optional)
data/raw/sentiment_data.csv (optional)
Basic usage:
cd /src/feature_engineering
python technical_indicators.py
Features computed:
- Daily returns (1-day)
- Multi-period returns (5d, 21d, 63d)
- Log returns (better for statistical modeling)
- SMA (Simple Moving Average): 20, 50, 200-day windows
- EMA (Exponential Moving Average): 12, 26, 50-day spans
- RSI (Relative Strength Index): 14-day window
- Range: 0-100
- >70 = Overbought, <30 = Oversold
- MACD (Moving Average Convergence Divergence):
- MACD Line (EMA12 - EMA26)
- Signal Line (EMA9 of MACD)
- Histogram (MACD - Signal)
- Momentum: 10, 20, 50-day price momentum
- Rolling Volatility: 20, 60, 252-day windows (annualized)
- Bollinger Bands:
- Upper/Lower bands (±2σ)
- Bandwidth
- %B (position within bands)
- Average True Range (ATR): only if columns High, Low, and Adj Close are present
- Rolling correlation with benchmark (SPY)
- 60-day window
- Lagged returns: 1, 5, 21-day lags
- Captures momentum and mean reversion
- Rank: Percentile rank within universe
- Z-score: Standardized returns
- Deviation from mean: Relative performance
- Pair-wise interactions/products (e.g., sma_20 and ema_12)
- Interest rates, inflation, employment
- GDP growth, consumer sentiment
- Credit spreads, money supply
- VIX, SKEW, volatility indices
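Several of the price-based indicators listed above can be sketched in plain Python. This is a dependency-free illustration under stated assumptions, not the repository's cuDF implementation: the function names are hypothetical, the EMA is seeded with the first value, and the RSI uses plain averages of gains and losses rather than Wilder's smoothing.

```python
from statistics import mean, pstdev


def sma(values, window):
    """Simple moving average; None until a full window is available."""
    return [
        None if i + 1 < window else mean(values[i + 1 - window : i + 1])
        for i in range(len(values))
    ]


def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # assumption: seed with the first observation
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def rsi(prices, window=14):
    """RSI of the last `window` price changes (simple-average variant)."""
    if len(prices) < window + 1:
        raise ValueError("need at least window + 1 prices")
    changes = [b - a for a, b in zip(prices, prices[1:])][-window:]
    avg_gain = sum(max(c, 0.0) for c in changes) / window
    avg_loss = sum(max(-c, 0.0) for c in changes) / window
    if avg_loss == 0:
        return 100.0  # no losses in the window
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def macd(prices, fast=12, slow=26, signal=9):
    """Return (MACD line, signal line, histogram) as lists."""
    line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    signal_line = ema(line, signal)
    histogram = [m - s for m, s in zip(line, signal_line)]
    return line, signal_line, histogram


def bollinger_percent_b(prices, window=20, num_std=2.0):
    """%B of the latest price: 0 at the lower band, 1 at the upper band.

    Assumes the window is not constant (band width must be non-zero).
    """
    recent = prices[-window:]
    mid, sd = mean(recent), pstdev(recent)
    upper, lower = mid + num_std * sd, mid - num_std * sd
    return (prices[-1] - lower) / (upper - lower)
```

Returning None for incomplete windows mirrors the NaN padding that rolling operations in dataframe libraries typically produce.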
Unit tests are provided in tests/feature_engineering/test_feature_engineering.py.
Advanced market regime detection using multiple methodologies: volatility analysis, clustering, Hidden Markov Models (HMM), and Bayesian changepoint detection.
Regime detectors:
Simple but effective method based on rolling volatility quantiles.
How it works:
- Calculates rolling volatility across all assets
- Divides into regimes based on quantile thresholds
- Identifies: Low Vol, Normal, High Vol, Crisis
Vol_t = σ(r_{t-w:t})
Regime_t = Quantile_bin(Vol_t)
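The two formulas above can be sketched as follows. This is a minimal illustration, assuming a single return series (for example an equal-weighted market return) and quartile thresholds; the repository's actual window, aggregation across assets, and cut points may differ, and the function names are hypothetical.

```python
from statistics import pstdev, quantiles

REGIME_NAMES = ["Low Vol", "Normal", "High Vol", "Crisis"]


def rolling_volatility(returns, window):
    """Vol_t: standard deviation of the `window` returns ending at t."""
    return [
        pstdev(returns[t + 1 - window : t + 1])
        for t in range(window - 1, len(returns))
    ]


def volatility_regimes(returns, window=20):
    """Label each rolling-volatility value by the quartile bin it falls in."""
    vols = rolling_volatility(returns, window)
    cuts = quantiles(vols, n=4)  # three cut points give four bins
    labels = []
    for v in vols:
        bin_index = sum(v > c for c in cuts)  # number of thresholds exceeded
        labels.append(REGIME_NAMES[bin_index])
    return labels
```

Because the thresholds are computed from the full sample, this version looks ahead; a walk-forward backtest would need thresholds estimated only from data available at each date.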
Uses K-Means clustering on market features to identify regimes.
Features used:
- Market returns (equal-weighted)
- Rolling volatility
- Average correlation
- Return dispersion (cross-sectional std)
How it works:
- Compute rolling market features
- Standardize features
- Optional: PCA for dimensionality reduction
- K-Means clustering (GPU-accelerated with cuML)
- Assign regime names based on characteristics
min Σ ||x_i - μ_{c(i)}||^2
subject to: c(i) ∈ {1,...,K}
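The objective above is what Lloyd's algorithm minimizes by alternating an assignment step and a centroid update. The sketch below is a plain-Python illustration of that loop, not the cuML call the repository uses; `standardize`, `assign`, and `kmeans` are hypothetical names, and initialization simply samples k data points.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each feature column (constant columns are left at zero)."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """c(i): index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, n_iter=100, seed=0):
    """Return (labels, centroids, inertia) after Lloyd iterations."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(mean(col) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    labels = assign(points, centroids)
    # Inertia is the objective value: sum of squared distances to centroids.
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia
```

Cluster indices are arbitrary, which is why the pipeline needs a separate step that names regimes from their characteristics (for instance, the cluster with the highest mean volatility).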
Hidden Markov Model (HMM)
Statistical model that assumes market states are "hidden" and inferred from observed data.
How it works:
- Assumes market evolves through hidden states
- Observes returns and volatility
- Uses Expectation-Maximization (EM) to learn:
- Transition probabilities between states
- Emission distributions (what each state looks like)
- Viterbi algorithm finds most likely state sequence
P(s_t | s_{t-1}) = Transition matrix
P(x_t | s_t) = Emission distribution
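The decoding step can be illustrated with a small log-space Viterbi routine for Gaussian emissions. This is a sketch under assumptions: the transition matrix, means, and standard deviations are passed in as given, whereas in the pipeline they would be learned by EM (for example with HMMLearn), and the function names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def viterbi(observations, log_start, log_trans, means, stds):
    """Most likely hidden-state path for a Gaussian-emission HMM."""
    n_states = len(log_start)
    # score[s]: best log-probability of any path that ends in state s
    score = [
        log_start[s] + gaussian_logpdf(observations[0], means[s], stds[s])
        for s in range(n_states)
    ]
    backpointers = []
    for x in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(
                score[best_prev]
                + log_trans[best_prev][s]
                + gaussian_logpdf(x, means[s], stds[s])
            )
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Illustrative two-state setup: state 0 is calm, state 1 is turbulent.
log_start = [math.log(0.5), math.log(0.5)]
log_trans = [
    [math.log(0.95), math.log(0.05)],
    [math.log(0.10), math.log(0.90)],
]
daily_returns = [0.001, -0.002, 0.003, 0.05, -0.06, 0.04]
path = viterbi(daily_returns, log_start, log_trans, [0.0005, -0.001], [0.005, 0.03])
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause on long return series.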
Uses Bayesian inference to detect structural breaks and regime changes.
How it works:
- Places priors on changepoint locations
- Places priors on regime means and variances
- Uses MCMC (Markov Chain Monte Carlo) sampling
- Posterior distribution gives uncertainty estimates
τ ~ Uniform(T_min, T_max)
μ_k ~ Normal(0, σ_μ)
σ_k ~ HalfNormal(σ_σ)
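For intuition, the single-changepoint case can be approximated without MCMC by scoring every candidate τ directly. The sketch below is a deliberate simplification and not the PyMC model: it keeps the uniform prior on τ but plugs in each segment's sample mean and standard deviation instead of sampling μ_k and σ_k from their priors, so the resulting distribution is narrower than a full posterior. All names are hypothetical.

```python
import math
from statistics import mean, pstdev


def segment_loglik(xs):
    """Gaussian log-likelihood of a segment at its own sample mean and std."""
    mu, sd = mean(xs), max(pstdev(xs), 1e-8)
    return sum(
        -0.5 * math.log(2 * math.pi * sd**2) - (x - mu) ** 2 / (2 * sd**2)
        for x in xs
    )


def changepoint_posterior(xs, min_size=5):
    """Approximate P(tau | data) for one changepoint, uniform prior on tau.

    tau is the index of the first observation in the second regime.
    """
    taus = range(min_size, len(xs) - min_size + 1)
    logp = [segment_loglik(xs[:t]) + segment_loglik(xs[t:]) for t in taus]
    peak = max(logp)
    weights = [math.exp(lp - peak) for lp in logp]  # subtract peak for stability
    total = sum(weights)
    return {t: w / total for t, w in zip(taus, weights)}
```

The spread of the returned probabilities plays the role of the uncertainty estimate mentioned above: a sharp peak indicates a well-identified break, while a flat profile suggests no clear regime change.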
Unit tests are provided in tests/regime_detection/test_regime_detection.py.
Advanced portfolio optimization using Pyomo for mathematical modeling with GPU-accelerated covariance computation. Implements multiple optimization strategies including regime-aware allocation.
Prerequisites: Install the IPOPT solver first (instructions for macOS):
brew install ipopt
Optimization methods:
Classic portfolio optimization balancing return and risk.
Objective:
Maximize: E[R] - λ * Var[R]
Finds the portfolio with minimum risk, regardless of return.
Objective:
Minimize: Var[R]
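The mean-variance and minimum-variance objectives above can be made concrete with a small example. This is a sketch that assumes fully invested weights with short positions allowed; the Pyomo models in the repository may add bounds or other constraints. The two-asset closed form follows from minimizing w'Σw subject to w_a + w_b = 1, and the function names are hypothetical.

```python
def portfolio_variance(weights, cov):
    """w'Σw for a covariance matrix given as nested lists."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))


def mean_variance_utility(weights, mu, cov, risk_aversion):
    """E[R] - λ * Var[R] for the given weights."""
    expected_return = sum(w * m for w, m in zip(weights, mu))
    return expected_return - risk_aversion * portfolio_variance(weights, cov)


def min_variance_two_assets(var_a, var_b, cov_ab):
    """Weights (w_a, w_b) that minimize variance subject to w_a + w_b = 1."""
    denom = var_a + var_b - 2.0 * cov_ab  # variance of the spread a - b
    if denom <= 0:
        raise ValueError("spread variance is zero; the minimum is not unique")
    w_a = (var_b - cov_ab) / denom
    return w_a, 1.0 - w_a
```

For two uncorrelated assets with equal variance the result is an even split, and as one asset becomes riskier its weight shrinks in proportion to the other's variance.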
Finds the portfolio with the best risk-adjusted return.
Objective:
Maximize: (E[R] - rf) / σ[R]
Implementation Note: Solved via quadratic reformulation:
Minimize: w'Σw subject to (μ - rf)'w = 1
Then normalize: w_final = w / sum(w)
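Without inequality constraints, the reformulated problem has the closed-form direction w ∝ Σ⁻¹(μ − rf), which the normalization step then rescales to sum to one. The sketch below illustrates that case only, using a small Gaussian-elimination solver in place of IPOPT; function names are hypothetical, and the normalization assumes the unnormalized weights have a positive sum.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def max_sharpe_weights(mu, cov, rf=0.0):
    """Unconstrained tangency portfolio: solve Σ y = μ - rf, then normalize."""
    excess = [m - rf for m in mu]
    y = solve_linear(cov, excess)
    total = sum(y)
    if total <= 0:
        raise ValueError("weights do not have a positive sum; cannot normalize")
    return [v / total for v in y]


def sharpe_ratio(weights, mu, cov, rf=0.0):
    """(E[R] - rf) / σ[R] for fully invested weights."""
    n = len(weights)
    excess = sum(w * m for w, m in zip(weights, mu)) - rf
    var = sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))
    return excess / math.sqrt(var)
```

The reformulation works because the Sharpe ratio is unchanged when all weights are scaled by a positive constant, so fixing the excess return to 1 and minimizing variance picks out the same direction.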
Equalizes risk contribution from each asset.
Concept:
Risk Contribution_i = w_i * (Σw)_i / σ_p
Goal: RC_1 = RC_2 = ... = RC_n
Implementation: Simplified inverse volatility weighting
w_i ∝ 1/σ_i
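The relationship between the risk-contribution formula and the inverse-volatility shortcut can be checked numerically. The following is a minimal sketch with hypothetical function names; covariance matrices are plain nested lists.

```python
import math


def risk_contributions(weights, cov):
    """RC_i = w_i * (Σw)_i / σ_p; the contributions sum to σ_p."""
    n = len(weights)
    sigma_w = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    vol = math.sqrt(sum(w * s for w, s in zip(weights, sigma_w)))
    return [w * s / vol for w, s in zip(weights, sigma_w)]


def inverse_vol_weights(cov):
    """w_i ∝ 1/σ_i, normalized to sum to one."""
    inv = [1.0 / math.sqrt(cov[i][i]) for i in range(len(cov))]
    total = sum(inv)
    return [v / total for v in inv]
```

Inverse-volatility weights equalize risk contributions exactly only when all pairwise correlations are equal (including the uncorrelated case). With heterogeneous correlations the contributions drift apart, which is the simplification the implementation note refers to.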
Optimizes considering multiple market regimes and their probabilities.
Objective:
Maximize: Σ_r P(regime=r) * [E[R|r] - λ*Var[R|r]] - TC*|Δw|
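To make the terms of this objective concrete, the sketch below evaluates it for a given weight vector. It assumes |Δw| means the sum of absolute weight changes from the previous allocation and that TC is a proportional cost rate; both are interpretations, and the function name and argument layout are hypothetical.

```python
def regime_aware_objective(weights, prev_weights, regimes, risk_aversion, tc_rate):
    """Probability-weighted mean-variance utility minus turnover cost.

    `regimes` is a list of (probability, mu, cov) tuples, one per regime.
    """
    n = len(weights)
    expected_utility = 0.0
    for prob, mu, cov in regimes:
        ret = sum(w * m for w, m in zip(weights, mu))
        var = sum(
            weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
        )
        expected_utility += prob * (ret - risk_aversion * var)
    # Assumption: |Δw| is the L1 distance from the previous weights.
    turnover = sum(abs(w - p) for w, p in zip(weights, prev_weights))
    return expected_utility - tc_rate * turnover


# Illustrative inputs: a bull regime (70%) and a bear regime (30%).
regimes = [
    (0.7, [0.10, 0.03], [[0.04, 0.0], [0.0, 0.01]]),
    (0.3, [-0.05, 0.02], [[0.09, 0.0], [0.0, 0.01]]),
]
value = regime_aware_objective([0.5, 0.5], [0.5, 0.5], regimes, 2.0, 0.001)
```

The absolute value makes the cost term non-smooth, so a solver-based model would typically introduce auxiliary buy and sell variables to keep the problem differentiable.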
Unit tests are provided in tests/optimization/test_portfolio_optimization.py.
- Multi-Regime Forecasting: Separate ML models (with cuML support) per market state
- Stochastic Optimization: Use ML forecasting to generate scenarios (from bearish to bullish) for optimization
- Visualization: Dashboards and graphical user interfaces (Plotly, Streamlit)
- Clone the repository
git clone git@github.com:bacalfa/gpu-port-opt.git
cd gpu-port-opt
- Create a virtual or conda environment (example using uv)
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies (example using uv)
uv sync
- Create a file named .env in the top-level folder containing the following text (replace the API key with yours)
# FRED API Key from https://fred.stlouisfed.org/docs/api/api_key.html
FRED_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
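For reference, a script could pick up the key from this file roughly as follows. This is a standard-library sketch with hypothetical function names; the repository may well use a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_fred_api_key(path=".env"):
    """Prefer an exported variable, then fall back to the .env file."""
    key = os.environ.get("FRED_API_KEY") or load_env_file(path).get("FRED_API_KEY")
    if not key:
        raise RuntimeError("FRED_API_KEY is not set; add it to .env")
    return key
```

Keeping the key in .env (and out of version control) avoids hard-coding credentials in the download scripts.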