ecanbaykurt/HybridWall

ML A/B/C/D/E Experiment Framework

A comprehensive machine learning experiment framework for forecasting with five different models (LSTM, Transformer, Prophet, ARIMA, XGBoost) including compliance checks, uncertainty estimation, and cost-optimized deployment on Google Cloud Platform.

🎯 Overview

This framework implements a reproducible A/B/C/D/E experiment where:

  • Group A: LSTM (1.00% of dataset)
  • Group B: Lightweight Transformer (1.25% of dataset)
  • Group C: Prophet (1.25% of dataset)
  • Group D: ARIMA (1.25% of dataset)
  • Group E: XGBoost (1.25% of dataset)
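One way to make a split like this reproducible is to hash each row key together with the seed and map the hash to a point in the unit interval. The sketch below assumes that approach; the repository's bigquery_sampling.sql may sample differently, and `assign_group` and the row-key format are hypothetical names used only for illustration.

```python
import hashlib

# Sampling fractions taken from the group list above:
# A = 1.00 %, B through E = 1.25 % each (6.00 % of rows in total).
GROUP_FRACTIONS = [("A", 0.0100), ("B", 0.0125), ("C", 0.0125),
                   ("D", 0.0125), ("E", 0.0125)]


def assign_group(row_key: str, seed: int = 42):
    """Return the group for a row, or None if the row is not sampled."""
    digest = hashlib.sha256(f"{seed}:{row_key}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in the unit interval.
    point = int.from_bytes(digest[:8], "big") / 2**64
    upper = 0.0
    for group, fraction in GROUP_FRACTIONS:
        upper += fraction  # cumulative upper bound of this group's slice
        if point < upper:
            return group
    return None
```

Because the assignment depends only on the key and the seed, rerunning with the same seed yields the same groups, and the groups never overlap.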

Each model includes uncertainty estimation, explainability features, and compliance/governance checks for financial applications.

📁 Project Structure

├── model_lstm.py              # LSTM model implementation
├── model_transformer.py       # Transformer model implementation
├── model_prophet.py           # Prophet model implementation
├── model_arima.py             # ARIMA model implementation
├── model_xgboost.py           # XGBoost model implementation
├── run_abtest.py              # Experiment orchestrator
├── app.py                     # FastAPI service with compliance checks
├── bigquery_sampling.sql      # SQL for creating sample groups
├── requirements.txt           # Python dependencies
├── Dockerfile                 # Multi-stage container build
├── gce_startup.sh             # GCE startup script for preemptible instances
├── tests/
│   └── test_models.py         # Unit tests for all models
└── README.md                  # This file

🚀 Quick Start

1. Environment Setup

# Clone or download the project
git clone <repository-url> ml-abtest
cd ml-abtest

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Configuration

Set the following environment variables:

export PROJECT_ID="your-gcp-project-id"
export BQ_DATASET="your_dataset"
export FEATURE_TABLE="your_table"
export GCS_BUCKET="your-bucket"
export ABTEST_SEED="42"
export SAMPLE_LIMIT_PER_GROUP="2000"
export AUDIT_SIGNING_KEY_SECRET="audit-signing-key"
export BUDGET_DOLLARS="10"
export PREEMPTIBLE="true"
export FULL_TRAIN="false"
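As a rough illustration of how a script might consume these variables, the sketch below reads them into a typed config object. It is not taken from run_abtest.py: `ExperimentConfig` and `load_config` are hypothetical names, and treating the four GCP identifiers as required while defaulting the rest to the values shown above is an assumption.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings such as "true" or "1"."""
    return value.strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class ExperimentConfig:
    project_id: str
    bq_dataset: str
    feature_table: str
    gcs_bucket: str
    seed: int
    sample_limit_per_group: int
    budget_dollars: float
    preemptible: bool
    full_train: bool


def load_config(env=None) -> ExperimentConfig:
    """Build a config from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    return ExperimentConfig(
        project_id=env["PROJECT_ID"],        # required: KeyError if unset
        bq_dataset=env["BQ_DATASET"],
        feature_table=env["FEATURE_TABLE"],
        gcs_bucket=env["GCS_BUCKET"],
        seed=int(env.get("ABTEST_SEED", "42")),
        sample_limit_per_group=int(env.get("SAMPLE_LIMIT_PER_GROUP", "2000")),
        budget_dollars=float(env.get("BUDGET_DOLLARS", "10")),
        preemptible=_as_bool(env.get("PREEMPTIBLE", "true")),
        full_train=_as_bool(env.get("FULL_TRAIN", "false")),
    )
```

Parsing everything once at startup means a malformed value such as a non-numeric seed fails immediately rather than partway through a paid training run.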

3. BigQuery Setup

Run the sampling SQL to create experiment groups:

# Replace variables in the SQL file
sed -i "s/{PROJECT_ID}/$PROJECT_ID/g" bigquery_sampling.sql
sed -i "s/{BQ_DATASET}/$BQ_DATASET/g" bigquery_sampling.sql
sed -i "s/{FEATURE_TABLE}/$FEATURE_TABLE/g" bigquery_sampling.sql

# Execute the SQL
bq query --use_legacy_sql=false < bigquery_sampling.sql
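Note that `sed -i` as written follows GNU sed syntax (BSD sed on macOS expects `-i ''`), and that editing in place overwrites the placeholders, so the template cannot be reused with different values. A portable alternative is to render the template into a separate file. The following is a minimal sketch under that assumption; `render_sql` and `render_file` are hypothetical helpers, not part of the repository.

```python
def render_sql(template: str, values: dict) -> str:
    """Replace each {NAME} placeholder with its value; other text is untouched."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace("{" + name + "}", value)
    return rendered


def render_file(src_path: str, dst_path: str, values: dict) -> None:
    """Render src_path into dst_path, leaving the template file unchanged."""
    with open(src_path, encoding="utf-8") as src:
        template = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(render_sql(template, values))
```

The rendered file can then be passed to `bq query --use_legacy_sql=false` in the same way as above.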

4. Run Local Test

# Run unit tests
python -m pytest tests/ -v

# Run individual model test
python model_lstm.py

# Run full experiment locally (with synthetic data)
python run_abtest.py

💰 Cost-Optimized Deployment

Budget Constraints (~$10)

The framework is designed for tight budgets with several cost-saving features:

  1. Preemptible Instances: Use preemptible GCE instances for training
  2. Sample Limiting: Default 2K samples per group (configurable)
  3. Checkpointing: Frequent saves to GCS to handle preemption
  4. Resource Limits: CPU-only training, minimal memory usage
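The checkpointing idea in item 3 can be sketched as a resume-aware training loop. This is a simplified local illustration, not the repository's implementation: the function names are hypothetical, the state is a plain JSON dict, and the upload to GCS is left out.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically so a preemption mid-write cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # atomic rename within one filesystem


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the saved state, or a copy of default if no checkpoint exists."""
    if not os.path.exists(path):
        return dict(default)
    with open(path) as handle:
        return json.load(handle)


def train(path: str, total_epochs: int, save_every: int = 1) -> dict:
    """Resume from the last saved epoch and checkpoint as training proceeds."""
    state = load_checkpoint(path, {"epoch": 0})
    for epoch in range(state["epoch"], total_epochs):
        # ... one epoch of training would run here ...
        state["epoch"] = epoch + 1
        if state["epoch"] % save_every == 0:
            save_checkpoint(path, state)
    return state
```

If the instance is preempted after epoch 7 of 10, the next call to `train` starts at epoch 7 instead of 0, so at most `save_every` epochs of paid compute are lost.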

GCE Preemptible Instance

# Create preemptible instance with startup script
gcloud compute instances create ml-abtest-instance \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --preemptible \
    --image-family=ubuntu-2004-lts \
    --image-project=ubuntu-os-cloud \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --metadata-from-file startup-script=gce_startup.sh \
    --metadata PROJECT_ID=$PROJECT_ID,BQ_DATASET=$BQ_DATASET,GCS_BUCKET=$GCS_BUCKET

Cloud Run API Service

# Build and deploy API service
docker build --target api-minimal -t gcr.io/$PROJECT_ID/ml-abtest-api .
docker push gcr.io/$PROJECT_ID/ml-abtest-api

gcloud run deploy ml-abtest-api \
    --image gcr.io/$PROJECT_ID/ml-abtest-api \
    --platform managed \
    --region us-central1 \
    --memory 2Gi \
    --cpu 2 \
    --max-instances 10 \
    --allow-unauthenticated

🔧 Usage

Running the Experiment

# Local execution with synthetic data
python run_abtest.py

# With BigQuery data (requires authentication)
export USE_BIGQUERY=true
python run_abtest.py

# Custom configuration
export SAMPLE_LIMIT_PER_GROUP=1000
export ABTEST_SEED=123
python run_abtest.py

API Usage

# Start API service
python app.py

# Test scoring endpoint
curl -X POST "http://localhost:8080/score" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer your-token" \
     -d '{
       "to_address": "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa",
       "amount_btc": 0.001,
       "metadata": {
         "sender_country": "US",
         "receiver_country": "CA",
         "sender_vasp_id": "VASP123",
         "receiver_vasp_id": "VASP456"
       }
     }'
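The same request can be built from Python with only the standard library. This is a sketch mirroring the curl call above; `build_score_request` is a hypothetical helper, and since the response format is not documented here, the usage example simply prints whatever JSON comes back.

```python
import json
import urllib.request


def build_score_request(base_url: str, token: str, to_address: str,
                        amount_btc: float, metadata: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /score endpoint."""
    payload = {"to_address": to_address, "amount_btc": amount_btc,
               "metadata": metadata}
    return urllib.request.Request(
        url=f"{base_url}/score",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


# Usage (requires the API service to be running locally):
# req = build_score_request("http://localhost:8080", "your-token",
#                           "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa", 0.001,
#                           {"sender_country": "US", "receiver_country": "CA",
#                            "sender_vasp_id": "VASP123",
#                            "receiver_vasp_id": "VASP456"})
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.load(resp))
```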

Health Check

curl http://localhost:8080/health

📊 Model Details

LSTM (Group A)

  • Architecture: 2-layer LSTM with dropout
  • Features: Time series sequences with trend/seasonality
  • Uncertainty: Monte Carlo Dropout
  • Use Case: Complex temporal patterns

Transformer (Group B)

  • Architecture: Lightweight transformer with positional encoding
  • Features: Multi-head attention with global pooling
  • Uncertainty: Monte Carlo Dropout
  • Use Case: Long-range dependencies
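Groups A and B both list Monte Carlo Dropout for uncertainty. The idea is to leave dropout switched on at inference time, run several stochastic forward passes, and report the mean as the prediction and the spread as the uncertainty. The sketch below shows this on a toy linear model in pure Python; it is an illustration of the technique only, not the LSTM or Transformer code in this repository, and all names are hypothetical.

```python
import random
import statistics


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability rate, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def predict_once(features, weights, bias, rate, rng):
    """One stochastic forward pass of a toy linear model with dropout left on."""
    dropped = dropout(features, rate, rng)
    return sum(w * x for w, x in zip(weights, dropped)) + bias


def mc_dropout_predict(features, weights, bias, rate=0.2, passes=200, seed=42):
    """Return (mean, standard deviation) over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [predict_once(features, weights, bias, rate, rng)
               for _ in range(passes)]
    return statistics.mean(samples), statistics.stdev(samples)
```

With `rate=0` every pass is identical and the reported standard deviation is zero; a larger spread indicates that the prediction is more sensitive to which units are dropped. In PyTorch the equivalent step is usually to keep the dropout layers in training mode while predicting.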

Prophet (Group C)

  • Architecture: Facebook Prophet with seasonal decomposition
  • Features: Trend, seasonality, holidays
  • Uncertainty: Built-in confidence intervals
  • Use Case: Business time series with seasonality

ARIMA (Group D)

  • Architecture: Auto-regressive integrated moving average
  • Features: Automatic order selection
  • Uncertainty: Prediction intervals
  • Use Case: Time series that are stationary, or become so after differencing

XGBoost (Group E)

  • Architecture: Gradient boosting with feature engineering
  • Features: Lag features, rolling statistics, derived features
  • Uncertainty: Feature-based variance estimation
  • Use Case: Tabular data with complex interactions
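To make the lag and rolling-statistic features concrete, the sketch below turns a one-dimensional series into feature rows for a tabular model. The function name, the chosen lags and the window size are assumptions for illustration; model_xgboost.py may build its features differently.

```python
def make_features(series, lags=(1, 2), window=3):
    """Turn a 1-D series into (features, target) rows using only past values."""
    start = max(max(lags), window)  # first index with enough history
    rows = []
    for t in range(start, len(series)):
        past = series[t - window:t]  # strictly before t, so no target leakage
        mean = sum(past) / window
        row = {f"lag_{k}": series[t - k] for k in lags}
        row["roll_mean"] = mean
        # Population standard deviation of the window.
        row["roll_std"] = (sum((v - mean) ** 2 for v in past) / window) ** 0.5
        rows.append((row, series[t]))
    return rows
```

For the series `[1, 2, 3, 4, 5]` the first row has `lag_1 = 3`, `lag_2 = 2`, `roll_mean = 2.0` and target `4`. Each row only looks backwards, which keeps the evaluation honest for forecasting.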

🛡️ Compliance & Governance

Sanctions Screening

  • Local SQLite database with OFAC/SDN data
  • Exact and fuzzy matching
  • Configurable risk weights
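Exact and fuzzy matching can be sketched with the standard library alone. The example below works on an in-memory list of names rather than the SQLite database, uses `difflib.SequenceMatcher` as the similarity measure, and picks 0.85 as the fuzzy threshold; all three choices are assumptions, and the list entries are made up.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variations match exactly."""
    return " ".join(name.lower().split())


def screen(candidate: str, sanctioned: list, threshold: float = 0.85) -> dict:
    """Classify the best match as 'exact', 'fuzzy' or 'none'."""
    target = normalize(candidate)
    best_entry, best_score = None, 0.0
    for entry in sanctioned:
        known = normalize(entry)
        if known == target:
            return {"match": "exact", "entry": entry, "score": 1.0}
        score = SequenceMatcher(None, target, known).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    if best_score >= threshold:
        return {"match": "fuzzy", "entry": best_entry, "score": best_score}
    return {"match": "none", "entry": None, "score": best_score}
```

A misspelling such as "Acme Tradng Ltd" against the entry "Acme Trading Ltd" comes back as a fuzzy match, which a policy engine can weight lower than an exact hit.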

Travel Rule Compliance

  • Cross-border transaction detection
  • VASP metadata validation
  • Missing field identification
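A minimal sketch of these three checks, using the metadata fields from the scoring request shown in the API Usage section. Which fields app.py actually requires is not stated here, so `REQUIRED_FIELDS` and `check_travel_rule` are assumptions.

```python
# Assumed required fields, taken from the example /score request.
REQUIRED_FIELDS = ("sender_country", "receiver_country",
                   "sender_vasp_id", "receiver_vasp_id")


def check_travel_rule(metadata: dict) -> dict:
    """Report missing fields and whether the transfer crosses a border."""
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    sender = metadata.get("sender_country")
    receiver = metadata.get("receiver_country")
    if sender and receiver:
        cross_border = sender.upper() != receiver.upper()
    else:
        cross_border = None  # cannot be determined without both countries
    return {"cross_border": cross_border, "missing_fields": missing}
```

Returning `None` for an undeterminable border status keeps "unknown" distinct from "domestic", so a policy can treat incomplete metadata as its own risk signal.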

Policy Engine

  • Ensemble risk scoring: risk = w1*|z| + w2*sanctions*100 + w3*travel_rule*10 + w4*ml_risk
  • Configurable decision thresholds
  • Audit logging with HMAC signatures

Environment Variables for Policy

export RISK_WEIGHT_1="0.4"  # Z-score weight
export RISK_WEIGHT_2="0.3"  # Sanctions weight  
export RISK_WEIGHT_3="0.2"  # Travel rule weight
export RISK_WEIGHT_4="0.1"  # ML risk weight
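Putting the formula and the weights together, the ensemble score can be sketched as follows. The sketch assumes that `sanctions` and `travel_rule` are 0/1 indicators, and the decision labels and thresholds in `decide` are placeholders rather than the values used by app.py.

```python
import os


def load_weights(env=None):
    """Read RISK_WEIGHT_1..4, falling back to the defaults listed above."""
    env = os.environ if env is None else env
    defaults = ("0.4", "0.3", "0.2", "0.1")
    return tuple(float(env.get(f"RISK_WEIGHT_{i}", default))
                 for i, default in enumerate(defaults, start=1))


def risk_score(z_score, sanctions_hit, travel_rule_flag, ml_risk, weights):
    """risk = w1*|z| + w2*sanctions*100 + w3*travel_rule*10 + w4*ml_risk"""
    w1, w2, w3, w4 = weights
    return (w1 * abs(z_score)
            + w2 * (100 if sanctions_hit else 0)
            + w3 * (10 if travel_rule_flag else 0)
            + w4 * ml_risk)


def decide(score, review_at=5.0, block_at=25.0):
    """Map a score to a decision; thresholds here are illustrative placeholders."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "allow"
```

With the default weights, a transaction with z = 2.0, no sanctions hit, a travel-rule flag and ml_risk = 0.5 scores 0.4·2 + 0 + 0.2·10 + 0.1·0.5, or about 2.85. A sanctions hit alone adds 0.3·100 = 30, so it dominates every other term.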

📈 Monitoring & Logging

Audit Trail

All decisions are logged to BigQuery with:

  • Timestamp, request ID, address, amount
  • Model predictions and uncertainty
  • Policy decision and reasoning
  • HMAC-signed decision blob
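An HMAC-signed decision blob can be produced with Python's `hmac` module. The sketch below assumes HMAC-SHA256 over a canonical JSON encoding and a key already fetched from Secret Manager; the function names and the record fields are hypothetical.

```python
import hashlib
import hmac
import json


def sign_decision(decision: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the canonical JSON of decision."""
    # Sorted keys and fixed separators make equal dicts serialize identically.
    blob = json.dumps(decision, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")
    return hmac.new(key, blob, hashlib.sha256).hexdigest()


def verify_decision(decision: dict, signature: str, key: bytes) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign_decision(decision, key), signature)
```

Storing the signature next to each audit row lets a reviewer detect later edits: changing any field of the record makes `verify_decision` return `False` unless the editor also holds the signing key.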

Training Logs

  • Loss curves saved to GCS
  • Model artifacts with timestamps
  • Performance metrics (MAE, RMSE, F1, etc.)
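For reference, the two regression metrics named above are defined as follows. This is a plain-Python sketch for clarity; the repository may compute them with scikit-learn instead.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```

For actual values `[3, 5, 2]` and predictions `[2, 5, 4]` the errors are 1, 0 and 2, giving an MAE of 1.0 and an RMSE of √(5/3), about 1.29. RMSE is never smaller than MAE, and a large gap between them points to a few big misses.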

Health Monitoring

  • API health checks
  • Model availability status
  • Compliance system status

🧪 Testing

Unit Tests

# Run all tests
python -m pytest tests/ -v

# Run specific model test
python -m pytest tests/test_models.py::TestLSTMModel -v

# Run with coverage
python -m pytest tests/ --cov=. --cov-report=html

Integration Tests

# Test all models with same data
python -m pytest tests/test_models.py::TestModelIntegration -v

# Test API endpoints
python -m pytest tests/ -k "api" -v

📋 10-Step Quick Run Checklist

  1. Set up GCP project and authentication:

    gcloud auth login
    gcloud config set project $PROJECT_ID
  2. Create BigQuery dataset and tables:

    bq mk $BQ_DATASET
    bq query --use_legacy_sql=false < bigquery_sampling.sql
  3. Create GCS bucket:

    gsutil mb gs://$GCS_BUCKET
  4. Set up Secret Manager:

    printf '%s' "your-audit-signing-key" | gcloud secrets create audit-signing-key --data-file=-
  5. Build and push Docker image:

    docker build --target training -t gcr.io/$PROJECT_ID/ml-abtest .
    docker push gcr.io/$PROJECT_ID/ml-abtest
  6. Create preemptible GCE instance:

    gcloud compute instances create ml-abtest --zone=us-central1-a \
        --machine-type=e2-medium --preemptible \
        --metadata-from-file startup-script=gce_startup.sh
  7. Monitor training progress:

    gcloud compute ssh ml-abtest --zone=us-central1-a --command="tail -f /var/log/ml-abtest.log"
  8. Deploy API service:

    docker build --target api-minimal -t gcr.io/$PROJECT_ID/ml-abtest-api .
    docker push gcr.io/$PROJECT_ID/ml-abtest-api
    gcloud run deploy ml-abtest-api --image gcr.io/$PROJECT_ID/ml-abtest-api
  9. Test API endpoint:

    curl -X POST "https://ml-abtest-api-url/score" \
         -H "Content-Type: application/json" \
         -d '{"to_address": "test", "amount_btc": 0.001}'
  10. Check results in GCS:

    gsutil ls gs://$GCS_BUCKET/results/
    gsutil ls gs://$GCS_BUCKET/models/

🔧 Configuration Options

Sampling Configuration

  • Modify SAMPLE_LIMIT_PER_GROUP to change dataset size per group
  • Adjust seeds in bigquery_sampling.sql for different random samples
  • Change sampling percentages in SQL (currently 1.00% for A, 1.25% for B-E)

Model Configuration

  • LSTM: Adjust sequence_length, hidden_size, num_layers
  • Transformer: Modify d_model, nhead, num_layers
  • Prophet: Change seasonality settings and priors
  • ARIMA: Enable/disable auto_arima or set custom orders
  • XGBoost: Adjust n_estimators, max_depth, feature engineering

Cost Optimization

  • Set BUDGET_DOLLARS to automatically adjust resource limits
  • Use PREEMPTIBLE=true for cost savings
  • Set FULL_TRAIN=false for smaller training runs

🚨 Troubleshooting

Common Issues

  1. BigQuery Permission Errors:

    # Option 1: use your own user credentials
    gcloud auth application-default login
    # Option 2: point to a service account key file
    export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"
  2. Memory Issues:

    • Reduce SAMPLE_LIMIT_PER_GROUP
    • Use smaller model architectures
    • Enable gradient checkpointing
  3. Preemption Handling:

    • Check GCS for partial results
    • Restart training from checkpoints
    • Use smaller batch sizes
  4. Model Loading Errors:

    • Verify model files exist in GCS
    • Check file permissions
    • Use correct model paths

Logs and Debugging

# Check training logs
gsutil cat gs://$GCS_BUCKET/logs/abtest_*.log

# Check instance logs
gcloud compute instances get-serial-port-output ml-abtest --zone=us-central1-a

# Check API logs
gcloud run services logs read ml-abtest-api --region=us-central1

📚 Dependencies

Core Requirements

  • Python 3.9+
  • PyTorch 2.0+ (CPU or GPU)
  • scikit-learn, pandas, numpy
  • Prophet, statsmodels, XGBoost
  • FastAPI, uvicorn
  • Google Cloud libraries

Optional Dependencies

  • GPU support for faster training
  • pmdarima for advanced ARIMA
  • tsfresh for feature extraction
  • arch for econometric models

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review logs in GCS
  3. Run unit tests to verify setup
  4. Create an issue with detailed error information

Cost Note: This framework is optimized for a $10 budget. Actual costs may vary based on data size, training duration, and GCP pricing. Monitor costs using GCP billing alerts.

About

Sustainability Comp 2025 Boston University
