ML A/B/C/D/E Experiment Framework

A comprehensive machine learning experiment framework for forecasting with five different models (LSTM, Transformer, Prophet, ARIMA, XGBoost) including compliance checks, uncertainty estimation, and cost-optimized deployment on Google Cloud Platform.

🎯 Overview

This framework implements a reproducible A/B/C/D/E experiment in which each group trains one model on its own sample of the dataset:

Group A: LSTM (1.00% of dataset)
Group B: Lightweight Transformer (1.25% of dataset)
Group C: Prophet (1.25% of dataset)
Group D: ARIMA (1.25% of dataset)
Group E: XGBoost (1.25% of dataset)
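
One way to make a split like the one above reproducible is to hash a stable record key together with the experiment seed. This is a minimal sketch under assumptions, not the logic in bigquery_sampling.sql: the function name `assign_group`, the use of SHA-256, and the bucket layout are all hypothetical.

```python
import hashlib
from typing import Optional

# Group shares from the list above, in basis points (1.00% = 100 bp).
GROUP_SHARES_BP = [("A", 100), ("B", 125), ("C", 125), ("D", 125), ("E", 125)]


def assign_group(record_id: str, seed: int = 42) -> Optional[str]:
    """Map a record to a group, or None if it falls outside the sampled 6%."""
    digest = hashlib.sha256(f"{seed}:{record_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000  # bucket in [0, 10000)
    upper = 0
    for group, share in GROUP_SHARES_BP:
        upper += share
        if bucket < upper:
            return group
    return None  # record is not part of any experiment group


if __name__ == "__main__":
    counts = {}
    for i in range(20_000):
        group = assign_group(f"row-{i}")
        counts[group] = counts.get(group, 0) + 1
    print(counts)
```

Because the assignment depends only on the record key and the seed, rerunning with the same seed reproduces the same groups, and changing the seed draws a different sample.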

Each model includes uncertainty estimation, explainability features, and compliance/governance checks for financial applications.

📁 Project Structure

├── model_lstm.py              # LSTM model implementation
├── model_transformer.py       # Transformer model implementation
├── model_prophet.py           # Prophet model implementation
├── model_arima.py             # ARIMA model implementation
├── model_xgboost.py           # XGBoost model implementation
├── run_abtest.py              # Experiment orchestrator
├── app.py                     # FastAPI service with compliance checks
├── bigquery_sampling.sql      # SQL for creating sample groups
├── requirements.txt           # Python dependencies
├── Dockerfile                 # Multi-stage container build
├── gce_startup.sh             # GCE startup script for preemptible instances
├── tests/
│   └── test_models.py         # Unit tests for all models
└── README.md                  # This file

🚀 Quick Start

1. Environment Setup

# Clone or download the project
git clone <repository-url>
cd ml-abtest

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Configuration

Set the following environment variables:

export PROJECT_ID="your-gcp-project-id"
export BQ_DATASET="your_dataset"
export FEATURE_TABLE="your_table"
export GCS_BUCKET="your-bucket"
export ABTEST_SEED="42"
export SAMPLE_LIMIT_PER_GROUP="2000"
export AUDIT_SIGNING_KEY_SECRET="audit-signing-key"
export BUDGET_DOLLARS="10"
export PREEMPTIBLE="true"
export FULL_TRAIN="false"
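
For orientation, the variables above could be read into a typed configuration object along these lines. This is a sketch only: `ExperimentConfig` and `load_config` are hypothetical names, the fallback values simply mirror the example values shown above, and the actual parsing in run_abtest.py may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class ExperimentConfig:
    project_id: str
    bq_dataset: str
    feature_table: str
    gcs_bucket: str
    seed: int
    sample_limit_per_group: int
    audit_signing_key_secret: str
    budget_dollars: float
    preemptible: bool
    full_train: bool


def load_config(env: Mapping[str, str] = os.environ) -> ExperimentConfig:
    """Build the config; the four GCP identifiers are required (KeyError if absent)."""
    return ExperimentConfig(
        project_id=env["PROJECT_ID"],
        bq_dataset=env["BQ_DATASET"],
        feature_table=env["FEATURE_TABLE"],
        gcs_bucket=env["GCS_BUCKET"],
        # Assumed fallbacks, copied from the example values above.
        seed=int(env.get("ABTEST_SEED", "42")),
        sample_limit_per_group=int(env.get("SAMPLE_LIMIT_PER_GROUP", "2000")),
        audit_signing_key_secret=env.get("AUDIT_SIGNING_KEY_SECRET", "audit-signing-key"),
        budget_dollars=float(env.get("BUDGET_DOLLARS", "10")),
        preemptible=_as_bool(env.get("PREEMPTIBLE", "true")),
        full_train=_as_bool(env.get("FULL_TRAIN", "false")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in unit tests.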

3. BigQuery Setup

Run the sampling SQL to create experiment groups:

# Replace variables in the SQL file
sed -i "s/{PROJECT_ID}/$PROJECT_ID/g" bigquery_sampling.sql
sed -i "s/{BQ_DATASET}/$BQ_DATASET/g" bigquery_sampling.sql
sed -i "s/{FEATURE_TABLE}/$FEATURE_TABLE/g" bigquery_sampling.sql

# Execute the SQL
bq query --use_legacy_sql=false < bigquery_sampling.sql

4. Run Local Test

# Run unit tests
python -m pytest tests/ -v

# Run individual model test
python model_lstm.py

# Run full experiment locally (with synthetic data)
python run_abtest.py

💰 Cost-Optimized Deployment

Budget Constraints (~$10)

The framework is designed for tight budgets with several cost-saving features:

Preemptible Instances: Use preemptible GCE instances for training
Sample Limiting: Default 2K samples per group (configurable)
Checkpointing: Frequent saves to GCS to handle preemption
Resource Limits: CPU-only training, minimal memory usage
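
The checkpointing idea can be illustrated with a small resume loop. This is a sketch under assumptions: a local JSON file stands in for the GCS object, the "training step" is a placeholder, and the names `save_checkpoint`, `load_checkpoint`, and `train` are hypothetical rather than taken from the framework.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so an interruption never leaves a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, path)  # atomic rename on the same filesystem


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"epoch": 0, "losses": []}


def train(path: Path, total_epochs: int, stop_after: Optional[int] = None) -> dict:
    """Resume from the last checkpoint; stop_after simulates a preemption."""
    state = load_checkpoint(path)
    done_this_run = 0
    for epoch in range(state["epoch"], total_epochs):
        state["losses"].append(1.0 / (epoch + 1))  # placeholder for a real training step
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)
        done_this_run += 1
        if stop_after is not None and done_this_run >= stop_after:
            break  # pretend the instance was preempted here
    return state
```

Saving after every epoch trades a little I/O for the guarantee that a restarted instance repeats at most one epoch of work.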

GCE Preemptible Instance

# Create preemptible instance with startup script
gcloud compute instances create ml-abtest-instance \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --preemptible \
    --image-family=ubuntu-2004-lts \
    --image-project=ubuntu-os-cloud \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --metadata-from-file startup-script=gce_startup.sh \
    --metadata PROJECT_ID=$PROJECT_ID,BQ_DATASET=$BQ_DATASET,GCS_BUCKET=$GCS_BUCKET

Cloud Run API Service

# Build and deploy API service
docker build --target api-minimal -t gcr.io/$PROJECT_ID/ml-abtest-api .
docker push gcr.io/$PROJECT_ID/ml-abtest-api

gcloud run deploy ml-abtest-api \
    --image gcr.io/$PROJECT_ID/ml-abtest-api \
    --platform managed \
    --region us-central1 \
    --memory 2Gi \
    --cpu 2 \
    --max-instances 10 \
    --allow-unauthenticated

🔧 Usage

Running the Experiment

# Local execution with synthetic data
python run_abtest.py

# With BigQuery data (requires authentication)
export USE_BIGQUERY=true
python run_abtest.py

# Custom configuration
export SAMPLE_LIMIT_PER_GROUP=1000
export ABTEST_SEED=123
python run_abtest.py

API Usage

# Start API service
python app.py

# Test scoring endpoint
curl -X POST "http://localhost:8080/score" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer your-token" \
     -d '{
       "to_address": "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa",
       "amount_btc": 0.001,
       "metadata": {
         "sender_country": "US",
         "receiver_country": "CA",
         "sender_vasp_id": "VASP123",
         "receiver_vasp_id": "VASP456"
       }
     }'
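
The same request can be issued from Python with the standard library. This is a sketch: `build_score_request` and `score` are hypothetical helper names, and the shape of the JSON response is not documented here, so the sketch returns it unparsed as a dict.

```python
import json
import urllib.request


def build_score_request(base_url: str, token: str, to_address: str,
                        amount_btc: float, metadata: dict) -> urllib.request.Request:
    """Assemble the POST /score request shown in the curl example."""
    body = {"to_address": to_address, "amount_btc": amount_btc, "metadata": metadata}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/score",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def score(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (requires the API to be running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, `score(build_score_request("http://localhost:8080", "your-token", ...))` sends the call.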

Health Check

curl http://localhost:8080/health

📊 Model Details

LSTM (Group A)

Architecture: 2-layer LSTM with dropout
Features: Time series sequences with trend/seasonality
Uncertainty: Monte Carlo Dropout
Use Case: Complex temporal patterns

Transformer (Group B)

Architecture: Lightweight transformer with positional encoding
Features: Multi-head attention with global pooling
Uncertainty: Monte Carlo Dropout
Use Case: Long-range dependencies
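
Monte Carlo Dropout, used by both the LSTM and the Transformer, keeps dropout active at prediction time and repeats the forward pass; the spread of the outputs serves as the uncertainty estimate. The sketch below shows the idea on a toy one-hidden-layer network in pure Python so that it runs without PyTorch. The weights, the dropout rate, and the function names are arbitrary assumptions, not the framework's models.

```python
import random
import statistics
from typing import Tuple

# Arbitrary fixed weights for a toy 1-input, 4-hidden-unit, 1-output network.
W1 = [0.8, -0.5, 0.3, 1.1]
W2 = [0.6, 0.4, -0.7, 0.9]


def forward_with_dropout(x: float, p: float, rng: random.Random) -> float:
    """One stochastic forward pass with dropout applied to the hidden layer."""
    out = 0.0
    for w1, w2 in zip(W1, W2):
        hidden = max(0.0, w1 * x)  # ReLU activation
        if rng.random() >= p:      # keep the unit with probability 1 - p
            out += w2 * hidden / (1.0 - p)  # inverted-dropout scaling
    return out


def mc_dropout_predict(x: float, passes: int = 200, p: float = 0.2,
                       seed: int = 42) -> Tuple[float, float]:
    """Return (mean prediction, standard deviation) over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, p, rng) for _ in range(passes)]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    mean, std = mc_dropout_predict(1.0)
    print(f"prediction={mean:.3f} +/- {std:.3f}")
```

In PyTorch the equivalent is to leave the dropout layers in training mode during inference and aggregate several forward passes.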

Prophet (Group C)

Architecture: Facebook Prophet with seasonal decomposition
Features: Trend, seasonality, holidays
Uncertainty: Built-in confidence intervals
Use Case: Business time series with seasonality

ARIMA (Group D)

Architecture: Auto-regressive integrated moving average
Features: Automatic order selection
Uncertainty: Prediction intervals
Use Case: Stationary time series, or series that become stationary after differencing

XGBoost (Group E)

Architecture: Gradient boosting with feature engineering
Features: Lag features, rolling statistics, derived features
Uncertainty: Feature-based variance estimation
Use Case: Tabular data with complex interactions
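
Lag features and rolling statistics turn a univariate series into the tabular rows a gradient-boosting model expects. The following is a minimal pure-Python sketch; the feature names, the default lags, and the window size are assumptions, and the feature engineering in model_xgboost.py may differ.

```python
from statistics import mean
from typing import Dict, List, Sequence


def make_features(series: Sequence[float], lags: Sequence[int] = (1, 2),
                  window: int = 3) -> List[Dict[str, float]]:
    """Build one feature row per time step that has enough history."""
    start = max(max(lags), window, 2)  # diff_1 needs two past values
    rows = []
    for t in range(start, len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        history = series[t - window:t]  # strictly before t, so the target never leaks
        row[f"roll_mean_{window}"] = mean(history)
        row["diff_1"] = series[t - 1] - series[t - 2]  # a simple derived feature
        row["target"] = series[t]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in make_features([1, 2, 3, 4, 5, 6]):
        print(row)
```

Every feature is computed from values before time t, which keeps the target out of its own inputs.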

🛡️ Compliance & Governance

Sanctions Screening

Local SQLite database with OFAC/SDN data
Exact and fuzzy matching
Configurable risk weights
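
Exact and fuzzy matching against a local SQLite table might look like the sketch below. It is illustrative only: the table schema, the 0.9 similarity threshold, and the entries are made-up placeholders, not OFAC/SDN data, and `difflib` is swapped in as a simple stand-in for whatever fuzzy matcher app.py uses.

```python
import sqlite3
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def build_db(entries: Iterable[str]) -> sqlite3.Connection:
    """Create an in-memory table holding the identifiers to screen against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sanctions (identifier TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO sanctions VALUES (?)", [(e,) for e in entries])
    return conn


def screen(conn: sqlite3.Connection, candidate: str,
           fuzzy_threshold: float = 0.9) -> Tuple[str, float]:
    """Return ('exact' | 'fuzzy' | 'clear', best similarity found)."""
    hit = conn.execute(
        "SELECT 1 FROM sanctions WHERE identifier = ?", (candidate,)
    ).fetchone()
    if hit:
        return "exact", 1.0
    best = 0.0
    for (identifier,) in conn.execute("SELECT identifier FROM sanctions"):
        ratio = SequenceMatcher(None, candidate.lower(), identifier.lower()).ratio()
        best = max(best, ratio)
    return ("fuzzy" if best >= fuzzy_threshold else "clear"), best


if __name__ == "__main__":
    db = build_db(["EXAMPLE LISTED ENTITY LTD", "placeholder-address-001"])
    print(screen(db, "Example Listed Entity Ltd."))
```

Fuzzy matching is mainly useful for names; for wallet addresses an exact lookup is usually the meaningful check.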

Travel Rule Compliance

Cross-border transaction detection
VASP metadata validation
Missing field identification
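
The three checks above can be sketched as a single function over the request metadata. The field names come from the /score example in the API Usage section; the rule for when to raise a flag is an assumption, and `check_travel_rule` is a hypothetical name.

```python
from typing import Dict, Optional

REQUIRED_FIELDS = (
    "sender_country",
    "receiver_country",
    "sender_vasp_id",
    "receiver_vasp_id",
)


def check_travel_rule(metadata: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Detect cross-border transfers and list any missing VASP metadata."""
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    sender = metadata.get("sender_country")
    receiver = metadata.get("receiver_country")
    cross_border = bool(sender and receiver and sender != receiver)
    route_unknown = not (sender and receiver)
    # Assumption: flag when data is missing and the transfer is cross-border
    # or its route cannot be determined.
    flagged = bool(missing) and (cross_border or route_unknown)
    return {"cross_border": cross_border, "missing_fields": missing, "flagged": flagged}


if __name__ == "__main__":
    print(check_travel_rule({
        "sender_country": "US",
        "receiver_country": "CA",
        "sender_vasp_id": "VASP123",
    }))
```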

Policy Engine

Ensemble risk scoring: risk = w1*|z| + w2*sanctions*100 + w3*travel_rule*10 + w4*ml_risk
Configurable decision thresholds
Audit logging with HMAC signatures

Environment Variables for Policy

export RISK_WEIGHT_1="0.4"  # Z-score weight
export RISK_WEIGHT_2="0.3"  # Sanctions weight  
export RISK_WEIGHT_3="0.2"  # Travel rule weight
export RISK_WEIGHT_4="0.1"  # ML risk weight
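
As a worked example of the ensemble formula risk = w1*|z| + w2*sanctions*100 + w3*travel_rule*10 + w4*ml_risk, the sketch below reads the four weights from the environment and computes a score. It assumes that `sanctions` and `travel_rule` are 0/1 flags and that the fallback weights equal the example values above; the decision thresholds in `decide` are hypothetical, since their real values are not documented here.

```python
import os
from typing import Mapping, Tuple

DEFAULT_WEIGHTS = ("0.4", "0.3", "0.2", "0.1")  # example values from the exports above


def load_weights(env: Mapping[str, str] = os.environ) -> Tuple[float, ...]:
    """Read RISK_WEIGHT_1..4, falling back to the example values."""
    return tuple(
        float(env.get(f"RISK_WEIGHT_{i}", default))
        for i, default in enumerate(DEFAULT_WEIGHTS, start=1)
    )


def risk_score(z: float, sanctions_hit: bool, travel_rule_flag: bool,
               ml_risk: float, weights: Tuple[float, ...]) -> float:
    """risk = w1*|z| + w2*sanctions*100 + w3*travel_rule*10 + w4*ml_risk"""
    w1, w2, w3, w4 = weights
    return (
        w1 * abs(z)
        + w2 * int(sanctions_hit) * 100
        + w3 * int(travel_rule_flag) * 10
        + w4 * ml_risk
    )


def decide(score: float, review_at: float = 5.0, block_at: float = 25.0) -> str:
    """Map a score to a decision; both thresholds are made up for illustration."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "allow"


if __name__ == "__main__":
    weights = load_weights({})
    score = risk_score(-2.0, False, True, 0.5, weights)
    print(round(score, 2), decide(score))
```

With the example weights, a sanctions hit alone contributes 0.3 * 100 = 30 to the score, so it dominates every other term.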

📈 Monitoring & Logging

Audit Trail

All decisions are logged to BigQuery with:

Timestamp, request ID, address, amount
Model predictions and uncertainty
Policy decision and reasoning
HMAC-signed decision blob
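
An HMAC-signed decision blob can be produced and verified with the standard library. This is a sketch under assumptions: the record fields, the canonical JSON serialization, and the choice of SHA-256 are illustrative, and in the framework the key would be fetched from Secret Manager rather than hard-coded.

```python
import hashlib
import hmac
import json


def sign_decision(decision: dict, key: bytes) -> dict:
    """Serialize the decision canonically and attach an HMAC-SHA256 signature."""
    payload = json.dumps(decision, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_decision(record: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, record["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


if __name__ == "__main__":
    key = b"demo-key-do-not-use"  # placeholder key for illustration only
    record = sign_decision({"request_id": "req-1", "decision": "allow"}, key)
    print(verify_decision(record, key))
```

Sorting the keys before signing matters: the same decision must always serialize to the same bytes, or verification would fail on a logically identical record.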

Training Logs

Loss curves saved to GCS
Model artifacts with timestamps
Performance metrics (MAE, RMSE, F1, etc.)
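
For reference, the three named metrics can be computed as follows. This is a plain-Python sketch with hypothetical function names; the framework itself may rely on scikit-learn for these.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for binary labels (1 = positive); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

MAE and RMSE apply to the forecasts themselves, while F1 applies to binary outcomes such as flagged versus not flagged.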

Health Monitoring

API health checks
Model availability status
Compliance system status

🧪 Testing

Unit Tests

# Run all tests
python -m pytest tests/ -v

# Run specific model test
python -m pytest tests/test_models.py::TestLSTMModel -v

# Run with coverage
python -m pytest tests/ --cov=. --cov-report=html

Integration Tests

# Test all models with same data
python -m pytest tests/test_models.py::TestModelIntegration -v

# Test API endpoints
python -m pytest tests/ -k "api" -v

📋 10-Step Quick Run Checklist

Set up GCP project and authentication:

gcloud auth login
gcloud config set project $PROJECT_ID

Create BigQuery dataset and tables:

bq mk $BQ_DATASET
bq query --use_legacy_sql=false < bigquery_sampling.sql

Create GCS bucket:
```
gsutil mb gs://$GCS_BUCKET
```

Set up Secret Manager:

printf '%s' "your-audit-signing-key" | gcloud secrets create audit-signing-key --data-file=-

Build and push Docker image:

docker build --target training -t gcr.io/$PROJECT_ID/ml-abtest .
docker push gcr.io/$PROJECT_ID/ml-abtest

Create preemptible GCE instance:

gcloud compute instances create ml-abtest --zone=us-central1-a \
    --machine-type=e2-medium --preemptible \
    --metadata-from-file startup-script=gce_startup.sh

Monitor training progress:

gcloud compute ssh ml-abtest --zone=us-central1-a --command="tail -f /var/log/ml-abtest.log"

Deploy API service:

docker build --target api-minimal -t gcr.io/$PROJECT_ID/ml-abtest-api .
docker push gcr.io/$PROJECT_ID/ml-abtest-api
gcloud run deploy ml-abtest-api --image gcr.io/$PROJECT_ID/ml-abtest-api

Test API endpoint:

curl -X POST "https://ml-abtest-api-url/score" \
     -H "Content-Type: application/json" \
     -d '{"to_address": "test", "amount_btc": 0.001}'

Check results in GCS:

gsutil ls gs://$GCS_BUCKET/results/
gsutil ls gs://$GCS_BUCKET/models/

🔧 Configuration Options

Sampling Configuration

Modify SAMPLE_LIMIT_PER_GROUP to change dataset size per group
Adjust seeds in bigquery_sampling.sql for different random samples
Change sampling percentages in SQL (currently 1.00% for A, 1.25% for B-E)

Model Configuration

LSTM: Adjust sequence_length, hidden_size, num_layers
Transformer: Modify d_model, nhead, num_layers
Prophet: Change seasonality settings and priors
ARIMA: Enable/disable auto_arima or set custom orders
XGBoost: Adjust n_estimators, max_depth, feature engineering

Cost Optimization

Set BUDGET_DOLLARS to automatically adjust resource limits
Use PREEMPTIBLE=true for cost savings
Set FULL_TRAIN=false for smaller training runs

🚨 Troubleshooting

Common Issues

BigQuery Permission Errors:

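# Option 1: use your own user account as Application Default Credentials
gcloud auth application-default login
# Option 2: point to a service account key file instead
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"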

Memory Issues:
- Reduce SAMPLE_LIMIT_PER_GROUP
- Use smaller model architectures
- Enable gradient checkpointing

Preemption Handling:
- Check GCS for partial results
- Restart training from checkpoints
- Use smaller batch sizes

Model Loading Errors:
- Verify model files exist in GCS
- Check file permissions
- Use correct model paths

Logs and Debugging

# Check training logs
gsutil cat gs://$GCS_BUCKET/logs/abtest_*.log

# Check instance logs
gcloud compute instances get-serial-port-output ml-abtest --zone=us-central1-a

# Check API logs
gcloud run services logs read ml-abtest-api --region=us-central1

📚 Dependencies

Core Requirements

Python 3.9+
PyTorch 2.0+ (CPU or GPU)
scikit-learn, pandas, numpy
Prophet, statsmodels, XGBoost
FastAPI, uvicorn
Google Cloud libraries

Optional Dependencies

GPU support for faster training
pmdarima for advanced ARIMA
tsfresh for feature extraction
arch for econometric models

🤝 Contributing

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

For issues and questions:

Check the troubleshooting section
Review logs in GCS
Run unit tests to verify setup
Create an issue with detailed error information

Cost Note: This framework is optimized for a $10 budget. Actual costs may vary based on data size, training duration, and GCP pricing. Monitor costs using GCP billing alerts.
