Skip to content

goodmorningsaksham/oasis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

12 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

title OASIS
emoji πŸ’‰
colorFrom blue
colorTo green
sdk docker
pinned false
app_port 8000
tags
openenv

πŸ₯ OASIS

Optimized Adaptive System for Insulin Scheduling

An OpenEnv reinforcement learning environment for training AI agents to manage insulin dosing in Type 1 Diabetes

OpenEnv Python simglucose Tests License

4 tasks Β· 19-field observation space Β· gamma-CDF pharmacokinetics Β· live interactive dashboard

Live Demo Β· API Docs Β· Quick Start


Why This Matters

Type 1 Diabetes affects over 9 million people worldwide. These patients produce zero insulin and must deliver it externally every few minutes β€” a decision that is unforgiving in both directions:

Condition Glucose Level Consequence
⚠️ Mild Hyperglycemia > 180 mg/dL Progressive organ damage over months
πŸ”΄ Severe Hyperglycemia > 250 mg/dL Diabetic ketoacidosis β€” emergency
⚠️ Mild Hypoglycemia < 70 mg/dL Confusion, tremors, impaired function
πŸ’€ Severe Hypoglycemia < 54 mg/dL Seizures, loss of consciousness, death within minutes

The clinical gold standard is Time-in-Range (TIR): the percentage of time glucose stays within 70–180 mg/dL. Guidelines recommend β‰₯70%. Most patients achieve far less.

Current commercial insulin pumps use PID controllers β€” rule-based systems tuned for an "average" patient. But no patient is average. A child may be 5Γ— more insulin-sensitive than an adult. Exercise changes sensitivity unpredictably. Illness causes resistance. These controllers fail silently, and patients pay the price.

OASIS exists to train RL agents that adapt where PID controllers cannot.


What Makes OASIS Different

OASIS is not a toy environment. Every design decision is grounded in clinical physiology:

Feature Implementation Clinical Basis
FDA-accepted simulator UVa/Padova T1D model via simglucose Gold standard for in-silico T1D research
30 virtual patients Adolescents, adults, children with distinct physiology Real inter-patient variability
CGM noise Οƒ=10 mg/dL Gaussian on subcutaneous glucose (Gsub) ISO 15197 accuracy standard
Gamma-CDF pharmacokinetics IOB modelled with gamma distribution (peak 55 min, clear 8 hrs) Rapid-acting insulin absorption profile (Lispro/Aspart)
Exercise physiology 20–70% insulin sensitivity increase during activity Skeletal muscle glucose transport
Illness simulation 1.5–2.5Γ— insulin resistance at unknown onset Inflammatory insulin receptor downregulation
Asymmetric reward Hypo penalised 2–6Γ— heavier than hyper Acute vs. cumulative clinical risk
Recovery bonus +0.5 for hypo correction, +0.3 for hyper within 10 steps Incentivises active clinical management

The RL Problem

At each 3-minute step, the agent observes a 19-field clinical state and outputs a 2D continuous action:

        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚           OBSERVATION (19 fields)        β”‚
        β”‚                                          β”‚
        β”‚  CGM glucose (noisy) ──────── 142.3 mg/dLβ”‚
        β”‚  Glucose trend ──────────────── rising   β”‚
        β”‚  12-reading history window ─── [138, ...] β”‚
        β”‚  Meal announced? ──────────────── Yes    β”‚
        β”‚  Meal carbs ───────────────────── 70g    β”‚
        β”‚  Exercise intensity ────────────── 0.0   β”‚
        β”‚  Insulin-on-board (gamma-CDF) ── 2.4 U  β”‚
        β”‚  Time of day ──────────────────── 10.0 h β”‚
        β”‚  ... and 11 more fields                  β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                    β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”
                    β”‚  AGENT  β”‚
                    β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
                         β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚            ACTION (2 fields)             β”‚
        β”‚                                          β”‚
        β”‚  Basal rate ──────── 1.2 U/hr (0.0–5.0) β”‚
        β”‚  Bolus dose ──────── 5.0 U   (0.0–20.0) β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚         REWARD (6 components)            β”‚
        β”‚                                          β”‚
        β”‚  In-range bonus ──────────────── +1.0    β”‚
        β”‚  Hypo penalty ────────── -1.0 to -3.0   β”‚
        β”‚  Hyper penalty ──────── -0.5 to -1.5    β”‚
        β”‚  Overdose penalty ──────────── -3.0      β”‚
        β”‚  Recovery bonus ─────── +0.3 to +0.5    β”‚
        β”‚  Step total ──────────── sum of above    β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Four Tasks β€” Escalating Real-World Difficulty

Task Name Difficulty Patient Meals Exercise Illness
1 Basal Rate Control 🟒 Easy adult#001 None None None
2 Meal Bolus Timing 🟑 Medium adult#001 3 announced Announced None
3 Cross-Patient Generalisation πŸ”΄ Hard Random/30 3 unannounced Random None
4 Sick Day Management ⚫ Expert Random/30 3 unannounced Random 1.5–2.5Γ— resistance

Task 1 establishes baseline control β€” keep glucose stable with basal insulin only.

Task 2 introduces meal management. Three daily meals (50g/70g/80g CHO) are announced 30 minutes ahead. The agent must learn pre-meal bolus timing. A moderate exercise event at step 150 (also announced) tests exercise-aware dosing.

Task 3 tests generalisation. A random patient from 30 profiles β€” children who are 5Γ— more sensitive, adults, adolescents. Meals and exercise are unannounced. Patient identity is hidden. The agent must infer physiology from glucose dynamics alone.

Task 4 is genuinely frontier-level. A random patient develops illness causing 1.5–2.5Γ— insulin resistance at an unknown time. The agent is never told. It must detect rising glucose despite normal dosing, infer that insulin has become less effective, and increase delivery without over-correcting. PID controllers fail catastrophically on this task β€” an RL agent that succeeds here would represent a clinically meaningful advance.


Baseline Scores

All results deterministic (seed=42), reproducible via python eval.py:

Agent Task 1 Task 2 Task 3 Task 4
Constant Basal (no intelligence) 1.000 0.000 0.345 ~0.050
PID Controller (clinical standard) 1.000 0.736 0.206 ~0.120
Target: Good RL Agent β‰₯ 0.95 β‰₯ 0.70 β‰₯ 0.60 β‰₯ 0.45

Key insight from Task 3: The PID controller scores worse than constant basal (0.206 vs 0.345) because its adult-tuned aggressive corrections cause fatal hypoglycemia in 4 of 5 child/adolescent patients. A "smarter" fixed controller is more dangerous than a conservative one when patient physiology varies. This is exactly why adaptive RL agents are needed.


Observation Space (19 fields)

Field Type Description
glucose_mg_dl float CGM reading with ISO 15197 noise (Οƒ=10 mg/dL)
glucose_trend string rapidly_falling / falling / stable / rising / rapidly_rising
glucose_history_window list[float] Last 12 CGM readings (36 min context)
meal_announced bool Meal within 30 min (Task 2 only)
meal_grams_announced float Carbs in announced meal
exercise_intensity float Current exercise (0=rest, 1=max)
exercise_announced bool Exercise within 30 min (Task 2 only)
insulin_on_board_units float Active insulin via gamma-CDF PK model
time_of_day_hours float Simulated time (0.0–24.0)
step int Current step (0–479)
patient_id string/null Hidden in Task 3/4
last_action_basal float Previous basal rate
last_action_bolus float Previous bolus dose
true_glucose_mg_dl float/null Pre-noise glucose (research/debug)
illness_active bool Debug only β€” always False in normal mode

Reward Function (6 components)

Glucose Zone Component Value Rationale
70–180 mg/dL TIR contribution +1.0 Target range
54–70 mg/dL Hypo penalty βˆ’1.0 Dangerous
< 54 mg/dL Severe hypo βˆ’3.0 Life-threatening
180–250 mg/dL Hyper penalty βˆ’0.5 Long-term damage
> 250 mg/dL Severe hyper βˆ’1.5 Acute risk
< 54 + recent bolus Overdose βˆ’3.0 Prevents reward hacking
Hypo corrected ≀10 steps Recovery bonus +0.5 Rewards active correction
Hyper corrected ≀10 steps Recovery bonus +0.3 Rewards proactive management

Insulin-on-Board: Gamma-CDF Pharmacokinetic Model

Unlike simple exponential decay models, OASIS uses a gamma-distribution cumulative absorption curve matching the pharmacokinetic profile of rapid-acting insulin (Lispro/Aspart/Fiasp):

IOB(t) = Ξ£ insulin_dose[i] Γ— (1 βˆ’ FΞ³(t βˆ’ t_injection[i]))

Where FΞ³ is the gamma CDF with shape k=2, peak at 55 minutes. The model tracks 160 steps (8 hours) of insulin delivery history β€” both basal and bolus β€” and computes the fraction NOT YET absorbed at each time offset. This produces realistic IOB curves: a 10U bolus shows 10.0U immediately, 6.3U at 90 minutes, 2.4U at 4 hours.

Commercial artificial pancreas systems (Medtronic 780G, Tandem Control-IQ, Omnipod 5) display IOB as a primary safety signal to prevent bolus stacking. OASIS gives RL agents the same information.


Quick Start

Local Development

git clone https://github.com/saksham1771/glucorl.git
cd glucorl
pip install -r requirements.txt
uvicorn server.app:app --port 8000

# Open the interactive dashboard
open http://localhost:8000

Docker

docker build -t oasis .
docker run -p 8000:8000 oasis

Python Client

from client import GlucoEnv
from models import GlucoAction

with GlucoEnv(base_url="http://localhost:8000") as env:
    result = env.reset(task_id=2)
    while not result.done:
        obs = result.observation
        action = GlucoAction(
            basal_rate=1.2,
            bolus_dose=5.0 if obs.meal_announced else 0.0
        )
        result = env.step(action)
    state = env.state()
    print(f"TIR: {state.tir_current:.1%}")

Run Inference

export GLUCORL_ENV_URL="http://localhost:8000"
export API_BASE_URL="https://router.huggingface.co/v1"
export HF_TOKEN="hf_your_token"
export MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
python inference.py

API Reference

Method Endpoint Description
GET / Interactive web dashboard (WebSocket-based, real-time)
POST /reset Start episode. Body: {"task_id": 1} (1–4)
POST /step Take action. Body: {"basal_rate": 1.0, "bolus_dose": 0.0}
GET /state Full episode state with glucose history and metrics
GET /tasks List all 4 tasks with descriptions
POST /grade Detailed decomposed score breakdown
GET /health Health check
WS /ws WebSocket for persistent sessions
GET /docs Swagger API documentation

Training with RL

OASIS is designed for GRPO training via TRL:

  • Dense reward: every step produces signal (+1.0 to βˆ’6.0 range)
  • Continuous action space: 2D (basal + bolus) β€” amenable to policy gradient methods
  • 480-step episodes: long enough for meaningful trajectories, short enough for fast iteration
  • 4-task curriculum: natural difficulty progression for progressive training
  • Reward variance: successful episodes score +300 to +480, failed episodes βˆ’200 to βˆ’500 β€” GRPO needs this spread

The glucose_history_window (12 readings) enables feedforward agents to reason temporally without RNN architectures. Full history is available via /state for agents that prefer complete episode context.


Project Structure

oasis/
β”œβ”€β”€ inference.py                  # Baseline inference (OpenAI client)
β”œβ”€β”€ models.py                     # Pydantic: Action, Observation, State, Reward
β”œβ”€β”€ client.py                     # WebSocket client (EnvClient)
β”œβ”€β”€ eval.py                       # PID vs baseline benchmark
β”œβ”€β”€ openenv.yaml                  # OpenEnv spec (4 tasks)
β”œβ”€β”€ Dockerfile                    # HF Spaces (openenv-base)
β”œβ”€β”€ server/
β”‚   β”œβ”€β”€ app.py                    # FastAPI + interactive dashboard + /grade
β”‚   β”œβ”€β”€ glucorl_environment.py    # Core: reset/step/state with all 8 enhancements
β”‚   β”œβ”€β”€ patient_manager.py        # simglucose wrapper + CGM noise + exercise
β”‚   β”œβ”€β”€ reward_calculator.py      # Shaped reward + recovery bonus
β”‚   β”œβ”€β”€ graders.py                # 4 task graders + grade_detailed()
β”‚   β”œβ”€β”€ pid_controller.py         # PID baseline with anti-windup
β”‚   └── constants.py              # Thresholds, PK/PD, meals, exercise, illness
└── tests/                        # 120 tests (environment, graders, reward)

Acknowledgements

  • simglucose β€” FDA-accepted UVa/Padova T1D simulator by Jinyu Xie
  • OpenEnv β€” Open environment specification by Meta PyTorch
  • UVa/Padova Model β€” Kovatchev et al., Journal of Diabetes Science and Technology, 2009
  • Insulin PK/PD β€” Gamma-CDF absorption model based on Hovorka et al., 2004

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors