Skip to content

Common-Longitudinal-ICU-data-Format/FLAIR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

21 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ₯ FLAIR - Federated Learning Assessment for ICU Research (WIP 🚧)

License Python CLIF

A privacy-first benchmark framework for evaluating ML/AI methods on ICU prediction tasks across 17+ hospital sites using the CLIF (Common Longitudinal ICU Format) data standard.

In the ICU, clinical decisions happen in real-time and directly impact patient survival. Rich dataβ€”diagnoses, laboratory values, vital signs, and outcomesβ€”can drive better care, and AI offers new ways to support clinical reasoning in these high-stakes environments. Yet building trustworthy, generalizable AI requires diverse datasets that reflect varied patient populations and clinical practices.

Public ICU datasets like MIMIC and eICU have enabled significant research progress, but they represent only a handful of institutions. Meanwhile, the vast majority of ICU data remains locked in private hospital silosβ€”inaccessible for multi-site validation due to privacy regulations. This fragmentation limits our ability to develop AI that generalizes beyond the institutions where it was trained.

CLIF and FLAIR bridge this gap. The Common Longitudinal ICU Format (CLIF) provides a shared data standard that harmonizes ICU data across institutions. FLAIR builds on this foundation, enabling federated model evaluation on real-world private data from 17+ hospitalsβ€”without patient information ever leaving each site.


🎯 Why FLAIR?

The Problem: Public Benchmarks Aren't Enough

Existing ICU benchmarks like YAIB and HiRID-ICU-Benchmark have done excellent work harmonizing public datasets (MIMIC, eICU, HiRID, AUMCdb). However, models validated solely on public data face a critical limitation:

Models that perform well on public benchmarks may not generalize to real-world clinical settings.

Public datasets represent a handful of institutions with specific patient populations, workflows, and documentation practices. Real-world deployment requires validation across diverse hospitals.

The Challenge: Patient Data Cannot Leave Hospitals

Multi-site validation is essential, but patient data is protected by strict privacy regulations (HIPAA, IRB). Raw clinical data cannot be shared or centralized for traditional benchmark evaluation.

The Solution: Federated Evaluation

FLAIR enables researchers to validate their methods on private ICU datasets from 17+ US hospitals without the data ever leaving each site:

FLAIR Workflow

Result: Your model gets evaluated on real-world private clinical data from diverse institutions, enabling robust generalization assessment.


πŸ›οΈ The CLIF Consortium

FLAIR is built on the Common Longitudinal ICU Format (CLIF) data standard, maintained by a consortium of 17+ academic medical centers across the United States.

Metric Value
Participating Sites 17+
Geographic Coverage Nationwide (US)
Combined ICU Beds 2,000+
Data Standard CLIF v2.1

The consortium includes major academic medical centers, community hospitals, and health systemsβ€”providing diverse patient populations, clinical workflows, and documentation practices for robust model validation.

πŸšͺ MIMIC-CLIF: Your Entry Point

Since CLIF consortium data is private, how do you develop your method? MIMIC-CLIF is the answer:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€-┐
β”‚   MIMIC-CLIF    β”‚         β”‚  CLIF Consortium β”‚
β”‚   (Public)      β”‚         β”‚  (Private)       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€         β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€-─
β”‚ β€’ PhysioNet     β”‚   Same  β”‚ β€’ 17+ Sites      β”‚
β”‚ β€’ ~70k ICU staysβ”‚ ══════► β”‚ β€’ 500k+ ICU staysβ”‚
β”‚ β€’ Single site   β”‚  Schema β”‚ β€’ Diverse pops   β”‚
β”‚ β€’ Development   β”‚         β”‚ β€’ Evaluation     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         └─────────────────-β”˜

This approach ensures your code will work on consortium data without modification.


πŸ“Š Benchmark Tasks

FLAIR provides 7 clinically relevant prediction tasks:

Binary Classification Tasks

Task Name Description Cohort Filter
1 Discharged Home Predict if patient will be discharged directly home All ICU patients
2 Discharged to LTACH Predict if patient will go to long-term acute care All ICU patients
6 Hospital Mortality Predict in-hospital death (first 24hr ICU data) 1st ICU stay β‰₯ 24hr
7 Unplanned ICU Readmission Predict unplanned return to ICU (entire 1st ICU stay data) 1st ICU stay β‰₯ 24hr

Multiclass Classification Tasks

Task Name Description Cohort Filter
3 72-Hour Respiratory Outcome Predict ventilator status at 72hr (on/off/expired) IMV at 24hr only

Regression Tasks

Task Name Description Cohort Filter
4 Hypoxic Proportion Predict fraction of hypoxic hours (24-72hr window) IMV at 24hr only
5 ICU Length of Stay Predict 1st ICU stay duration (first 24hr data) 1st ICU stay β‰₯ 24hr

Understanding Time Windows

Each task defines a time window for data extraction:

  • window_start: Beginning of the data collection period
  • window_end: The prediction time - this is when the model makes its prediction

Critical: You can use ALL data points within the window (window_start to window_end), but you CANNOT use any data after window_end. Using data beyond the prediction time would be data leakage.

The window definition varies by task:

  • Tasks 1-6: Window is first 24 hours from ICU admission (first_icu_start_time to +24hr)
  • Task 7: Window is the entire first ICU stay (first_icu_start_time to first_icu_end_time)

The window is task-specific and not always aligned with ICU admission/discharge times.

Community-Driven: These tasks are driven by community needs. Have a cool prediction task idea? Open a PR or Issue! We're actively working on adding more tasks.

Note: Each task has its own cohort size (N) based on task-specific filters. All tasks share the same base criteria: hospitalizations with at least 1 ICU stay.


🎁 What FLAIR Provides

FLAIR is a Python library that generates task-specific datasets for ICU prediction benchmarks:

FLAIR Provides You Provide
βœ… Task-specific cohort filtering πŸ”§ Your feature engineering
βœ… Consistent label extraction πŸ”§ Your model architecture
βœ… Temporal train/test splits πŸ”§ Your training pipeline
βœ… Demographics & time windows πŸ”§ Your evaluation

Output Format

Each task outputs a single parquet file with all required columns:

Column Type Description
hospitalization_id str Unique identifier
admission_dttm datetime Hospital admission time
discharge_dttm datetime Hospital discharge time
window_start datetime ICU start time (input window start)
window_end datetime Prediction time (+24hr from ICU start)
{task_label} int/float Task-specific label
split str "train" or "test"
age_at_admission int Patient age
sex_category str Patient sex
race_category str Patient race
ethnicity_category str Patient ethnicity

πŸ’Ώ Installation

# Install with pip
pip install flair-benchmark

# Or from source
git clone https://github.com/clif-consortium/FLAIR.git
cd FLAIR
pip install -e .

Requirements: Python 3.10+, clifpy


πŸš€ Quick Start

1. Configure CLIF Data Source

cp clif_config_template.json clif_config.json

Edit clif_config.json to set your data path and timezone.

2. Generate Task Dataset

from flair_benchmark import generate_task_dataset, TASK_REGISTRY

# View available tasks
print(TASK_REGISTRY.keys())
# ['task1_discharged_home', 'task2_discharged_ltach', 'task3_outcome_72hr',
#  'task4_hypoxic_proportion', 'task5_icu_los', 'task6_hospital_mortality',
#  'task7_icu_readmission']

# Generate dataset for ICU LOS task with temporal split
df = generate_task_dataset(
    config_path="clif_config.json",
    task_name="task5_icu_los",
    train_start="2020-01-01",
    train_end="2022-12-31",
    test_start="2023-01-01",
    test_end="2023-12-31",
    output_path="task5_icu_los.parquet"
)

print(f"Total N: {len(df)}")
print(f"Train: {len(df.filter(df['split'] == 'train'))}")
print(f"Test: {len(df.filter(df['split'] == 'test'))}")

3. Use the Dataset

import polars as pl

# Load dataset
df = pl.read_parquet("task5_icu_los.parquet")

# Split into train/test
train = df.filter(pl.col("split") == "train")
test = df.filter(pl.col("split") == "test")

# Access labels
y_train = train["icu_los_hours"]
y_test = test["icu_los_hours"]

# Access demographics for subgroup analysis
demographics = train.select(["age_at_admission", "sex_category", "race_category"])

πŸ”’ Privacy Policy

╔════════════════════════════════════════════════════════════════╗
β•‘                     FLAIR PRIVACY POLICY                       β•‘
╠════════════════════════════════════════════════════════════════╣
β•‘                                                                β•‘
β•‘  1. NO NETWORK REQUESTS                                        β•‘
β•‘     β€’ All network access blocked at Python socket level        β•‘
β•‘     β€’ Packages like requests, urllib3, httpx are banned        β•‘
β•‘     β€’ Violation = immediate submission rejection               β•‘
β•‘                                                                β•‘
β•‘  2. PHI PROTECTION                                             β•‘
β•‘     β€’ All outputs scanned for PHI patterns                     β•‘
β•‘     β€’ Cell counts < 10 are suppressed (HIPAA safe harbor)      β•‘
β•‘     β€’ Individual-level data never leaves the site              β•‘
β•‘                                                                β•‘
β•‘  3. REVIEW PROCESS                                             β•‘
β•‘     β€’ PIs at each site review code before execution            β•‘
β•‘     β€’ PIs have final say β€” they are not required to run        β•‘
β•‘     β€’ Code inspection for data exfiltration attempts           β•‘
β•‘                                                                β•‘
β•‘  4. CONSEQUENCES                                               β•‘
β•‘     β€’ If data exfiltration found during review:                β•‘
β•‘       β†’ Submitter BANNED from FLAIR                            β•‘
β•‘       β†’ Incident reported to submitter's institution           β•‘
β•‘                                                                β•‘
β•‘  PIs are doing you a favor by running your code on their data. β•‘
β•‘  Respect their trust and protect patient privacy.              β•‘
β•‘                                                                β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

πŸ—οΈ Architecture

FLAIR/
β”œβ”€β”€ flair/                      # Main package
β”‚   β”œβ”€β”€ __init__.py             # Main API: generate_task_dataset()
β”‚   β”œβ”€β”€ cohort/                 # Cohort builder (clifpy integration)
β”‚   β”œβ”€β”€ config/                 # Configuration management
β”‚   β”œβ”€β”€ datasets/               # Dataset builder
β”‚   β”œβ”€β”€ helpers/                # table1, metrics, tripod_ai
β”‚   └── tasks/                  # Task definitions (7 tasks)
β”‚       β”œβ”€β”€ base.py             # BaseTask with build_task_dataset()
β”‚       β”œβ”€β”€ task1_discharged_home.py
β”‚       β”œβ”€β”€ task2_discharged_ltach.py
β”‚       β”œβ”€β”€ task3_outcome_72hr.py
β”‚       β”œβ”€β”€ task4_hypoxic_proportion.py
β”‚       β”œβ”€β”€ task5_icu_los.py
β”‚       β”œβ”€β”€ task6_hospital_mortality.py
β”‚       └── task7_icu_readmission.py
β”œβ”€β”€ clif_config_template.json   # CLIF data configuration template
β”œβ”€β”€ pyproject.toml              # Package configuration
└── tests/                      # Test suite

πŸ”— Related Projects

Project Description
clifpy Python library for CLIF data manipulation
MIMIC-CLIF CLIF-formatted MIMIC-IV (development entry point)
CLIF Consortium Official CLIF consortium website

πŸ“– Citation

If you use FLAIR in your research, please cite:

@software{flair2024,
  title = {FLAIR: Federated Learning Assessment for ICU Research},
  author = {CLIF Consortium},
  year = {2024},
  url = {https://github.com/clif-consortium/FLAIR}
}

πŸ“œ License

This source code is released under the APACHE 2.0 license. See LICENSE for details.

We do not own any of the clinical datasets used with this benchmark. Access to CLIF consortium data requires approval from each participating institution.


πŸ“¬ Contact

About

Federated Learning Assessment for ICU Research

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages