ISARICResearch/ISARIC-data-schema

ISARIC Data Schema

This repository contains the schema definitions that make up the ISARIC clinical data schema. The schema provides a structured target model that supports standardisation, validation, and downstream analytics across ISARIC studies.


Context

ISARIC data workflows typically follow the pipeline:

ARC → REDCap → Data Transformation → Analytics / Vertex

  • ARC defines the standardised clinical variables and metadata used in ISARIC research.
  • REDCap is used by study sites for clinical data collection.
  • Transformation processes map REDCap exports into a consistent schema.
  • The resulting datasets are used for statistical analysis, reporting, and integration into analytics platforms such as Vertex.

This repository defines the schema used as the target structure during this transformation step.
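As a rough illustration of what "target structure" means in the transformation step, the sketch below renames columns from a REDCap-style export row into target field names. Every name here (`FIELD_MAP`, `map_record`, and all column and field names) is a hypothetical placeholder and is not taken from ARC or from the schema files in this repository.

```python
# Hypothetical mapping from export column names to target field names.
# These names are placeholders, not fields from the ISARIC schema or ARC.
FIELD_MAP = {
    "record_id": "subject_id",
    "enrol_date": "enrolment_date",
    "sex_at_birth": "sex",
}


def map_record(export_row: dict) -> dict:
    """Rename mapped columns, skip empty values, ignore unmapped columns."""
    mapped = {}
    for source_name, target_name in FIELD_MAP.items():
        value = export_row.get(source_name, "")
        if value != "":
            mapped[target_name] = value
    return mapped


row = {
    "record_id": "P001",
    "enrol_date": "2024-01-15",
    "sex_at_birth": "",      # empty in the export, so it is dropped
    "unused_col": "x",       # not in FIELD_MAP, so it is ignored
}
print(map_record(row))
```

A real transformation would take its mapping from the schema definitions and ARC metadata rather than from a hard-coded dictionary.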


Purpose of the Schema

The ISARIC Data Schema is designed to:

  • Harmonise datasets collected across multiple studies and diseases
  • Provide a stable structure for commonly available patient-level variables
  • Support flexible representation of sparse or repeated clinical data
  • Enable efficient production of wide, analysis-ready datasets
  • Facilitate scalable data engineering workflows

Repository Contents

This repository includes:

  • JSON schema definitions describing the core dataset structure
  • JSON schema definitions describing extended long-format data structures

These files are used by data pipelines to:

  • validate incoming data
  • standardise variable naming and organisation
  • structure event-level and attribute-level clinical information
  • support automated dataset generation for analytics environments
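To make the validation use concrete, here is a minimal sketch that checks a record against a small JSON Schema style fragment using only the Python standard library. The schema fragment, the field names, and the `check_record` helper are assumptions made for illustration; the actual schema files may be organised differently, and a production pipeline would more likely use a full JSON Schema validator than this hand-rolled check of `required` and `type`.

```python
import json

# Hypothetical schema fragment in JSON Schema style.
# This is NOT the real ISARIC schema; field names are placeholders.
SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["subject_id", "attribute", "value"],
  "properties": {
    "subject_id": {"type": "string"},
    "attribute": {"type": "string"},
    "value": {"type": ["string", "number", "null"]}
  }
}
""")

# Only the three JSON types used above are supported in this sketch.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "null": lambda v: v is None,
}


def check_record(record: dict, schema: dict) -> list:
    """Return a list of error messages; an empty list means the record passed."""
    errors = []
    for name in schema.get("required", []):
        if name not in record:
            errors.append(f"missing required field: {name}")
    for name, rules in schema.get("properties", {}).items():
        if name not in record:
            continue
        allowed = rules.get("type", [])
        if isinstance(allowed, str):
            allowed = [allowed]
        if allowed and not any(TYPE_CHECKS[t](record[name]) for t in allowed):
            errors.append(f"wrong type for field: {name}")
    return errors


good = {"subject_id": "P001", "attribute": "temperature_c", "value": 38.2}
bad = {"subject_id": 1, "attribute": "temperature_c"}
print(check_record(good, SCHEMA))
print(check_record(bad, SCHEMA))
```

Collecting all errors rather than stopping at the first one tends to be more useful when validating batch exports, since a site can then fix every problem in a single pass.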

Role in the Data Pipeline

During processing:

  1. Clinical data are collected using ARC-aligned REDCap forms
  2. Data exports are ingested into transformation workflows
  3. Variables are mapped into the ISARIC Data Schema defined here
  4. Long-format clinical data may be reshaped into wide datasets
  5. Resulting datasets are made available for analysis and platform integration
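Step 4 above, reshaping long-format data into a wide dataset, can be sketched as follows. The row layout (`subject_id`, `attribute`, `value`) and the `long_to_wide` helper are hypothetical and chosen only to illustrate the idea; the real long-format structure is whatever the schema files in this repository define.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (subject, attribute) observation.
long_rows = [
    {"subject_id": "P001", "attribute": "temperature_c", "value": 38.2},
    {"subject_id": "P001", "attribute": "heart_rate", "value": 96},
    {"subject_id": "P002", "attribute": "temperature_c", "value": 37.1},
]


def long_to_wide(rows: list) -> list:
    """Pivot long rows into one dict per subject, with one key per attribute.

    Attributes a subject never recorded are filled with None. If the same
    attribute appears more than once for a subject, the last value wins;
    repeated measurements would need an extra key such as a date or event.
    """
    by_subject = defaultdict(dict)
    for row in rows:
        by_subject[row["subject_id"]][row["attribute"]] = row["value"]
    columns = sorted({row["attribute"] for row in rows})
    return [
        {"subject_id": subject_id, **{c: attrs.get(c) for c in columns}}
        for subject_id, attrs in by_subject.items()
    ]


for wide_row in long_to_wide(long_rows):
    print(wide_row)
```

Storing sparse data in long format and pivoting only when an analysis needs it avoids maintaining very wide tables that are mostly empty.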

Notes

This repository defines schema structure only.
It does not contain clinical datasets.

For ARC variable definitions see:
https://github.com/ISARICResearch/ARC

About

This repository contains the ISARIC clinical data schema. The schema is designed to support flexible, extensible clinical research datasets using an entity-attribute-value (EAV) data model.
