OHDSI GIS Toolchain Functional Requirements
@kzollove @rtmill @tibbben @jaygee-on-github - I've put together the tables below to cluster the functional requirements and associated issues of the GIS Toolchain. Can we review them to decide how best to organize the work and what the key decision points are?
Core Infrastructure (Gaia System)

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
|---|---|---|---|---|---|
| Database & Schema | #368 | Deploy PostgreSQL with PostGIS extension for geospatial data types and operations | High | In Development | Workshop transcript - Robert's presentation on Gaia schema |
| Database & Schema | #368 | Maintain separate schemas for backbone (standardized geospatial data), OMOP extensions, and GIS vocabularies | High | In Development | Workshop transcript - Gaia DB schema discussion |
| Database & Schema | #368 | Use geometry and attribute table templates to handle diverse data sources while maintaining functional consistency | Medium | In Development | Workshop transcript - Robert's templating architecture explanation |
| Database & Schema | #244 | Implement external exposure table and location history table extensions to OMOP v5.4+ | High | In Development | Workshop transcript - OMOP extension discussion |
| Data Standardization | #368 | Support automated retrieval, ingestion, and transformation of geospatial data from various sources | High | In Development | Workshop transcript - Gaia core functionality |
| Data Standardization | #365 | Comprehensive metadata capture including functional, administrative, and provenance information | Medium | In Development | Workshop transcript - Tim's catalog metadata discussion |
| Data Standardization | #368 | Maintain unique identifiers throughout the system for data lineage and catalog integration | Medium | In Development | Workshop transcript - UUID discussion |
| Data Standardization | - | Handle various input formats (shapefiles, CSV, GeoJSON, raster data) | Medium | Partially Complete | Workshop transcript - Multiple format support mentioned |

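To make the unique-identifier requirement in the table above concrete, here is a minimal sketch of one possible approach: deterministic, name-based UUIDs, so that re-ingesting the same source yields the same identifier. The function names (`dataset_uuid`, `variable_uuid`) and the choice of source URL plus version as the identity key are assumptions for illustration, not the scheme Gaia actually uses.

```python
import uuid


def dataset_uuid(source_url: str, version: str) -> uuid.UUID:
    """Derive a stable identifier for a dataset from its source URL and version.

    Assumption: (source_url, version) uniquely identifies a dataset release.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{source_url}#{version}")


def variable_uuid(dataset_id: uuid.UUID, variable_name: str) -> uuid.UUID:
    """Derive a stable identifier for a variable, namespaced by its dataset.

    Using the dataset UUID as the namespace keeps the lineage link implicit:
    the same variable name under a different dataset gets a different id.
    """
    return uuid.uuid5(dataset_id, variable_name)


if __name__ == "__main__":
    # Hypothetical source URL and variable name, for illustration only.
    ds = dataset_uuid("https://example.org/air-quality.zip", "2020")
    print(ds, variable_uuid(ds, "pm25"))
```

Name-based UUIDs trade randomness for reproducibility; if identifiers must stay stable when a source URL changes, random UUIDs stored alongside the record would be the alternative.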
Geocoding and Address Processing

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
|---|---|---|---|---|---|
| Privacy-Preserving | #371 | Support for on-premises geocoding using TIGER/US Census data, Nominatim (OpenStreetMap), and DGeoLite | High | In Development | Workshop transcript - Tim's geocoding tools comparison |
| Privacy-Preserving | - | Dockerized geocoding services isolated from main CDM infrastructure | High | Complete | Workshop transcript - Containerized geocoding discussion |
| Privacy-Preserving | #371 | Process addresses separately from patient identifiers with secure handling protocols | High | Design Phase | Workshop transcript - Privacy concerns discussion |
| Privacy-Preserving | - | Support address-level, city-level, and administrative boundary geocoding | Medium | In Development | Workshop transcript - Multi-scale geocoding |
| Data Quality | - | Standardize address formats, abbreviations, and common naming conventions | Medium | Not Started | Workshop transcript - Address cleaning requirements |
| Data Quality | - | Return match confidence scores and accuracy thresholds | Medium | Partially Complete | Workshop transcript - Geocoding benchmarking results |
| Data Quality | - | Manage failed geocoding attempts and provide alternative matching strategies | Low | Not Started | Workshop transcript - Error handling discussion |
| Data Quality | - | Support for geocoding accuracy evaluation across different tools and regions | Low | Complete | Workshop transcript - Tim's benchmarking study |

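The data quality rows above (address standardization, confidence thresholds, fallback on failure) could fit together roughly as in the sketch below. Everything here is hypothetical: the `GeocodeResult` fields, the `Geocoder` callable signature, the abbreviation table, and the default threshold of 0.8 are illustrative assumptions and do not describe any of the geocoders named in the table.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class GeocodeResult:
    latitude: float
    longitude: float
    confidence: float  # assumed to be normalized to the range 0.0-1.0
    source: str        # which geocoder produced the match


# Hypothetical interface: each geocoder takes an address string and
# returns a result, or None when it finds no match.
Geocoder = Callable[[str], Optional[GeocodeResult]]

# Illustrative subset only; a real table would be much larger and would
# need context to disambiguate cases such as "ST" (street vs. saint).
ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD", "BLVD": "BOULEVARD"}


def normalize_address(raw: str) -> str:
    """Uppercase, drop periods and commas, collapse whitespace, expand abbreviations."""
    tokens = re.sub(r"[.,]", " ", raw.upper()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def geocode_with_fallback(
    address: str,
    geocoders: Sequence[Geocoder],
    min_confidence: float = 0.8,
) -> Optional[GeocodeResult]:
    """Try each geocoder in order; return the first match at or above the threshold.

    Returns None when no geocoder produces a sufficiently confident match,
    so the caller can route the address to manual review or a coarser level.
    """
    cleaned = normalize_address(address)
    for geocode in geocoders:
        result = geocode(cleaned)
        if result is not None and result.confidence >= min_confidence:
            return result
    return None
```

Ordering the `geocoders` sequence from most to least precise gives a simple alternative-matching strategy: an address-level service first, then a city-level or administrative-boundary lookup as the fallback.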
Vocabulary Management

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
|---|---|---|---|---|---|
| GIS Vocabulary | #369 | Implement OMOP GIS Vocabulary with 159 geographic and spatial concepts | High | Complete | Workshop transcript - Paulina's vocabulary presentation |
| GIS Vocabulary | #369 | Implement OMOP Exposome Vocabulary for environmental and toxicological factors | High | Complete | Workshop transcript - Exposome vocabulary discussion |
| GIS Vocabulary | #369 | Implement OMOP SDOH Vocabulary for social determinants of health | High | Complete | Workshop transcript - SDOH vocabulary discussion |
| GIS Vocabulary | #348 | Containerized vocabulary build system from SSSOM format through GitHub Actions | Medium | Complete | Workshop transcript - Jared's build pipeline |
| GIS Vocabulary | #369 | Seven new domains (behavioral, demographic, environmental, geographic, healthcare, phenotypic, socioeconomic features) | High | Complete | Workshop transcript - Paulina's domain discussion |
| GIS Vocabulary | #369 | 31 new concept classes for constructs, determinants, items, and item values | Medium | Complete | Workshop transcript - Concept classes explanation |
| Vocabulary Maintenance | #348 | Git-based vocabulary management with automated deployment | Medium | Complete | Workshop transcript - GitHub integration |
| Vocabulary Maintenance | #348 | Distribute only custom GIS vocabulary extensions while respecting OMOP licensing | High | Complete | Workshop transcript - Delta vocabulary distribution |
| Vocabulary Maintenance | #369 | Five new relationship types for geospatial and exposure relationships | Medium | Complete | Workshop transcript - Relationship types discussion |
| Vocabulary Maintenance | #369 | Standard and non-standard concept designation based on OMOP mappings | Medium | Complete | Workshop transcript - Concept standardness rules |

Catalog and Discovery

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
|---|---|---|---|---|---|
| Metadata Catalog | #365 | FAIR Principles Implementation (Findable, Accessible, Interoperable, Reusable) | High | In Development | Workshop transcript - Tim's FAIR principles discussion |
| Metadata Catalog | #365 | Multi-level metadata including Dublin Core, technical, administrative, and variable-level metadata | Medium | In Development | Workshop transcript - Comprehensive metadata framework |
| Metadata Catalog | #365 | Full-text search, concept-based search, and spatial bounding box queries | Medium | Partially Complete | Workshop transcript - Tim's catalog demo |
| Data Discovery | #365 | Browser-based catalog exploration with filtering and search capabilities | Medium | In Development | Workshop transcript - Gaia Catalog web interface |
| Data Discovery | #365 | RESTful APIs for programmatic catalog access | Low | Not Started | Inferred from service-oriented architecture |
| Data Discovery | #365 | Integrated mapping tools for data preview | Low | In Development | Workshop transcript - Visual exploration tools |
| Data Discovery | #365 | Batch selection and processing of multiple datasets | Low | Design Phase | Workshop transcript - Shopping cart functionality mention |

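As a rough illustration of the search requirements above, the sketch below combines a naive full-text filter with a spatial bounding box test over in-memory catalog entries. The `CatalogEntry` fields and the `search_catalog` function are assumptions made for this example; the real catalog presumably delegates both kinds of query to its search backend.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# (min_lon, min_lat, max_lon, max_lat) in decimal degrees
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class CatalogEntry:
    identifier: str
    title: str
    description: str
    bbox: BBox


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """True when two boxes overlap or touch.

    Simplification: boxes that cross the antimeridian are not handled.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_catalog(
    entries: Sequence[CatalogEntry],
    text: Optional[str] = None,
    bbox: Optional[BBox] = None,
) -> List[CatalogEntry]:
    """Return entries matching every search term and intersecting the query box."""
    results = []
    for entry in entries:
        if text is not None:
            haystack = f"{entry.title} {entry.description}".lower()
            # Naive substring match: every whitespace-separated term must appear.
            if not all(term in haystack for term in text.lower().split()):
                continue
        if bbox is not None and not bboxes_intersect(entry.bbox, bbox):
            continue
        results.append(entry)
    return results
```

Both filters are optional, so the same function covers text-only, spatial-only, and combined queries.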
Spatial Analysis

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
|---|---|---|---|---|---|
| Spatial Operations | #242 | Link patient location history with time-varying geospatial data | High | In Development | Workshop transcript - Temporal-spatial joins |
| Spatial Operations | #242 | Point-in-polygon, buffer analysis, distance calculations | High | In Development | Workshop transcript - Tim's spatial join types |
| Spatial Operations | #242 | Vectorized spatial operations using PostGIS capabilities | Medium | In Development | Workshop transcript - PostGIS functionality |
| Spatial Operations | - | Support analysis from parcel-level to continental scale | Medium | In Development | Workshop transcript - Multi-scale analysis |
| External Exposure | #244 | Track exposure values across time periods for longitudinal analysis | High | In Development | Workshop transcript - External exposure table |
| External Exposure | #245 | Capture geometric relationships (within, intersects, buffer distance) | Medium | In Development | Workshop transcript - Relationship concepts |
| External Exposure | #244 | Link quantitative and qualitative exposure measurements to persons and locations | High | In Development | Workshop transcript - Value attribution |
| External Exposure | #348 | Support standard unit concepts from OMOP vocabulary | Medium | Complete | Workshop transcript - Unit standardization |

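The temporal-spatial join in the first row above can be illustrated with a small pure-Python sketch: a person's location period is matched to an exposure record when the point falls inside the exposure polygon and the two date ranges overlap. This is only a conceptual illustration; the requirements call for PostGIS to do this work, and the class and field names below are hypothetical rather than the actual location history or external exposure table columns.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going in +x.

    Points exactly on an edge are not treated consistently by this simple
    version, so it should not be relied on for boundary cases.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass(frozen=True)
class LocationPeriod:  # hypothetical stand-in for a location history row
    person_id: int
    point: Point
    start: date
    end: date


@dataclass(frozen=True)
class ExposureRecord:  # hypothetical stand-in for a time-varying exposure layer
    polygon: Sequence[Point]
    start: date
    end: date
    value: float


def assign_exposures(
    periods: Sequence[LocationPeriod],
    exposures: Sequence[ExposureRecord],
) -> List[Tuple[int, date, date, float]]:
    """Return (person_id, overlap_start, overlap_end, value) for each match."""
    rows = []
    for period in periods:
        for exposure in exposures:
            overlap_start = max(period.start, exposure.start)
            overlap_end = min(period.end, exposure.end)
            if overlap_start <= overlap_end and point_in_polygon(period.point, exposure.polygon):
                rows.append((period.person_id, overlap_start, overlap_end, exposure.value))
    return rows
```

The nested loop is quadratic and is here only to show the join condition; a database with spatial indexing would evaluate the same predicate far more efficiently.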
Analytics Integration

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
|---|---|---|---|---|---|
| OMOP Analytics | #365 | Custom covariate functions for Patient-Level Prediction models | High | Prototype Complete | Workshop transcript - Jared's PLP integration |
| OMOP Analytics | #365 | Future integration with ATLAS cohort definition tools | Medium | Not Started | Workshop transcript - Atlas integration roadmap |
| OMOP Analytics | - | Enable standardized GIS analytics across distributed networks | High | Design Phase | Workshop transcript - Federated analysis goals |
| OMOP Analytics | - | Integration with existing OMOP data quality and validation frameworks | Medium | Not Started | Workshop transcript - OHDSI ecosystem integration |
| Custom Analytics | - | Support for GIS-specific feature creation in predictive models | Medium | Prototype Complete | Workshop transcript - Custom feature functions |
| Custom Analytics | - | Combine clinical, environmental, and social determinant data | High | Prototype Complete | Workshop transcript - Multi-variate analysis demo |
| Custom Analytics | - | Cross-validation and performance metrics for GIS-enhanced models | Medium | Prototype Complete | Workshop transcript - Model validation |
| Custom Analytics | - | Generate shareable analysis packages for federated validation | Low | Not Started | Workshop transcript - Study package generation |

Deployment and Operations

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
|---|---|---|---|---|---|
| Containerization | - | Six-Container Architecture (Catalog, Solar, KG, DGeo, DB, Core) | High | In Development | Workshop transcript - Jared's container architecture |
| Containerization | - | Kubernetes deployment with Helm charts | Medium | In Development | Workshop transcript - Tim's cloud native approach |
| Containerization | - | Configurable resource allocation and scaling | Low | Not Started | Inferred from containerization requirements |
| Security & Privacy | - | Deploy GIS infrastructure separate from main CDM with controlled connections | High | Design Phase | Workshop transcript - Security isolation discussion |
| Security & Privacy | - | Ensure all components meet healthcare data security requirements | High | Design Phase | Workshop transcript - HIPAA compliance discussion |
| Security & Privacy | - | Comprehensive logging for data access and processing activities | Medium | Not Started | Inferred from security requirements |
| Security & Privacy | - | Role-based access to different system components and data levels | Medium | Not Started | Inferred from multi-user system needs |

Data Quality

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
|---|---|---|---|---|---|
| Quality Assurance | - | Automated checks for spatial data integrity and temporal consistency | Medium | Not Started | Inferred from data quality needs |
| Quality Assurance | #368 | Complete audit trail from raw data to final exposure assignments | Medium | Partially Complete | Workshop transcript - Provenance tracking |
| Quality Assurance | - | Systematic capture and reporting of processing errors and data quality issues | Medium | Not Started | Inferred from quality requirements |
| Quality Assurance | - | Reference datasets for validation and testing | Low | Partially Complete | Workshop transcript - Synthetic data for testing |
| Performance | - | Track data ingestion, geocoding, and spatial join performance | Low | Not Started | Inferred from operational needs |
| Performance | #242 | Monitor database growth and optimize spatial indexing | Low | Not Started | Inferred from database performance needs |
| Performance | - | Track catalog usage patterns and search effectiveness | Low | Not Started | Inferred from user analytics needs |
| Performance | - | Automated monitoring of container health and resource utilization | Low | Not Started | Inferred from operational monitoring |

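Since the automated integrity checks and error reporting above are not started, a minimal sketch may help frame the discussion. It validates coordinate ranges and date ordering per record and then tallies the issues for reporting. The record keys (`latitude`, `longitude`, `start_date`, `end_date`) and the issue labels are assumptions for illustration, not an agreed schema.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List


def check_record(record: Dict) -> List[str]:
    """Return a list of issue labels for one record; empty means it passed."""
    issues = []

    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is None or lon is None:
        issues.append("missing coordinates")
    else:
        if not -90.0 <= lat <= 90.0:
            issues.append("latitude out of range")
        if not -180.0 <= lon <= 180.0:
            issues.append("longitude out of range")

    start, end = record.get("start_date"), record.get("end_date")
    if start is None or end is None:
        issues.append("missing dates")
    elif start > end:
        issues.append("start_date after end_date")

    return issues


def summarize_issues(records: Iterable[Dict]) -> Dict[str, int]:
    """Count how often each issue label occurs across all records."""
    counts: Counter = Counter()
    for record in records:
        counts.update(check_record(record))
    return dict(counts)
```

Returning labels rather than raising exceptions lets a batch run to completion and produce a single summary, which suits the systematic error-reporting row in the table.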
Extensibility

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
|---|---|---|---|---|---|
| Modular Architecture | #249 | Support for additional geocoding services and data sources | Medium | Design Phase | Workshop transcript - Plugin framework discussion |
| Modular Architecture | - | Well-defined interfaces for adding new analytical capabilities | Medium | In Development | Workshop transcript - API extensibility |
| Modular Architecture | #369 | Framework for domain-specific vocabulary development | Low | Design Phase | Workshop transcript - Community vocabulary development |
| Modular Architecture | - | Multi-language and multi-region capability framework | Low | Not Started | Workshop transcript - International support needs |
| Research Integration | - | Framework for generating synthetic geospatial data for research | Medium | Prototype Complete | Workshop transcript - Jared's synthetic data |
| Research Integration | - | Integration with research collaboration platforms | Low | Not Started | Inferred from research collaboration needs |
| Research Integration | - | Tools for generating reproducible research outputs | Low | Not Started | Workshop transcript - Publication support |
| Research Integration | #369 | Support for community-driven data source and vocabulary contributions | Low | Design Phase | Workshop transcript - Community development goals |

Key GitHub Issues Summary
Major Roadmap Issues:
Implementation Issues: