
Task - Update Docs with Functional Requirements of GIS Toolchain #398

@jshoughtaling

OHDSI GIS Toolchain Functional Requirements

@kzollove @rtmill @tibbben @jaygee-on-github - I've put together the tables below to cluster the functional requirements of the GIS Toolchain and their associated issues. Could we review them together to decide how best to organize them and to identify the key decision points?

Core Infrastructure (Gaia System)

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
| --- | --- | --- | --- | --- | --- |
| Database & Schema | #368 | Deploy PostgreSQL with PostGIS extension for geospatial data types and operations | High | In Development | Workshop transcript - Robert's presentation on Gaia schema |
| Database & Schema | #368 | Maintain separate schemas for backbone (standardized geospatial data), OMOP extensions, and GIS vocabularies | High | In Development | Workshop transcript - Gaia DB schema discussion |
| Database & Schema | #368 | Use geometry and attribute table templates to handle diverse data sources while maintaining functional consistency | Medium | In Development | Workshop transcript - Robert's templating architecture explanation |
| Database & Schema | #244 | Implement external exposure table and location history table extensions to OMOP v5.4+ | High | In Development | Workshop transcript - OMOP extension discussion |
| Data Standardization | #368 | Support automated retrieval, ingestion, and transformation of geospatial data from various sources | High | In Development | Workshop transcript - Gaia core functionality |
| Data Standardization | #365 | Comprehensive metadata capture including functional, administrative, and provenance information | Medium | In Development | Workshop transcript - Tim's catalog metadata discussion |
| Data Standardization | #368 | Maintain unique identifiers throughout the system for data lineage and catalog integration | Medium | In Development | Workshop transcript - UUID discussion |
| Data Standardization | - | Handle various input formats (shapefiles, CSV, GeoJSON, raster data) | Medium | Partially Complete | Workshop transcript - Multiple format support mentioned |
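To make the unique-identifier and lineage requirement (#368) a bit more concrete, here is a minimal Python sketch. It assumes deterministic UUIDv5 identifiers derived from a source URL plus version; `GAIA_NAMESPACE`, `DataSourceRecord`, and `lineage` are hypothetical names, not part of the Gaia codebase, and the UUID scheme discussed at the workshop may well differ.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical namespace for Gaia identifiers (not an official OHDSI value).
GAIA_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/gaia")


@dataclass
class DataSourceRecord:
    """Hypothetical catalog entry for one ingested or derived dataset."""
    source_url: str
    version: str
    # Provenance: identifiers of the records this dataset was derived from.
    derived_from: List[uuid.UUID] = field(default_factory=list)

    @property
    def record_id(self) -> uuid.UUID:
        # uuid5 is deterministic: the same source URL and version always
        # produce the same identifier, so re-ingesting does not mint a new ID.
        return uuid.uuid5(GAIA_NAMESPACE, f"{self.source_url}|{self.version}")


def lineage(record: DataSourceRecord,
            registry: Dict[uuid.UUID, DataSourceRecord]) -> List[uuid.UUID]:
    """Walk derived_from links back to the raw sources (depth-first)."""
    chain = [record.record_id]
    for parent_id in record.derived_from:
        chain.extend(lineage(registry[parent_id], registry))
    return chain
```

With deterministic IDs the database and the catalog (#365) can agree on an identifier without sharing a counter; random `uuid4` values would also work but would have to be persisted at ingestion time.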

Geocoding and Address Processing

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
| --- | --- | --- | --- | --- | --- |
| Privacy-Preserving | #371 | Support for on-premises geocoding using TIGER/US Census data, Nominatim (OpenStreetMap), and DGeoLite | High | In Development | Workshop transcript - Tim's geocoding tools comparison |
| Privacy-Preserving | - | Dockerized geocoding services isolated from main CDM infrastructure | High | Complete | Workshop transcript - Containerized geocoding discussion |
| Privacy-Preserving | #371 | Process addresses separately from patient identifiers with secure handling protocols | High | Design Phase | Workshop transcript - Privacy concerns discussion |
| Privacy-Preserving | - | Support address-level, city-level, and administrative boundary geocoding | Medium | In Development | Workshop transcript - Multi-scale geocoding |
| Data Quality | - | Standardize address formats, abbreviations, and common naming conventions | Medium | Not Started | Workshop transcript - Address cleaning requirements |
| Data Quality | - | Return match confidence scores and accuracy thresholds | Medium | Partially Complete | Workshop transcript - Geocoding benchmarking results |
| Data Quality | - | Manage failed geocoding attempts and provide alternative matching strategies | Low | Not Started | Workshop transcript - Error handling discussion |
| Data Quality | - | Support for geocoding accuracy evaluation across different tools and regions | Low | Complete | Workshop transcript - Tim's benchmarking study |
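A minimal sketch of two rows from this table: normalizing address text before geocoding, and splitting addresses from patient identifiers so the geocoding container only ever sees a pseudonymous key (#371). The suffix table is an illustrative subset (a real cleaner would use the full USPS Publication 28 tables), and the salted SHA-256 key is an assumption, not a protocol the group has agreed on.

```python
import hashlib
import re
from typing import Iterable, List, Tuple

# Illustrative subset only; a production cleaner would use the full
# USPS Publication 28 suffix and directional tables.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD", "DRIVE": "DR"}
DIRECTIONALS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def normalize_address(raw: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace,
    and abbreviate street suffixes and directionals."""
    text = re.sub(r"[^\w\s]", " ", raw.upper())
    return " ".join(SUFFIXES.get(t, DIRECTIONALS.get(t, t)) for t in text.split())


def split_for_geocoding(
    rows: Iterable[Tuple[int, str]], salt: bytes
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, int]]]:
    """Separate addresses from patient identifiers before geocoding.

    The geocoder receives only (key, normalized_address); the (key, person_id)
    link table stays inside the CDM environment. The salted hash is an
    assumed keying scheme.
    """
    address_table, link_table = [], []
    for person_id, address in rows:
        key = hashlib.sha256(salt + str(person_id).encode()).hexdigest()[:16]
        address_table.append((key, normalize_address(address)))
        link_table.append((key, person_id))
    return address_table, link_table
```

An unsalted hash of a small integer ID is trivially reversible by enumeration, which is why the salt (like the link table) has to stay on the CDM side.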

Vocabulary Management

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
| --- | --- | --- | --- | --- | --- |
| GIS Vocabulary | #369 | Implement OMOP GIS Vocabulary with 159 geographic and spatial concepts | High | Complete | Workshop transcript - Paulina's vocabulary presentation |
| GIS Vocabulary | #369 | Implement OMOP Exposome Vocabulary for environmental and toxicological factors | High | Complete | Workshop transcript - Exposome vocabulary discussion |
| GIS Vocabulary | #369 | Implement OMOP SDOH Vocabulary for social determinants of health | High | Complete | Workshop transcript - SDOH vocabulary discussion |
| GIS Vocabulary | #348 | Containerized vocabulary build system that builds from SSSOM mapping files via GitHub Actions | Medium | Complete | Workshop transcript - Jared's build pipeline |
| GIS Vocabulary | #369 | Seven new domains (behavioral, demographic, environmental, geographic, healthcare, phenotypic, socioeconomic features) | High | Complete | Workshop transcript - Paulina's domain discussion |
| GIS Vocabulary | #369 | 31 new concept classes for constructs, determinants, items, and item values | Medium | Complete | Workshop transcript - Concept classes explanation |
| Vocabulary Maintenance | #348 | Git-based vocabulary management with automated deployment | Medium | Complete | Workshop transcript - GitHub integration |
| Vocabulary Maintenance | #348 | Distribute only custom GIS vocabulary extensions while respecting OMOP licensing | High | Complete | Workshop transcript - Delta vocabulary distribution |
| Vocabulary Maintenance | #369 | Five new relationship types for geospatial and exposure relationships | Medium | Complete | Workshop transcript - Relationship types discussion |
| Vocabulary Maintenance | #369 | Standard and non-standard concept designation based on OMOP mappings | Medium | Complete | Workshop transcript - Concept standardness rules |
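The build pipeline (#348) starts from SSSOM mapping files. As a rough sketch of that conversion step, the Python below reads an SSSOM TSV (skipping the `#`-prefixed embedded metadata block), turns mapping rows into OMOP-style relationship tuples, and marks a concept non-standard when it has an exact mapping to a different concept. The predicate-to-relationship table is an assumption for illustration; the actual build rules live in the containerized pipeline and may differ.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

# Assumed SSSOM predicate -> OMOP relationship_id mapping, for illustration only.
PREDICATE_TO_RELATIONSHIP = {
    "skos:exactMatch": "Maps to",
    "skos:broadMatch": "Is a",
}


def sssom_to_relationships(
    tsv_text: str,
) -> Tuple[List[Tuple[str, str, str]], Dict[str, Optional[str]]]:
    """Return (relationships, standardness) from an SSSOM TSV string.

    standardness maps each subject to "S" (standard) or None (non-standard,
    because it maps to a different concept).
    """
    # SSSOM TSV may embed its metadata block as '#'-prefixed YAML lines.
    body = "\n".join(line for line in tsv_text.splitlines() if not line.startswith("#"))
    reader = csv.DictReader(io.StringIO(body), delimiter="\t")
    relationships: List[Tuple[str, str, str]] = []
    standard: Dict[str, Optional[str]] = {}
    for row in reader:
        rel = PREDICATE_TO_RELATIONSHIP.get(row["predicate_id"])
        if rel is None:
            continue  # unmapped predicate: skip rather than guess
        subject, obj = row["subject_id"], row["object_id"]
        relationships.append((subject, rel, obj))
        standard.setdefault(subject, "S")
        if rel == "Maps to" and subject != obj:
            standard[subject] = None
    return relationships, standard
```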

Catalog and Discovery

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
| --- | --- | --- | --- | --- | --- |
| Metadata Catalog | #365 | FAIR Principles Implementation (Findable, Accessible, Interoperable, Reusable) | High | In Development | Workshop transcript - Tim's FAIR principles discussion |
| Metadata Catalog | #365 | Multi-level metadata including Dublin Core, technical, administrative, and variable-level metadata | Medium | In Development | Workshop transcript - Comprehensive metadata framework |
| Metadata Catalog | #365 | Full-text search, concept-based search, and spatial bounding box queries | Medium | Partially Complete | Workshop transcript - Tim's catalog demo |
| Data Discovery | #365 | Browser-based catalog exploration with filtering and search capabilities | Medium | In Development | Workshop transcript - Gaia Catalog web interface |
| Data Discovery | #365 | RESTful APIs for programmatic catalog access | Low | Not Started | Inferred from service-oriented architecture |
| Data Discovery | #365 | Integrated mapping tools for data preview | Low | In Development | Workshop transcript - Visual exploration tools |
| Data Discovery | #365 | Batch selection and processing of multiple datasets | Low | Design Phase | Workshop transcript - Shopping-cart functionality mentioned |
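The spatial bounding-box query in #365 reduces to an axis-aligned rectangle overlap test. Here is a small Python sketch that combines it with a naive case-insensitive text match; `CatalogEntry` and `search` are hypothetical, and a production catalog would push both filters into its search index instead of scanning a list. Boxes that cross the antimeridian are not handled.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass
class CatalogEntry:
    title: str
    keywords: List[str]
    bbox: BBox


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """Two axis-aligned boxes overlap unless one lies entirely
    west, east, south, or north of the other."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


def search(entries: List[CatalogEntry], text: str, bbox: BBox) -> List[CatalogEntry]:
    """Keep entries whose title or keywords contain `text` AND whose
    extent overlaps the query box."""
    needle = text.lower()
    return [
        e for e in entries
        if (needle in e.title.lower() or any(needle in k.lower() for k in e.keywords))
        and bbox_intersects(e.bbox, bbox)
    ]
```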

Spatial Analysis

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
| --- | --- | --- | --- | --- | --- |
| Spatial Operations | #242 | Link patient location history with time-varying geospatial data | High | In Development | Workshop transcript - Temporal-spatial joins |
| Spatial Operations | #242 | Point-in-polygon, buffer analysis, distance calculations | High | In Development | Workshop transcript - Tim's spatial join types |
| Spatial Operations | #242 | Vectorized spatial operations using PostGIS capabilities | Medium | In Development | Workshop transcript - PostGIS functionality |
| Spatial Operations | - | Support analysis from parcel-level to continental scale | Medium | In Development | Workshop transcript - Multi-scale analysis |
| External Exposure | #244 | Track exposure values across time periods for longitudinal analysis | High | In Development | Workshop transcript - External exposure table |
| External Exposure | #245 | Capture geometric relationships (within, intersects, buffer distance) | Medium | In Development | Workshop transcript - Relationship concepts |
| External Exposure | #244 | Link quantitative and qualitative exposure measurements to persons and locations | High | In Development | Workshop transcript - Value attribution |
| External Exposure | #348 | Support standard unit concepts from OMOP vocabulary | Medium | Complete | Workshop transcript - Unit standardization |
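For the temporal-spatial join in #242, here is a minimal pure-Python sketch: an even-odd ray-casting point-in-polygon test plus a date-range overlap check, so that a location-history period receives an exposure value only when the point is inside the polygon and the periods overlap. In the database this would instead use PostGIS functions such as `ST_Within` and `ST_DWithin` (for buffers) on indexed geometries; the tuple layouts below are assumptions, not the external exposure or location history table definitions.

```python
from datetime import date
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lon, lat)


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Even-odd ray casting: toggle 'inside' each time a ray cast in the +x
    direction from pt crosses a polygon edge. Points exactly on an edge may
    go either way."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlap_days(a_start: date, a_end: date, b_start: date, b_end: date) -> int:
    """Number of days (inclusive) shared by two date ranges; 0 if disjoint."""
    return max((min(a_end, b_end) - max(a_start, b_start)).days + 1, 0)


def assign_exposures(location_history, exposure_layers):
    """location_history rows: (person_id, point, start, end)
    exposure_layers rows: (value, polygon_ring, start, end)
    Returns (person_id, value, overlap_start, overlap_end) rows."""
    rows = []
    for person_id, pt, start, end in location_history:
        for value, ring, e_start, e_end in exposure_layers:
            if overlap_days(start, end, e_start, e_end) and point_in_polygon(pt, ring):
                rows.append((person_id, value, max(start, e_start), min(end, e_end)))
    return rows
```

Checking the cheap date overlap before the geometry test mirrors what a spatial index plus a temporal predicate would do in SQL: discard most candidate pairs before any polygon math.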

Analytics Integration

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
| --- | --- | --- | --- | --- | --- |
| OMOP Analytics | #365 | Custom covariate functions for Patient-Level Prediction models | High | Prototype Complete | Workshop transcript - Jared's PLP integration |
| OMOP Analytics | #365 | Future integration with ATLAS cohort definition tools | Medium | Not Started | Workshop transcript - ATLAS integration roadmap |
| OMOP Analytics | - | Enable standardized GIS analytics across distributed networks | High | Design Phase | Workshop transcript - Federated analysis goals |
| OMOP Analytics | - | Integration with existing OMOP data quality and validation frameworks | Medium | Not Started | Workshop transcript - OHDSI ecosystem integration |
| Custom Analytics | - | Support for GIS-specific feature creation in predictive models | Medium | Prototype Complete | Workshop transcript - Custom feature functions |
| Custom Analytics | - | Combine clinical, environmental, and social determinant data | High | Prototype Complete | Workshop transcript - Multivariate analysis demo |
| Custom Analytics | - | Cross-validation and performance metrics for GIS-enhanced models | Medium | Prototype Complete | Workshop transcript - Model validation |
| Custom Analytics | - | Generate shareable analysis packages for federated validation | Low | Not Started | Workshop transcript - Study package generation |
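As a sketch of what one GIS-derived covariate for Patient-Level Prediction could compute, the function below returns a time-weighted mean exposure per person over a fixed window. This is an assumption about one plausible feature, written in Python for illustration only; it is not the prototype's covariate function, which is not reproduced here.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def time_weighted_exposure(
    exposures: Iterable[Tuple[int, float, date, date]],
    window_start: date,
    window_end: date,
) -> Dict[int, float]:
    """Mean exposure per person, weighting each record by the number of its
    days (inclusive) that fall inside [window_start, window_end]."""
    weighted: Dict[int, float] = defaultdict(float)
    days_total: Dict[int, int] = defaultdict(int)
    for person_id, value, start, end in exposures:
        days = (min(end, window_end) - max(start, window_start)).days + 1
        if days <= 0:
            continue  # record lies entirely outside the window
        weighted[person_id] += value * days
        days_total[person_id] += days
    return {pid: weighted[pid] / days_total[pid] for pid in days_total}
```

Persons with no exposure records in the window are simply absent from the result, so the caller decides whether that means "missing" or "zero".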

Deployment and Operations

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
| --- | --- | --- | --- | --- | --- |
| Containerization | - | Six-container architecture (Catalog, Solr, KG, DGeo, DB, Core) | High | In Development | Workshop transcript - Jared's container architecture |
| Containerization | - | Kubernetes deployment with Helm charts | Medium | In Development | Workshop transcript - Tim's cloud native approach |
| Containerization | - | Configurable resource allocation and scaling | Low | Not Started | Inferred from containerization requirements |
| Security & Privacy | - | Deploy GIS infrastructure separate from main CDM with controlled connections | High | Design Phase | Workshop transcript - Security isolation discussion |
| Security & Privacy | - | Ensure all components meet healthcare data security requirements | High | Design Phase | Workshop transcript - HIPAA compliance discussion |
| Security & Privacy | - | Comprehensive logging for data access and processing activities | Medium | Not Started | Inferred from security requirements |
| Security & Privacy | - | Role-based access to different system components and data levels | Medium | Not Started | Inferred from multi-user system needs |
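The role-based access and audit-logging rows are still Not Started, so this is only a sketch of one possible shape: a default-deny permission table keyed by (component, action), with every decision appended to an audit log. The roles, component keys, and actions are all hypothetical.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Role(Enum):
    ANALYST = "analyst"
    DATA_STEWARD = "data_steward"
    ADMIN = "admin"


# Hypothetical permission table keyed by (component, action).
# Anything not listed is denied.
PERMISSIONS: Dict[Tuple[str, str], Set[Role]] = {
    ("catalog", "read"): {Role.ANALYST, Role.DATA_STEWARD, Role.ADMIN},
    ("db", "read"): {Role.ANALYST, Role.DATA_STEWARD, Role.ADMIN},
    ("db", "write"): {Role.DATA_STEWARD, Role.ADMIN},
    # Only stewards may send addresses to the isolated geocoder.
    ("geocoder", "submit_addresses"): {Role.DATA_STEWARD},
}


def authorize(role: Role, component: str, action: str,
              audit_log: List[Tuple[str, str, str, bool]]) -> bool:
    """Default-deny check that records every decision, allowed or not."""
    allowed = role in PERMISSIONS.get((component, action), set())
    audit_log.append((role.value, component, action, allowed))
    return allowed
```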

Data Quality

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
| --- | --- | --- | --- | --- | --- |
| Quality Assurance | - | Automated checks for spatial data integrity and temporal consistency | Medium | Not Started | Inferred from data quality needs |
| Quality Assurance | #368 | Complete audit trail from raw data to final exposure assignments | Medium | Partially Complete | Workshop transcript - Provenance tracking |
| Quality Assurance | - | Systematic capture and reporting of processing errors and data quality issues | Medium | Not Started | Inferred from quality requirements |
| Quality Assurance | - | Reference datasets for validation and testing | Low | Partially Complete | Workshop transcript - Synthetic data for testing |
| Performance | - | Track data ingestion, geocoding, and spatial join performance | Low | Not Started | Inferred from operational needs |
| Performance | #242 | Monitor database growth and optimize spatial indexing | Low | Not Started | Inferred from database performance needs |
| Performance | - | Track catalog usage patterns and search effectiveness | Low | Not Started | Inferred from user analytics needs |
| Performance | - | Automated monitoring of container health and resource utilization | Low | Not Started | Inferred from operational monitoring |
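For the automated spatial-integrity and temporal-consistency checks, here is a minimal Python sketch over location-history rows. It flags out-of-range coordinates, end dates before start dates, and overlapping periods for the same person, treating an open-ended period (no end date) as running to `date.max`. The `(person_id, lat, lon, start, end)` row layout is an assumption, not the location history table spec.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, float, float, date, Optional[date]]  # assumed layout


def check_location_history(rows: List[Row]) -> List[str]:
    """Return human-readable issues: invalid coordinates, inverted date
    ranges, and overlapping periods for the same person."""
    issues: List[str] = []
    by_person: Dict[int, List[Tuple[date, date]]] = {}
    for person_id, lat, lon, start, end in rows:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            issues.append(f"person {person_id}: coordinate out of range ({lat}, {lon})")
        if end is not None and end < start:
            issues.append(f"person {person_id}: end {end} precedes start {start}")
        # An open-ended period is treated as lasting indefinitely.
        by_person.setdefault(person_id, []).append((start, end or date.max))
    for person_id, periods in by_person.items():
        periods.sort()
        # After sorting by start, any overlap shows up between neighbours.
        for (s1, e1), (s2, _) in zip(periods, periods[1:]):
            if s2 <= e1:
                issues.append(f"person {person_id}: overlapping periods starting {s1} and {s2}")
    return issues
```

Returning messages instead of raising lets one run report every problem at once, which fits the "systematic capture and reporting" row.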

Extensibility

| Component | GitHub Issue | Functional Requirement | Priority | Status | Reference |
| --- | --- | --- | --- | --- | --- |
| Modular Architecture | #249 | Support for additional geocoding services and data sources | Medium | Design Phase | Workshop transcript - Plugin framework discussion |
| Modular Architecture | - | Well-defined interfaces for adding new analytical capabilities | Medium | In Development | Workshop transcript - API extensibility |
| Modular Architecture | #369 | Framework for domain-specific vocabulary development | Low | Design Phase | Workshop transcript - Community vocabulary development |
| Modular Architecture | - | Multi-language and multi-region capability framework | Low | Not Started | Workshop transcript - International support needs |
| Research Integration | - | Framework for generating synthetic geospatial data for research | Medium | Prototype Complete | Workshop transcript - Jared's synthetic data |
| Research Integration | - | Integration with research collaboration platforms | Low | Not Started | Inferred from research collaboration needs |
| Research Integration | - | Tools for generating reproducible research outputs | Low | Not Started | Workshop transcript - Publication support |
| Research Integration | #369 | Support for community-driven data source and vocabulary contributions | Low | Design Phase | Workshop transcript - Community development goals |
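One way the plugin framework for additional geocoders (#249) could look, sketched in Python: a `Protocol` that each geocoding service implements, a registry, and a fallback loop that tries services in order until one clears a confidence threshold (which would also cover the "alternative matching strategies" row under Geocoding). All names here are hypothetical, not an existing Gaia interface.

```python
from typing import Dict, Iterable, Optional, Protocol, Tuple

# (latitude, longitude, match confidence in [0, 1])
Match = Tuple[float, float, float]


class Geocoder(Protocol):
    """Hypothetical plugin interface for a geocoding service."""

    name: str

    def geocode(self, address: str) -> Optional[Match]:
        """Return a match, or None if the service finds nothing."""
        ...


_REGISTRY: Dict[str, Geocoder] = {}


def register(geocoder: Geocoder) -> None:
    """Make a geocoder available by name (re-registering a name replaces it)."""
    _REGISTRY[geocoder.name] = geocoder


def geocode_with_fallback(
    address: str, order: Iterable[str], min_confidence: float = 0.8
) -> Optional[Tuple[str, Match]]:
    """Try geocoders in the given order; return (name, match) for the first
    match at or above min_confidence, or None if every service fails."""
    for name in order:
        match = _REGISTRY[name].geocode(address)
        if match is not None and match[2] >= min_confidence:
            return name, match
    return None
```

Keeping services behind one structural interface is what would let the Dockerized, on-premises geocoders be swapped or chained without touching downstream code.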

Key GitHub Issues Summary

Major Roadmap Issues:

Implementation Issues:

Metadata

Assignees: No one assigned

Labels: documentation (Requires (re) writing of documentation, no coding.)

Type: No type

Project status: Done

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests