
Platforms and Data Harmonization Working Group - Data Science Without Borders

The Data Science Without Borders project hosts the Platforms and Data Harmonization Working Group for researchers in African countries.

Say hi on Discord || Join our next meeting

Background

Data Science Without Borders (DSWB) is an international initiative funded by the Wellcome Trust and led by the African Population and Health Research Center (APHRC). The project has three overarching objectives: to strengthen data systems in Pathfinder countries, to create a sustainable environment for collaborative artificial intelligence and machine learning (AI/ML) platforms, and to build a user-friendly platform for AI/ML tools. The impact of inequality in data availability and access is evident, particularly in resource-limited settings such as many African nations. This African institution-led initiative will leverage AI/ML to bridge existing gaps in data accessibility, infrastructure, and expertise, and aims to foster a collaborative environment that empowers African nations to harness the full potential of AI/ML for improving health outcomes.

We are working with three Pathfinder Institutions:

  • the Armauer Hansen Research Institute (AHRI) in Ethiopia 🇪🇹
  • the Institute for Health Research, Epidemiological Surveillance (IRESSEF) in Senegal 🇸🇳
  • and the Douala General Hospital (DGH) in Cameroon 🇨🇲

The London School of Hygiene & Tropical Medicine (LSHTM) and the Committee on Data of the International Science Council (CODATA) collaborate as delivery partners, providing support in platform development. The Africa CDC provides key technical oversight for the project, supporting country engagement, identifying priority use cases, and providing guidance for effective policy engagement.

Our Goals

Overall goal

The overall goal of the Platforms and Data Harmonization Working Group is to establish a robust, integrated digital platform that standardizes, harmonizes, and facilitates access to health and demographic data across the three pathfinder countries. This will support interoperable, FAIR data practices.

Specific goals

  1. Data requirement gathering
    Develop and implement data and skills mapping tools to collect comprehensive information from pathfinder countries, which will define the platform's configuration and functionality requirements.

  2. Data harmonization and standardization
    Standardize health data across countries using the OMOP Common Data Model (CDM) to facilitate consistent data structure and support comparative analyses and interoperability (a minimal mapping sketch follows this list).

  3. Training and capacity building
    Conduct targeted training programs to build pathfinders' capacities in data standardization, management, and use of OMOP CDM and FAIR data practices, ensuring sustainable, long-term data competencies.

  4. Metadata management and FAIR principles
    Develop and enhance metadata authoring tools and ensure data management processes align with FAIR principles, enabling findable, accessible, interoperable, and reusable data.

  5. No-code/low-code platform enhancement
    Upgrade the I-DAIR CODEX platform to support no-code and low-code functionalities for data integration, enabling pathfinder countries to process data and generate insights with minimal technical expertise.

  6. Computational infrastructure setup
    Identify and address the computational hardware and software requirements needed to support the platform, ensuring the necessary resources are in place for optimal performance and scalability.
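
To make goal 2 more concrete, the sketch below illustrates the kind of record-level mapping an ETL step performs when standardizing source data to the OMOP CDM. It is a minimal Python illustration, not code from this project: the source field names and identifiers are hypothetical, and only the `person` table is shown.

```python
from datetime import date

# Hypothetical source record from a pathfinder registry; the field names are
# illustrative only and do not come from any DSWB dataset.
source_record = {
    "patient_id": "SRC-000123",
    "sex": "F",
    "birth_date": "1988-04-17",
}

# OMOP standard concept IDs for gender: 8507 = MALE, 8532 = FEMALE.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def to_omop_person(record: dict, person_id: int) -> dict:
    """Map a source patient record onto a row for the OMOP CDM `person` table."""
    birth = date.fromisoformat(record["birth_date"])
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(record["sex"], 0),  # 0 = unmapped
        "year_of_birth": birth.year,
        "month_of_birth": birth.month,
        "day_of_birth": birth.day,
        # Keep the original identifier so rows remain traceable to the source system.
        "person_source_value": record["patient_id"],
    }


print(to_omop_person(source_record, person_id=1))
```

In practice these mappings are defined per source dataset and per OMOP table during harmonization, with vocabulary lookups handled by the ETL pipeline rather than a hard-coded dictionary.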

Primary deliverables

  1. Pathfinder data mapping tool (D.1.1)

    • A survey tool developed to collect information on available datasets, research questions, and objectives from pathfinder countries to inform analysis and collaboration opportunities.
  2. Data mapping analysis report (D.1.2)

    • A comprehensive report analyzing the survey data, identifying synergies and priority research areas across pathfinder countries to guide collaborative data and research efforts.
  3. Metadata authoring tool (D.1.4)

    • A platform or tool configured to support standardized metadata creation and management, ensuring that datasets are well-documented and interoperable.
  4. OMOP Common Data Model (CDM) training modules and resources

    • Training materials and sessions covering OMOP CDM basics, data transformation, data quality assessment, ETL pipeline implementation, and analytical techniques.
  5. Transformed datasets in OMOP CDM format (D.2.3)

    • Processed and transformed datasets from pathfinder countries, standardized to the OMOP CDM structure to facilitate interoperability.
  6. Data quality reports (D.2.4)

    • Reports generated using data quality assessment tools like ACHILLES, documenting data quality improvements and issues addressed within the standardized datasets (an illustrative quality-check sketch follows this list).
  7. FAIR implementation plan (D.4.2)

    • A structured plan detailing strategies for incorporating FAIR principles within platform metadata and data management, including specific standards, tools, and practices.
  8. Functional metadata catalog (D.4.3)

    • A searchable metadata catalog for organizing and accessing pathfinder datasets, aligned with FAIR principles to enhance data discoverability and reuse (an illustrative metadata record follows this list).
  9. APHRC CODEX platform deployment and architecture (D.5.3 & D.5.8)

    • High-level architecture and deployed instances of the I-DAIR CODEX platform, enhanced with low-code functionalities, advanced analytics, and OMOP CDM integration capabilities.
  10. OMOP CDM and OHDSI pipeline integration (D.5.5)

    • Development and deployment of endpoints and connectors that allow seamless integration between the APHRC CODEX platform, OMOP CDM, and OHDSI tools for enhanced data analysis.
  11. Training and capacity-building reports (D.2.8, D.5.5)

    • Documentation on participant selection, training schedules, evaluations, and user feedback on training sessions, including certification for completion.
  12. Data warehouse schema and ETL pipeline (D.3.2, D.3.3a, D.3.3b)

    • A dynamic data warehouse schema and ETL pipelines designed to manage, transform, and integrate datasets for pathfinder institutions, both in staging and final OMOP CDM formats.
  13. User manuals and training materials for the APHRC CODEX platform (D.5.8)

    • Comprehensive guides for end-users on how to install, use, and troubleshoot the CODEX platform, including specific workflows and functionalities.
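
Deliverable 6 refers to ACHILLES, an OHDSI profiling tool. The sketch below is not ACHILLES itself; it is a minimal Python illustration of the kind of completeness and plausibility checks such data quality reports summarize, assuming `person` rows shaped like the mapping sketch above.

```python
from datetime import date


def person_quality_summary(person_rows: list[dict]) -> dict:
    """Summarize simple completeness and plausibility checks over OMOP `person` rows."""
    total = len(person_rows)
    if total == 0:
        return {"n_persons": 0}
    current_year = date.today().year
    # Completeness: rows whose gender could not be mapped to a standard concept.
    unmapped_gender = sum(1 for p in person_rows if p.get("gender_concept_id", 0) == 0)
    # Plausibility: birth years outside a reasonable range.
    implausible_birth = sum(
        1 for p in person_rows if not (1900 <= p.get("year_of_birth", 0) <= current_year)
    )
    return {
        "n_persons": total,
        "pct_unmapped_gender": round(100 * unmapped_gender / total, 1),
        "n_implausible_birth_year": implausible_birth,
    }
```

Real OHDSI data quality reports cover far more checks (vocabulary conformance, temporal plausibility, referential integrity); this only indicates the general shape of the output.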
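
For the metadata authoring tool and catalog (deliverables 3 and 8), one common way to make dataset descriptions findable and interoperable is to publish each entry as a schema.org Dataset record in JSON-LD. The record below is a hypothetical example, not an entry from the actual catalog; every name, identifier, and URL is a placeholder.

```python
import json

# Hypothetical catalog entry; all names, identifiers, and URLs are placeholders.
dataset_record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example harmonized outpatient dataset (OMOP CDM)",
    "description": "Outpatient encounter data standardized to the OMOP Common Data Model.",
    "identifier": "https://example.org/datasets/example-omop-outpatient",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Organization", "name": "Example Pathfinder Institution"},
    "keywords": ["OMOP CDM", "health data", "FAIR"],
}

print(json.dumps(dataset_record, indent=2))
```

Keeping identifiers, licenses, and creators explicit in records like this is what makes the findability and reusability parts of FAIR practical to verify in a catalog.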

Contributing

  • Guidelines: Contribution guidelines will be developed in due course
  • Code of Conduct: A Code of Conduct will be adopted in agreement with the DSWB members

Maintainers

This repository has been set up and is maintained by Steve Cygu (@cygubicko) and Jay Greenfield (@jaygee-on-github) to support the work of the working group under DSWB.

Please create an issue to share references or ideas related to the development of this project.

♻️ License

This work is licensed under the MIT License (for code) and the Creative Commons Attribution 4.0 International license (for documentation). You are free to share and adapt the material for any purpose, even commercially, as long as you provide attribution (give appropriate credit, provide a link to the license, and indicate if changes were made) in any reasonable manner, but not in any way that suggests the licensor endorses you or your use, and with no additional restrictions.

🤝 Acknowledgement

This repository uses the template created by Malvika and members of The Turing Way team, shared under CC-BY 4.0 for reuse: https://github.com/the-turing-way/reproducible-project-template.
