Add Shape expressions validation and quality reporting to GitHub Actions #52

@marvinm2

Description

Extend the existing weekly GitHub Actions pipeline to include comprehensive data quality validation and reporting.

Implementation Plan

Shape Expressions Validation

  • Add ShEx/SHACL validation to the weekly data generation pipeline (Saturday 08:00 UTC)
  • Validate RDF structure, data completeness, and consistency
  • Check reference integrity and identifier resolution
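
As a rough illustration of the reference-integrity check above, the sketch below scans a set of triples and reports any object that points into the dataset's own namespace but never appears as a subject. The `Triple` type, the `LOCAL_PREFIX` value, and the function name are assumptions made for this example; a real pipeline step would more likely run a ShEx or SHACL validator over the generated RDF.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical base IRI; the real dataset's namespace may differ.
LOCAL_PREFIX = "https://example.org/aop/"


def find_dangling_references(
    triples: Iterable[Triple], local_prefix: str = LOCAL_PREFIX
) -> list[Triple]:
    """Return triples whose object is a local IRI that never occurs as a subject."""
    triples = list(triples)
    subjects = {t.subject for t in triples}
    return [
        t
        for t in triples
        if t.obj.startswith(local_prefix) and t.obj not in subjects
    ]
```

Literal objects such as labels do not start with the prefix, so only links between entities are checked.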

Quality Metrics to Generate

  • Coverage metrics: AOPs, Key Events, Key Event Relationships, Chemical Stressors
  • Data completeness: Missing properties, empty values, required fields
  • Consistency checks: Duplicate entities, conflicting data, invalid references
  • Data freshness: Last update timestamps, source data age
  • Reference integrity: Broken external links, missing identifiers (HGNC, ChEBI, etc.)
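
The completeness and consistency metrics listed above could be computed along these lines. This is a minimal sketch that assumes each entity has already been flattened into a dictionary; the field names in `REQUIRED_FIELDS` and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical required fields; the real list would come from the shape definitions.
REQUIRED_FIELDS = ("id", "title")


def is_empty(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def completeness_metrics(records, required=REQUIRED_FIELDS):
    """Count records with missing required fields and tally misses per field."""
    missing = Counter()
    incomplete = 0
    for rec in records:
        absent = [field for field in required if is_empty(rec.get(field))]
        missing.update(absent)
        if absent:
            incomplete += 1
    total = len(records)
    return {
        "total": total,
        "incomplete": incomplete,
        "completeness_ratio": (total - incomplete) / total if total else 1.0,
        "missing_by_field": dict(missing),
    }


def duplicate_ids(records, key="id"):
    """Return the sorted identifiers that occur on more than one record."""
    counts = Counter(rec[key] for rec in records if not is_empty(rec.get(key)))
    return sorted(k for k, n in counts.items() if n > 1)
```

Returning plain dictionaries and lists keeps the results directly serializable to JSON.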

Output Format

  • Generate quality-report.json alongside existing RDF files
  • Include summary statistics, detailed validation results, and trend data
  • Export as static files for consumption by SNORQL interface
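
One possible layout for quality-report.json is sketched below. The key names (`generated_at`, `summary`, `results`, `trend`) and both functions are assumptions for illustration, not an agreed schema. Trend data is kept by carrying the previous report's trend entries forward and appending the current summary.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_report(summary, results, previous=None):
    """Assemble a report dict; `previous` is the last report, if one exists."""
    generated = datetime.now(timezone.utc).isoformat()
    trend = list(previous.get("trend", [])) if previous else []
    # Each trend entry is a timestamped copy of that run's summary statistics.
    trend.append({"generated_at": generated, **summary})
    return {
        "generated_at": generated,
        "summary": summary,
        "results": results,
        "trend": trend,
    }


def write_report(report, path="quality-report.json"):
    """Write the report as a static JSON file next to the RDF output."""
    Path(path).write_text(
        json.dumps(report, indent=2, sort_keys=True), encoding="utf-8"
    )
```

A static file with a stable structure can be fetched by the SNORQL interface without any server-side component.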

Integration Points

  • Leverage existing data pipeline infrastructure
  • Add quality gates to prevent deployment of poor-quality data
  • Coordinate with aopwiki-snorql-extended repo for quality dashboard display
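
A quality gate can be as simple as a script that reads the report summary and returns a non-zero exit code when a threshold is missed, which causes the GitHub Actions step to fail. The thresholds and summary keys below (`completeness_ratio`, `dangling_references`) are hypothetical and would need to match whatever the report actually contains.

```python
import sys

# Hypothetical thresholds; suitable values depend on the current state of the data.
MIN_COMPLETENESS = 0.95
MAX_DANGLING_REFERENCES = 0


def gate_failures(
    summary,
    min_completeness=MIN_COMPLETENESS,
    max_dangling=MAX_DANGLING_REFERENCES,
):
    """Return a list of human-readable reasons the gate should fail."""
    failures = []
    completeness = summary.get("completeness_ratio", 0.0)
    if completeness < min_completeness:
        failures.append(
            f"completeness {completeness:.2%} is below {min_completeness:.2%}"
        )
    dangling = summary.get("dangling_references", 0)
    if dangling > max_dangling:
        failures.append(
            f"{dangling} dangling references exceed the limit of {max_dangling}"
        )
    return failures


def main(report):
    """Print any failures to stderr and return a process exit code."""
    failures = gate_failures(report["summary"])
    for message in failures:
        print(f"quality gate failed: {message}", file=sys.stderr)
    return 1 if failures else 0
```

In a workflow step, the script would load the report with `json.load` and pass the result of `main` to `sys.exit`, so that the deployment step only runs when the gate passes.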

Cross-Repository Coordination

This issue coordinates with the SNORQL interface repository to create a comprehensive data quality solution. The SNORQL side will display the quality reports generated here.

Related: A corresponding issue will be created in aopwiki-snorql-extended for the display interface.

Labels: enhancement
