Closed
Description
Problem
Blood test results from GetTested (and other labs) are PDF-only. No structured data.
Proposed Solution
- PDF parser using pdfplumber (better than tabula for structured tables)
- Extract: biomarker name, value, unit, reference range, date
- Normalize biomarker names to standard vocabulary
- Output: parquet files in blood_tests/raw/ with date partitioning
- Silver merge: merge_blood_test_results.sql
- Support multiple lab formats (GetTested, Sundhed.dk)
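A minimal sketch of the extraction and normalization steps listed above, not a definitive implementation. It assumes each table row arrives as a list of cell strings in the order name, value, unit, reference range, which is the shape pdfplumber's `page.extract_tables()` returns; the actual GetTested column order is not confirmed here. The `ALIASES` map, the `Result` record, the helper names, and the Hive-style `date=YYYY-MM-DD` partition layout are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Iterator, List, Optional, Tuple

# Hypothetical alias map; a real standard vocabulary would be far larger.
ALIASES = {
    "hemoglobin": "hemoglobin",
    "haemoglobin": "hemoglobin",
    "hb": "hemoglobin",
    "vitamin d": "vitamin_d_25_oh",
    "25-oh vitamin d": "vitamin_d_25_oh",
}

# Matches ranges such as "8.3 - 10.5" or "8,3-10,5" (hyphen or en dash).
RANGE_RE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*[-\u2013]\s*(\d+(?:[.,]\d+)?)\s*$")


@dataclass
class Result:
    biomarker: str
    value: float
    unit: str
    ref_low: Optional[float]
    ref_high: Optional[float]
    test_date: date


def to_float(text: str) -> float:
    """Parse a number that may use a decimal comma."""
    return float(text.strip().replace(",", "."))


def normalize_name(raw: str) -> str:
    """Map a raw biomarker label to a canonical name via the alias table."""
    key = " ".join(raw.lower().split())
    return ALIASES.get(key, key.replace(" ", "_"))


def parse_range(text: Optional[str]) -> Tuple[Optional[float], Optional[float]]:
    """Split a 'low - high' reference range; (None, None) if it does not match."""
    match = RANGE_RE.match(text or "")
    if not match:
        return (None, None)
    return (to_float(match.group(1)), to_float(match.group(2)))


def parse_row(row: List[Optional[str]], test_date: date) -> Optional[Result]:
    """Turn one table row into a Result; header or malformed rows yield None."""
    if len(row) < 4 or not row[0] or not row[1]:
        return None
    try:
        value = to_float(row[1])
    except ValueError:
        return None  # e.g. a header row whose value cell is not numeric
    low, high = parse_range(row[3])
    return Result(normalize_name(row[0]), value, (row[2] or "").strip(), low, high, test_date)


def partition_path(root: str, test_date: date) -> str:
    """Build a date-partitioned output path (layout is an assumption)."""
    return f"{root}/date={test_date.isoformat()}/results.parquet"


def extract_rows(pdf_path: str) -> Iterator[List[Optional[str]]]:
    """Yield raw table rows from every page of a PDF using pdfplumber."""
    import pdfplumber  # third-party; imported lazily so the helpers above stay stdlib-only

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                yield from table
```

Keeping `parse_row` independent of pdfplumber means a second lab format such as Sundhed.dk could plug in its own row extractor and reuse the same normalization and range parsing.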
Acceptance Criteria
- Parse GetTested PDF to structured parquet
- Biomarker names normalized
- Reference ranges extracted
- Silver table queryable with historical trends