Therapy rules flagging wrong fields as required#189
Conversation
…ntains
Adds fix_multivalued_member_constraints() post-processor that walks
generated JSON Schema rule preconditions and rewrites empty constraint
fragments on multivalued slots into JSON Schema's `contains` keyword
using has_member content from the source LinkML schema.
LinkML's JsonSchemaGenerator processes rule preconditions with
omit_type=True, which skips the array-aware path that would emit
`contains`. Without this fix, has_member constraints on multivalued
slots compile to empty {} fragments, causing if-clauses to vacuously
pass and then-clauses to fire unconditionally.
Related: HTAN-859
Adds five unit tests for fix_multivalued_member_constraints covering has_member with equals_string, has_member with equals_string_in, any_of of has_member branches, scalar slots (no-op), and missing class_name (skip). Related: HTAN-859
Replaces pattern: ".*X.*" workaround in Therapy rule preconditions with LinkML's has_member + equals_string / equals_string_in. The old pattern construct silently passed against array values (pattern is string-only), making conditional rules fire unconditionally for every record regardless of treatment type. Resolves HTAN-859: surgical-only Therapy records were being flagged for THERAPEUTIC_AGENTS, NUMBER_OF_CYCLES, REGIMEN_OR_LINE_OF_THERAPY.
There was a problem hiding this comment.
HTAN Schema Review — ✅ Approved
Automated review by htan-claude · commit
fcf867f3be576d5972166e562424db1d2955bde5
Solid work on this PR — the core fix is correct and well-tested, with just a couple of warnings about radiation enum coverage that need curator confirmation before merge.
Warning
2 warnings to address (no blocking issues)
Files Changed
modules/Clinical/domains/therapy.yamlscripts/linkml_to_flat_synapse_jsonschema.pytests/test_linkml_schema_conversion.py
Checklist Results
| Check | Result | Notes |
|---|---|---|
| Inheritance correctness | PASS | Therapy class inheritance unchanged; no new classes introduced |
| Inlining of nested objects | N/A | No new slots with class ranges introduced |
| Slot completeness (range, title, description) | PASS | No new slots introduced; existing slots unchanged |
| Enum integrity (alphabetical, descriptions) | PASS | No new enums; inline equals_string_in values verified against TreatmentTypeEnum |
| Generated artifacts | N/A | Generated files produced in a separate downstream PR per repo convention |
Findings
Warnings
-
modules/Clinical/domains/therapy.yaml— Radiation ruleequals_string_inlist omits several radiation-type enum values (structural + biological + coverage)
The newequals_string_inlist for Rule 1 (anatomic site required) enumerates 8 values but omits"Brachytherapy","High-Dose Rate Brachytherapy","Low-Dose Rate Brachytherapy","Conventional Radiotherapy","Stereotactic Radiosurgery", and"Radiopharmaceutical"— all present inTreatmentTypeEnumand clinically requiring a directed anatomic site. The oldpattern: ".*Radiation.*"regex also missed most of these (e.g."Brachytherapy","Stereotactic Radiosurgery"), so some gaps pre-existed, but the explicit enumeration now makes the exclusion look intentional. Similarly,"Concurrent Chemoradiation"involves a radiation component yet is absent from Rule 1, meaning records with that treatment type will not be required to supplyTHERAPY_ANATOMIC_SITE_UBERON_CODE. The PR description acknowledges the radiation list is "worth a glance" — please confirm the enumeration matches curator intent before merge, and consider adding a test that asserts whichTREATMENT_TYPEvalues trigger the anatomic-site rule to prevent future silent drift. -
tests/test_linkml_schema_conversion.py— No test for rule-count / if-then-block count mismatch guard (coverage)
fix_multivalued_member_constraints()contains an explicit defensive branch that prints a warning and returns the schema unchanged whenlen(rules) != len(if_then_blocks). None of the 5 new tests exercise this path. Since this guard protects against silent partial rewrites, it should have a test that supplies mismatched counts and asserts the schema is returned unmodified.
Informational
-
modules/Clinical/domains/therapy.yaml— Schema version bump may be warranted (coverage)
The rule precondition rewrite is a behavioural breaking change: records that previously passed spuriously will now correctly fail. Iftherapy.yamlcarries aversion:field, incrementing it would help downstream consumers track the change. No version field is visible in the provided content, so this is noted for awareness only — no action required if versioning is managed elsewhere.
Verdict
APPROVE
Rules defined in CLAUDE.md · To update review rules, edit CLAUDE.md and open a PR to main
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
Problem
Therapy records were being flagged for THERAPEUTIC_AGENTS, NUMBER_OF_CYCLES, and REGIMEN_OR_LINE_OF_THERAPY even when treatment was Surgical Procedure only. The rules were firing for every record regardless of treatment type.
Why
The YAML used pattern: ".Pharmacotherapy." against TREATMENT_TYPE, which is a multivalued (array) field. JSON Schema's pattern keyword is string-only — when applied to an array, it vacuously passes. So the if clause was always satisfied and the then requirements always fired.
Fix
Two layers:
Verified
Notes for reviewers