Release 8 fixes#167
Conversation
…ional on TREATMENT_TYPE - Set PHARMACOTHERAPY_TYPE and THERAPEUTIC_AGENTS to required: false in the Therapy class body and slot_usage so they are absent from the top-level JSON Schema required array - Add both to the pharmacotherapy rule postconditions so they become required only when TREATMENT_TYPE includes "Pharmacotherapy" - Fix linkml_to_flat_synapse_jsonschema.py: add fix_multivalued_conditions() post-processor that rewrites const/pattern checks on array-typed properties to use JSON Schema `contains`, so conditionals on multivalued slots (TREATMENT_TYPE) fire correctly against array values - Broaden array_props detection to include $defs so the fix works when generating for a parent class (e.g. ClinicalData) - Add YAML comment documenting the LinkML equals_string/multivalued limitation and the downstream workaround Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…enum (fixes #152) - Add AJCCTumorStagingAvailableEnum (Yes/No/Unknown) and required AJCC_TUMOR_STAGING_AVAILABLE slot to Diagnosis - Make AJCC_STAGING_SYSTEM_EDITION optional (conditional on AJCC_TUMOR_STAGING_AVAILABLE = Yes) - Add "Not Reported" to ECOGPerformanceStatusEnum - Update test data to include new required field Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
HTAN Schema Review — ✅ Approved
Automated review by htan-claude · commit
d68159a92c31f7df6154a31ce1875d0edf3d8566
Solid work on this PR — there are a few meaningful issues to address before merge, including one blocking concern about the equals_string predicate on a multivalued slot that leaves the source YAML semantically incorrect for native LinkML validators.
Caution
1 blocking issue must be resolved before merge
Files Changed
modules/Clinical/domains/diagnosis.yamlmodules/Clinical/domains/followup.yamlmodules/Clinical/domains/therapy.yamlscripts/linkml_to_flat_synapse_jsonschema.pymodules/Clinical/tests/test_conditional_requirements.yamlmodules/Clinical/tests/test_clinical_data.yaml
Checklist Results
| Check | Result | Notes |
|---|---|---|
| Inheritance correctness | PASS | No inheritance changes introduced |
| Inlining of nested objects | N/A | No new nested objects added |
| Slot completeness (range, title, description) | PASS | New slot AJCC_TUMOR_STAGING_AVAILABLE has all required fields |
| Enum integrity (alphabetical, descriptions) | PASS | All new/modified enum values are correctly ordered and have descriptions |
| Generated artifacts | N/A | Generated files excluded from diff per scope rules |
Findings
Blocking
-
modules/Clinical/domains/therapy.yaml—equals_stringpredicate on multivaluedTREATMENT_TYPEis semantically incorrect for native LinkML validation (biological)
The pharmacotherapy conditional rule usespreconditions: slot_conditions: TREATMENT_TYPE: equals_string: "Pharmacotherapy", butTREATMENT_TYPEis declaredmultivalued: true. LinkML'sequals_stringis defined for scalar values only and will never fire correctly on an array-valued slot in any validator consuming the schema directly (e.g.,linkml-validate). Thefix_multivalued_conditions()post-processor patches this for Synapse JSON Schema output, but the source YAML is left semantically broken for native LinkML tooling. The correct fix is to useany_ofwithpattern(consistent with the existing surgical/radiation precondition already in the file), e.g.,any_of: - pattern: ".*Pharmacotherapy.*", which works in both LinkML-native and JSON Schema contexts. The post-processor workaround can remain for Synapse compatibility, but the source YAML should not rely exclusively on it.
Warnings
-
modules/Clinical/domains/diagnosis.yaml—AJCC_TUMOR_STAGING_AVAILABLEmarkedrequired: truemay be too strict for many cancer types (biological)
AJCC staging is definitionally inapplicable for many cancer types (e.g., haematologic malignancies, CNS tumours), meaning this field would routinely be absent in valid datasets. The PR frames this slot as a sentinel to enable conditionality onAJCC_STAGING_SYSTEM_EDITION, so making the sentinel itself unconditionally required may force incorrect data entry. Consider either settingrequired: falseor adding a"Not Applicable"permissible value toAJCCTumorStagingAvailableEnum(ordered alphabetically between"No"and"Unknown") and confirming HTAN mandates this field universally via a data-dictionary reference. -
modules/Clinical/domains/diagnosis.yaml—AJCCTumorStagingAvailableEnummissing"Not Applicable"permissible value (biological)
The enum coversYes / No / Unknownbut provides no way to express that AJCC staging is definitionally inapplicable for a given cancer type. Neither"No"(staging was possible but not done) nor"Unknown"(unknown whether it was done) accurately captures cases like leukaemias or certain CNS tumours. Analogous HTAN enums consistently include"Not Applicable"; its absence here will force submitters to misuse the existing values. Add"Not Applicable": Determination of a value is not relevant in the current context.ordered alphabetically between"No"and"Unknown". -
modules/Clinical/domains/therapy.yaml— Missing newline at end of file (structural)
The diff shows\ No newline at end of fileafter the final line oftherapy.yaml. While most YAML parsers tolerate this, it is non-conformant with POSIX text file standards and can cause subtle issues with some tooling (diffs, concatenation, YAML streaming parsers). Add a trailing newline after the last line. -
modules/Clinical/domains/diagnosis.yaml— No invalid-value test forAJCCTumorStagingAvailableEnum, and no missing-required-field test forAJCC_TUMOR_STAGING_AVAILABLE(coverage)
Two test gaps exist for the new enum and slot: (1) no test asserts that an invalid value (e.g.,"Maybe") raises aValueError; (2) no test verifies that aDiagnosisrecord missingAJCC_TUMOR_STAGING_AVAILABLEfails validation (or assertsrequired == Truevia schema-view). Valid-value coverage exists in the updated test data, but the negative paths are unexercised. Add both assertion types to the Clinical test suite. -
modules/Clinical/domains/followup.yaml— No test for newly added"Not Reported"value inECOGPerformanceStatusEnum(coverage)
"Not Reported"was appended toECOGPerformanceStatusEnumbut no test data or assertion coveringECOG_PERFORMANCE_STATUS: "Not Reported"is visible in the diff. Add a valid-value test confirming this value is accepted by the schema. -
modules/Clinical/domains/therapy.yaml— Non-pharmacotherapyTherapyrecord negative path not exercised in tests (coverage)
BothPHARMACOTHERAPY_TYPEandTHERAPEUTIC_AGENTSwere changed fromrequired: truetorequired: false, but the test data for the surgical procedure case intest_conditional_requirements.yamlstill includes both fields. The key regression path — aTherapyrecord withTREATMENT_TYPE: ["Surgical Procedure"]and neitherPHARMACOTHERAPY_TYPEnorTHERAPEUTIC_AGENTSpresent — is not exercised. Add a test case omitting both fields for a non-pharmacotherapy treatment type to confirm the fix holds.
Informational
-
modules/Clinical/domains/therapy.yaml— Redundantrequired: falsein bothattributesandslot_usageforPHARMACOTHERAPY_TYPE/THERAPEUTIC_AGENTS(biological)
Both slots haverequired: falseset in the classattributesblock and again inslot_usage. Theslot_usageentries are still valuable for carrying contextual description strings, but explicitly repeatingrequired: falsethere implies the override is load-bearing when it isn't. Consider removingrequired: falsefrom theslot_usageentries and retaining only the descriptions, to make the class-body setting the unambiguous authoritative source. No action required, but worth cleaning up for future editor clarity. -
pyproject.toml— LinkML version pin not verifiable from this diff (coverage)
The diff does not includepyproject.toml, so the LinkML version pin cannot be confirmed. The README documentslinkml ^1.3.5, which is more than 2 minor versions behind the current stable1.8.xline. No action required from the author on this PR, but a dedicated dependency-update PR is worth tracking. -
modules/Clinical/domains/diagnosis.yaml— Schema version bump may be warranted (coverage)
This PR adds a new required slot, makes an existing slot optional, and introduces a new enum — collectively breaking changes for existingDiagnosisrecords. Ifdiagnosis.yamlcarries aversion:header field, consider a minor or major version bump to signal the compatibility boundary. Awareness only; conventions may vary by repo policy.
Verdict
REQUEST_CHANGES
Rules defined in CLAUDE.md · To update review rules, edit CLAUDE.md and open a PR to main
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
…multivalued slot equals_string on a multivalued slot is semantically undefined in LinkML and never fires in native validators (e.g. linkml-validate). Switch to any_of with pattern, consistent with the existing surgical/radiation precondition. fix_multivalued_conditions() in the Synapse post-processor already handles pattern rewrites to contains, so Synapse JSON Schema output is unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Superseded by updated review on commit e58fa18
There was a problem hiding this comment.
HTAN Schema Review — ✅ Approved
Automated review by htan-claude · commit
e58fa183e2c4f2526d4f10b25b10b9332600ba22
Solid set of fixes for Release 8 — a few warnings to address (notably test coverage gaps and a potential "Not Applicable" gap in the new enum), but nothing that's outright broken.
Warning
6 warnings to address (no blocking issues)
Files Changed
modules/Clinical/domains/diagnosis.yamlmodules/Clinical/domains/followup.yamlmodules/Clinical/domains/therapy.yamlmodules/Clinical/tests/test_clinical_data.yamlmodules/Clinical/tests/test_conditional_requirements.yamlscripts/linkml_to_flat_synapse_jsonschema.py
Checklist Results
| Check | Result | Notes |
|---|---|---|
| Inheritance correctness | PASS | No inheritance changes in scope |
| Inlining of nested objects | PASS | No inlining changes introduced |
| Slot completeness (range, title, description) | PASS | New slot AJCC_TUMOR_STAGING_AVAILABLE has all required fields; slot_usage overrides correctly omit range/title per established pattern |
| Enum integrity (alphabetical, descriptions) | PASS | AJCCTumorStagingAvailableEnum (N < U < Y) and ECOGPerformanceStatusEnum (digits before strings) both sort correctly; all new values have descriptions |
| Generated artifacts | N/A | Generated .py and .json files excluded from diff per scope rules |
Findings
Warnings
-
modules/Clinical/domains/diagnosis.yaml—AJCCTumorStagingAvailableEnummissing"Not Applicable"permissible value (biological)
The new enum provides"Yes","No", and"Unknown", but omits"Not Applicable"— a clinically meaningful sentinel for tumor types where AJCC staging is formally inapplicable (e.g., brain tumors, hematologic malignancies, many pediatric cancers). Without it, curators are forced to record"Unknown"(implies undetermined) or"No"(implies determined-to-be-unavailable), both of which are semantically incorrect for an inapplicable scenario. This also creates inconsistency with analogous sentinel enums elsewhere in the Clinical module (e.g.,ProgressionOrRecurrenceEnum), which do include"Not Applicable". Consider adding"Not Applicable"with an appropriate description before merge. -
modules/Clinical/domains/therapy.yaml— Missing trailing newline at end of file (structural)
The diff shows\ No newline at end of fileafter the finalrequired: trueline added by this PR. While YAML parsers typically handle this gracefully, it is non-conformant and can produce spurious diffs in future PRs. Since this PR modifies the last line of the file, it's in scope to fix: add a single trailing newline after the finalrequired: true. -
modules/Clinical/domains/therapy.yaml— Pharmacotherapy conditional regex may produce false positives (biological)
The preconditionpattern: ".*Pharmacotherapy.*"will substring-match any futureTreatmentTypeEnumvalue containing "Pharmacotherapy" (e.g., a hypothetical"Non-Pharmacotherapy"or"Anti-Pharmacotherapy"). The current enum has only one matching value so there's no immediate failure, but a tighter pattern —pattern: "^Pharmacotherapy$"— would be more defensive and express intent more clearly. Thefix_multivalued_conditions()post-processor correctly rewrites thecontainswrapping at build time, but the regex permissiveness remains a latent schema-level risk. -
modules/Clinical/domains/diagnosis.yaml+modules/Clinical/domains/followup.yaml— Missing automated test coverage for new slot, changed optionality, and new enum value (coverage)
Three behavioral changes introduced by this PR lack dedicated pytest assertions: (a)AJCC_TUMOR_STAGING_AVAILABLEisrequired: true— no test verifies aDiagnosisrecord missing this field fails validation or that an invalid enum value like"Maybe"raises an error; (b)AJCC_STAGING_SYSTEM_EDITIONis now optional — no test asserts aDiagnosiswithAJCC_TUMOR_STAGING_AVAILABLE: "No"and noAJCC_STAGING_SYSTEM_EDITIONloads without error; (c)"Not Reported"inECOGPerformanceStatusEnum— no test validates acceptance of that value while rejecting out-of-range inputs. The PR test plan calls these out as manual verification steps; they should be encoded as automated pytest cases inmodules/Clinical/tests/. -
modules/Clinical/domains/therapy.yaml— Missing automated test coverage for pharmacotherapy conditional rule (coverage + biological)
The core fix in this PR (movingPHARMACOTHERAPY_TYPEandTHERAPEUTIC_AGENTSinto the pharmacotherapy conditional rule) has no invalid-instance test coverage for either path: (a) aTherapyrecord withTREATMENT_TYPE: ["Pharmacotherapy"]but missingPHARMACOTHERAPY_TYPEorTHERAPEUTIC_AGENTSshould fail validation; (b) aTherapyrecord withTREATMENT_TYPE: ["Surgical Procedure"]and neither field present should load without error. The existing test fixture attest_conditional_requirements.yamlincludes pharmacotherapy fields even for a"Surgical Procedure"treatment type, which means it doesn't exercise the negative path the PR is specifically fixing. Add pytest cases for both paths. -
modules/Clinical/domains/diagnosis.yaml—AJCC_TUMOR_STAGING_AVAILABLE: required: truemay break validation for legacy records (biological + coverage)
Making the new sentinel field unconditionallyrequired: truemeans any existingDiagnosisrecord submitted before this PR (which cannot haveAJCC_TUMOR_STAGING_AVAILABLE) will fail validation on re-submission or schema re-validation. This is expected for a breaking change, but should be explicitly acknowledged — and if a schemaversion:field exists on the Clinical module, a minor or major version bump is warranted to signal the break to downstream consumers. Confirm this is intentional and document it in the PR or changelog.
Informational
-
modules/Clinical/domains/followup.yaml—ECOGPerformanceStatusEnummixes numeric and string values (biological)
The addition of"Not Reported"after"5"is alphabetically correct (digits sort before letters in lexicographic order) and consistent with how other enums in this module handle the same sentinel. Downstream consumers sorting the enum numerically may place"Not Reported"unexpectedly, but this is a pre-existing pattern in the module — no action required. -
modules/Clinical/tests/test_conditional_requirements.yaml— Surgical-procedure therapy record includes pharmacotherapy fields (biological + coverage)
The test fixture specifiesTREATMENT_TYPE: "Surgical Procedure"while also populatingPHARMACOTHERAPY_TYPEandTHERAPEUTIC_AGENTS. This doesn't cause a validation failure (optional fields may be present), but it doesn't exercise the key negative case the PR intends to fix. Addressed as a warning above (add an explicit negative-path test); flagged here for awareness that the existing fixture alone is insufficient documentation of the fixed behavior.
Verdict
REQUEST_CHANGES
Rules defined in CLAUDE.md · To update review rules, edit CLAUDE.md and open a PR to main
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
#159) Sphinx slugifies heading IDs by lowercasing and replacing non-alphanumeric characters with dashes. The link generator was using .lower() only, so enum names with underscores (e.g. tissue_or_organ_of_origin_uberon_enum) generated anchors that didn't match the rendered heading ID, causing the broken link reported for SITE_OF_RESECTION_OR_BIOPSY. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add ChannelMetadata to the json-schema-synapse.yml workflow so HTAN.ChannelMetadata-v*.json is produced at each release alongside the Level 2/3/4 schemas. Also update the MultiplexMicroscopy Makefile so local gen-synapse-schema generates both schemas. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Superseded by updated review on commit 76d6fde
There was a problem hiding this comment.
HTAN Schema Review — ✅ Approved
Automated review by htan-claude · commit
76d6fdec57e7c8633a7b747dbe32dd5837015851
Solid PR with a clear problem statement — found one blocking issue with an invalid LinkML precondition expression, several test coverage gaps, and a few informational notes worth tracking.
Caution
1 blocking issue must be resolved before merge
Files Changed
modules/Clinical/domains/diagnosis.yamlmodules/Clinical/domains/therapy.yamlmodules/Clinical/domains/followup.yamlmodules/Clinical/linkml_to_flat_synapse_jsonschema.pymodules/Clinical/tests/test_clinical_data.yamlmodules/Clinical/tests/test_conditional_requirements.yaml
Checklist Results
| Check | Result | Notes |
|---|---|---|
| Inheritance correctness | PASS | No inheritance changes in this PR |
| Inlining of nested objects | N/A | No nested object changes |
| Slot completeness (range, title, description) | PASS | New slot AJCC_TUMOR_STAGING_AVAILABLE and modified slots are structurally complete |
| Enum integrity (alphabetical, descriptions) | PASS | PharmacotherapyTypeEnum and ECOGPerformanceStatusEnum ordering correct; new values have descriptions |
| Generated artifacts | N/A | Generated files excluded from diff per scope rules |
Findings
Blocking
-
therapy.yaml— Invalid precondition expression forAGE_IN_DAYS_AT_TREATMENT_END(structural + biological)
The rule that makesOFF_TREATMENT_REASONandRESPONSEconditionally required usesrequired: trueinsidepreconditions.slot_conditions.AGE_IN_DAYS_AT_TREATMENT_END. In LinkML,requiredis a slot definition property — it is not a valid runtime value comparator inside a rule precondition. LinkML preconditions check slot values (viaequals_string,minimum_value,value_presence, etc.), not schema declarations. As written, this rule will never fire, meaningOFF_TREATMENT_REASONandRESPONSEwill remain unconditionally optional even when a treatment end date is supplied. Fix by replacingrequired: truewithvalue_presence: PRESENTin the precondition block.
Warnings
-
therapy.yaml— Multivalued slotpatternmatching in rule preconditions may not fire correctly (structural + biological)
The pharmacotherapy and surgical/radiation conditional rules useany_of: [{pattern: ".*Pharmacotherapy.*"}](and analogous patterns) as preconditions onTREATMENT_TYPE, which ismultivalued: true. LinkML'sslot_conditionswithpatternevaluates against slot values in an implementation-dependent way for multivalued slots — the PR itself notes a related issue whereequals_stringon multivalued slots generates broken{"const": "..."}JSON Schema (fixed viafix_multivalued_conditions()). It is not confirmed that the same post-processor handlespattern-based preconditions in conditional rules. If these preconditions silently fail,PHARMACOTHERAPY_TYPEandTHERAPEUTIC_AGENTSwill never be enforced even whenTREATMENT_TYPEcontains"Pharmacotherapy". Verify duringmake modules-genand confirm whetherfix_multivalued_conditions()covers this case, or document the limitation explicitly. -
diagnosis.yaml—AJCC_TUMOR_STAGING_AVAILABLErequired cardinality may break retrospective data re-ingestion (biological)
The newAJCC_TUMOR_STAGING_AVAILABLEslot is added asrequired: true. Retrospective HTAN cohorts frequently lack documentation of whether AJCC staging was even attempted (distinct from staging results being absent), so a hard required constraint may cause existing validDiagnosisrecords to fail validation on re-ingestion. Consider whetherrequired: falsewith a default of"Unknown"better reflects real-world data completeness, or explicitly document why retroactive enforcement is acceptable for this release. -
modules/Clinical/tests/— Missing test coverage forAJCC_TUMOR_STAGING_AVAILABLEconditionality and the "No" path (structural + coverage)
The updated test data covers only the case whereAJCC_TUMOR_STAGING_AVAILABLE: "Yes"andAJCC_STAGING_SYSTEM_EDITION: "8th"are both present. Missing coverage includes: (a) aDiagnosisrecord withAJCC_TUMOR_STAGING_AVAILABLE: "No"and noAJCC_STAGING_SYSTEM_EDITIONloading without error, (b) a record missingAJCC_TUMOR_STAGING_AVAILABLEentirely raising aValueError, and (c) confirmation thatAJCC_STAGING_SYSTEM_EDITIONis no longer unconditionally required at the class level. The PR description lists these as manual verification steps — they should be automated. -
modules/Clinical/tests/— Missing test coverage forPHARMACOTHERAPY_TYPE/THERAPEUTIC_AGENTSoptionality (coverage)
The test data does not demonstrate aTherapyrecord withTREATMENT_TYPE: ["Surgical Procedure"]that omits bothPHARMACOTHERAPY_TYPEandTHERAPEUTIC_AGENTSand loads without error. Without this, the fix for issue #165 is not automatically validated. Add a test case asserting these fields arerequired: falseat the class level and that a non-pharmacotherapy record omitting them passes validation. -
modules/Clinical/tests/— New"Not Reported"value inECOGPerformanceStatusEnumlacks test coverage (coverage)
The new permissible value"Not Reported"added toECOGPerformanceStatusEnumis not exercised in any visible test data. Add at least one test that loads a validFollowUp(or equivalent) instance withECOG_PERFORMANCE_STATUS: "Not Reported"and confirms no error is raised. An invalid-value rejection test (e.g.,"6") would further strengthen coverage.
Informational
-
diagnosis.yaml/therapy.yaml— Schema version bump advisable for breaking changes (biological + coverage)
This PR adds a new required field (AJCC_TUMOR_STAGING_AVAILABLE), makes a previously required field optional (AJCC_STAGING_SYSTEM_EDITION), and changes the requiredness of two slots inTherapy— all backward-incompatible changes for existing data consumers. If the schema headers carry aversion:field, incrementing it would communicate the breaking change clearly. No action required unless the team's versioning policy mandates it. -
therapy.yaml— Generic slot nameRESPONSEnoted for future refactoring (coverage)
The slotRESPONSEwas not introduced in this PR but itsslot_usageentry was touched (changed torequired: false). For awareness: a more specific name such asTREATMENT_RESPONSEwould reduce risk of slot overloading in future cross-module work. No action needed now.
Verdict
REQUEST_CHANGES
Rules defined in CLAUDE.md · To update review rules, edit CLAUDE.md and open a PR to main
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
|
Closing to start fresh on clinical changes. |
Summary
Fixes #152 / #153 — Required enum fields missing sentinel values: adds AJCC_TUMOR_STAGING_AVAILABLE (Yes/No/Unknown) as a new required field to Diagnosis, makes AJCC_STAGING_SYSTEM_EDITION optional (conditional on the new flag), and adds "Not Reported" to ECOGPerformanceStatusEnum
Fixes #165 — PHARMACOTHERAPY_TYPE and THERAPEUTIC_AGENTS were unconditionally required in Therapy; both are now set to required: false in the class body and slot_usage, and moved into the pharmacotherapy conditional rule
Fixes #166 — LinkML generates {"const": "Pharmacotherapy"} for equals_string checks on multivalued slots, which never matches an array value; adds a fix_multivalued_conditions() post-processor in linkml_to_flat_synapse_jsonschema.py that rewrites these to {"contains": {"const": "..."}} at build time
Test plan