Feat/issue 177 179 panel id and target type#180
Conversation
…Z conditional Add HTAN_PANEL_ID as a required field to MultiplexMicroscopyLevel2 and ChannelMetadata to enable panel-level grouping and join keys across RecordSets. Make PHYSICAL_SIZE_Z optional at the class level, conditionally required via a rule when SIZE_Z >= 2, to accommodate 2D acquisitions (e.g. CODEX) where no z-stack is collected. Fixes #177. Fixes #174. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s for non-human probes Add TargetTypeEnum (Human Gene, Human Transcript, Other) and a required TARGET_TYPE slot to SpatialPanel to support probes targeting non-human genes (e.g. microbiome, viral targets) that lack Ensembl/HGNC identifiers. GENE_SYMBOL, HGNC_VERSION, and GENE_ID are now conditionally required via rules when TARGET_TYPE is Human Gene or Human Transcript. USER_GENE_NAME and the new OTHER_TARGET_TYPE (free-text) are conditionally required when TARGET_TYPE is Other. Fixes #179. Closes #175. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
Rename GENE_ID → ENSEMBL_ID with updated pattern (ENSG|ENST) to correctly support both gene and transcript identifiers; drop the undocumented bare integer branch. Replace GENE_SYMBOL and USER_GENE_NAME with a single required TARGET_NAME field, universal across all target types. Rename OTHER_TARGET_TYPE → OTHER_TARGET_DESCRIPTION to clarify it is a free-text descriptor, not a type selector. Scope HGNC_VERSION to Human Gene only (not Human Transcript, since transcript versioning is handled via Ensembl version suffixes). Split the single Human Gene/Transcript rule into two separate rules accordingly. Expand TargetTypeEnum with Bacterial, Control Probe, Human Protein, and Viral to support current and near-term panel use cases and reserve enum slots for future identifier model work.
Superseded by updated review on commit 80b479a
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
…icroscopy Clarify enum value descriptions for Bacterial, Viral, Control Probe, and Human Protein to document why they have no conditional identifier rules. Update ENSEMBL_ID description to specify ENSG-prefixed IDs for Human Gene and ENST-prefixed IDs for Human Transcript. Add version fields to all three modified schemas: spatial_panel.yaml (2.0.0), multiplex_microscopy_channel_metadata.yaml (1.5.0), level_2.yaml (1.1.0). Add tests: - test_htan_panel_id_required and test_htan_panel_id_required_channel_metadata covering required status and pattern validity for both MultiplexMicroscopy classes - test_target_type_invalid_value asserting "Fungal" is not a valid TargetTypeEnum value - test_spatial_panel_conditional_rules_instances covering valid Human Gene, invalid Human Gene (missing ENSEMBL_ID), valid Human Transcript (no HGNC_VERSION), invalid Other (missing OTHER_TARGET_DESCRIPTION), and invalid TARGET_TYPE Update test_valid_channel_metadata_instance to include the now-required HTAN_PANEL_ID field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…/github.com/ncihtan/htan2-data-model into feat/issue-177-179-panel-id-and-target-type
Superseded by updated review on commit 61789de
There was a problem hiding this comment.
HTAN Schema Review — ✅ Approved
Automated review by htan-claude · commit
61789de1a56dedf38735698e589f64f09e41c974
The PR is well-structured and the schema changes are solid — a few test coverage gaps and one meaningful biological modelling concern need attention before merge.
Caution
1 blocking issue must be resolved before merge
Files Changed
modules/MultiplexMicroscopy/domains/level_2.yamlmodules/MultiplexMicroscopy/domains/multiplex_microscopy_channel_metadata.yamlmodules/SpatialOmics/domains/spatial_panel.yamlmodules/MultiplexMicroscopy/tests/test_multiplex_microscopy.pymodules/SpatialOmics/tests/test_spatial.py
Checklist Results
| Check | Result | Notes |
|---|---|---|
| Inheritance correctness | PASS | No inheritance chains modified |
| Inlining of nested objects | N/A | No inlined objects introduced |
| Slot completeness (range, title, description) | PASS | All new/changed slots have range, title, and description |
| Enum integrity (alphabetical, descriptions) | PASS | TargetTypeEnum is alphabetically ordered; all values have descriptions |
| Generated artifacts | N/A | Explicitly excluded from this PR per PR description |
Findings
Blocking
-
spatial_panel.yaml—ENSEMBL_IDpattern accepts bothENSGandENSTprefixes with no per-TARGET_TYPEenforcement (biological)
The slot-level pattern^(ENSG\\d+|ENST\\d+)$permits both gene-level (ENSG) and transcript-level (ENST) Ensembl identifiers in the same field. TheHuman Geneconditional rule requiresENSEMBL_IDbut does not constrain it toENSG-prefixed values, meaning a submitter could satisfy theHuman Generule with a transcript ID and vice versa — a scientifically incorrect submission that the model would accept. The description text gives correct guidance but the model does not enforce it. The cleanest fix is to split into two slots (ENSEMBL_GENE_IDrestricted to^ENSG\\d+$andENSEMBL_TRANSCRIPT_IDrestricted to^ENST\\d+$) and reference each in the appropriate conditional rule; alternatively, add postcondition pattern constraints to the existing rules blocks if splitting slots is too disruptive.
Warnings
-
test_multiplex_microscopy.py— No invalid-instance test forHTAN_PANEL_IDinMultiplexMicroscopyLevel2(coverage)
There is no test that attempts to instantiate aMultiplexMicroscopyLevel2object withoutHTAN_PANEL_IDand asserts aValueErroris raised. Coverage guidelines require at least one invalid-instance path for each new required slot on a class. Add a test analogous to the invalid-instance pattern used forChannelMetadata, omittingHTAN_PANEL_IDfrom aMultiplexMicroscopyLevel2instance. -
test_multiplex_microscopy.py— No instance-level tests for thePHYSICAL_SIZE_Zconditional rule (coverage)
test_physical_size_z_conditionalinspects schema structure only; it does not exercise the rule through actual data instances. Following the pattern used intest_spatial_panel_conditional_rules_instances, at least one valid 2D instance (SIZE_Z=1, noPHYSICAL_SIZE_Z) and one invalid 3D instance (SIZE_Z=3,PHYSICAL_SIZE_Zabsent) should be validated against the LinkML runtime or a validator to confirm the rule fires correctly. -
test_spatial.py— No valid-instance coverage forBacterial,Viral,Control Probe, andHuman Proteinenum values (coverage)
test_spatial_panel_conditional_rules_instancesexercisesHuman Gene,Human Transcript,Other, and an invalid type, but the four remainingTargetTypeEnumvalues have no instance-level test path. A short parametrized test supplying onlyTARGET_NAME(and omitting identifier fields) for each of these four values would close the gap and confirm the absence of unintended required-slot violations. -
test_spatial.py— "Dropped fields" assertion checksOTHER_TARGET_TYPE(a slot that never existed) instead of guarding the correct slot (coverage)
The test iterates over["GENE_SYMBOL", "GENE_ID", "USER_GENE_NAME", "OTHER_TARGET_TYPE"]to assert these names are absent fromSpatialPanel. BecauseOTHER_TARGET_TYPEwas never defined in the schema, this assertion passes vacuously and provides no protection against accidentally droppingOTHER_TARGET_DESCRIPTION. The assertion should either be removed or replaced with a positive check thatOTHER_TARGET_DESCRIPTIONis present (which the conditional-slots loop already covers). The PR description also referencesOTHER_TARGET_TYPEin several places, confirming this is a naming artefact from development. -
spatial_panel.yaml—Human Proteinenum value has no conditional rule and no enforced identifier (biological)
TargetTypeEnumincludesHuman Proteinbut norulesblock enforces any identifier requirements for this type, unlikeHuman GeneandHuman Transcriptwhich both requireENSEMBL_ID. A submitter selectingHuman Proteinreceives no model-level constraint distinguishing it fromOther, and there is no free-text descriptor required either. If protein-level identifiers are intentionally deferred, this should be explicitly documented in the slot or enum description as a known gap; otherwise, consider adding a rule requiring a protein identifier (e.g., UniProt accession) or foldingHuman ProteinintoOtheruntil identifiers are standardised. -
spatial_panel.yaml—ViralandBacterialtarget types have no required descriptor field (biological)
OTHER_TARGET_DESCRIPTIONis only conditionally required whenTARGET_TYPE = Other, but the descriptions forViralandBacterialstate that no standardised identifier is mandated. This means viral and bacterial probe targets are identified solely byTARGET_NAMEwith no additional free-text description enforced. Consider either extending theOTHER_TARGET_DESCRIPTIONrule (or a renamedNON_HUMAN_TARGET_DESCRIPTION) to coverViralandBacterial, or clarifying in the model description thatTARGET_NAMEalone is the intended and sufficient identifier for these categories. -
spatial_panel.yaml—USER_GENE_NAMEremoved without a deprecation path (biological)
USER_GENE_NAMEwas a previously defined slot inSpatialPaneland has been dropped entirely in this PR (replaced in function byOTHER_TARGET_DESCRIPTION). Any existing data submissions that populatedUSER_GENE_NAMEwill silently lose that field upon re-validation. The2.0.0version bump signals a breaking change, but a changelog entry or migration note in the schema-leveldescriptionwould reduce re-ingestion risk. Consider adding a comment or schema annotation mapping the old slot name to its replacement.
Informational
-
spatial_panel.yaml— PR description referencesOTHER_TARGET_TYPE; schema consistently usesOTHER_TARGET_DESCRIPTION(structural + coverage)
The PR description's "Changes" section states "USER_GENE_NAMEandOTHER_TARGET_TYPErequired whenTARGET_TYPEisOther", but the actual slot name throughout the YAML and (corrected) tests isOTHER_TARGET_DESCRIPTION. The YAML itself is internally consistent — this is a prose artefact in the PR description only. No schema fix needed, but updating the PR description improves traceability. -
level_2.yaml/multiplex_microscopy_channel_metadata.yaml/spatial_panel.yaml— Version bumps are proportionate (coverage)
spatial_panel.yamlbumps to2.0.0(appropriate given removal of previously required slots),level_2.yamlto1.1.0, andmultiplex_microscopy_channel_metadata.yamlto1.5.0. All increments are consistent with the scope of their respective changes. -
spatial_panel.yaml—SpatialPanelhas noHTAN_BIOSPECIMEN_IDprovenance link (biological)
SpatialPanelrepresents a reagent/probe panel definition rather than a biospecimen-derived record, so the absence ofHTAN_BIOSPECIMEN_IDis scientifically defensible — panels are analogous to reagent manifests and are linked to biospecimen provenance indirectly viaHTAN_PANEL_IDin image-level records. No action required given the current join-key design, but worth noting ifSpatialPanelrecords are ever ingested as standalone RecordSets.
Verdict
REQUEST_CHANGES
Rules defined in CLAUDE.md · To update review rules, edit CLAUDE.md and open a PR to main
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
…overage Add pattern constraints to ENSEMBL_ID postconditions: ENSG-prefixed required for Human Gene, ENST-prefixed required for Human Transcript. This closes the blocking issue where the slot-level pattern accepted both prefixes regardless of TARGET_TYPE. Add deprecation migration note to spatial_panel.yaml description mapping removed slots (GENE_SYMBOL, GENE_ID, USER_GENE_NAME) to their replacements. Remove vacuous OTHER_TARGET_TYPE assertion from dropped-fields test (slot never existed in the schema). Expand test coverage: - ENSG/ENST prefix enforcement in test_spatial_panel_conditional_rules_instances - Valid instances for Bacterial, Viral, Control Probe, Human Protein - test_htan_panel_id_missing_level2_raises for MultiplexMicroscopyLevel2 - test_physical_size_z_rule_instances for 2D and 3D acquisition cases Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…/github.com/ncihtan/htan2-data-model into feat/issue-177-179-panel-id-and-target-type
Superseded by updated review on commit 7a5bb77
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
adamjtaylor
left a comment
There was a problem hiding this comment.
A few things to discuss, but nearly there I think
…/github.com/ncihtan/htan2-data-model into feat/issue-177-179-panel-id-and-target-type
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
…overage pdate ENSEMBL_ID pattern at slot level and in postconditions to accept optional Ensembl version suffixes (e.g., ENSG00000136997.20, ENST00000621592.7): slot-level: ^(ENSG|ENST)\d+(\.\d+)?$ Human Gene rule: ^ENSG\d+(\.\d+)?$ Human Transcript rule: ^ENST\d+(\.\d+)?$ Strip operational slot-requirement language from TargetTypeEnum permissible value descriptions — rules blocks are the authoritative source; duplicate prose creates a maintenance hazard. Add HTAN_PANEL_ID to the required_attrs regression sweep in test_required_attributes_level2. Add test_invalid_channel_metadata_missing_htan_panel_id to confirm that constructing ChannelMetadata without HTAN_PANEL_ID raises ValueError. Also remove CHANNEL_METADATA_ID from level_2.yaml (replaced by HTAN_PANEL_ID as the join key to ChannelMetadata) and update descriptions and tests accordingly.
…/github.com/ncihtan/htan2-data-model into feat/issue-177-179-panel-id-and-target-type
Superseded by updated review on commit 62ec5a6
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
Add missing terminal period to TARGET_TYPE description. Update OTHER_TARGET_DESCRIPTION example to remove viral gene name reference (Viral is its own TargetTypeEnum value and should not map to Other).
|
Fixed: Intentionally not fixed:
|
…/github.com/ncihtan/htan2-data-model into feat/issue-177-179-panel-id-and-target-type
Replace PANEL_SYNAPSE_ID with HTAN_PANEL_ID using the standard HTAN P-prefix pattern, consistent with SpatialPanel, MultiplexMicroscopyLevel2, and ChannelMetadata. Also fix PANEL_NAME description (was incorrectly copied from PANEL_SIZE_TOTAL_TARGETS).
Superseded by updated review on commit 1d0abd4
There was a problem hiding this comment.
HTAN Schema Review — ✅ Approved
Automated review by htan-claude · commit
1d0abd4756a56ef167de1772ba6a33442322dcb0
Solid PR overall — the conditional logic for TARGET_TYPE and PHYSICAL_SIZE_Z is well-structured, but there's one blocking provenance issue and several warnings worth addressing before merge.
Caution
1 blocking issue must be resolved before merge
Files Changed
modules/MultiplexMicroscopy/domains/level_2.yamlmodules/MultiplexMicroscopy/domains/multiplex_microscopy_channel_metadata.yamlmodules/SpatialOmics/domains/spatial_panel.yamlmodules/SpatialOmics/domains/level_3.yamlmodules/MultiplexMicroscopy/tests/test_multiplex_microscopy.pymodules/SpatialOmics/tests/test_spatial.py
Checklist Results
| Check | Result | Notes |
|---|---|---|
| Inheritance correctness | PASS | No inheritance changes introduced |
| Inlining of nested objects | PASS | All new slots are range: string; no inlining required |
| Slot completeness (range, title, description) | PASS | All new/changed slots have range, title, and description |
| Enum integrity (alphabetical, descriptions) | PASS | TargetTypeEnum is alphabetical, all 7 values have descriptions |
| Generated artifacts | N/A | Generated .py and .json files excluded from diff per PR note |
Findings
Blocking
-
modules/MultiplexMicroscopy/domains/level_2.yaml—CHANNEL_METADATA_IDremoved with no 1:1 replacement link toChannelMetadataRecordSet (biological)The old
CHANNEL_METADATA_ID(Synapse ID,^syn\d+$pattern) was the explicit foreign key pointing a singleMultiplexMicroscopyLevel2row at its exactChannelMetadataRecordSet. Its replacement,HTAN_PANEL_ID, is a panel-level grouping key shared across many image rows and many channel rows — the join it creates is one-to-many, not one-to-one. After this change there is no unambiguous way to resolve which specificChannelMetadataRecordSet belongs to a given image file record, breaking the provenance chain. Either retainCHANNEL_METADATA_ID(or an equivalent unique RecordSet pointer) alongsideHTAN_PANEL_ID, or introduce a new slot such asCHANNEL_METADATA_RECORDSET_IDthat uniquely identifies theChannelMetadataRecordSet for each image.
Warnings
-
modules/SpatialOmics/domains/spatial_panel.yaml—Human Proteintarget type has no conditional rule for identifier slots (biological)TargetTypeEnumincludesHuman ProteinalongsideHuman GeneandHuman Transcript, but only the gene/transcript values have conditional rules requiringENSEMBL_IDandHGNC_VERSION. Human protein targets used in antibody-based spatial proteomics (e.g., CODEX, MIBI) have stable identifiers (UniProt, HGNC protein IDs) and deserve the same treatment. As written, aHuman Proteinrecord needs onlyTARGET_NAME, leaving it as under-annotated as aBacterialprobe. Either add a rule requiringHGNC_VERSION(or a newUNIPROT_IDslot) whenTARGET_TYPEisHuman Protein, or add an explicit note in the slot description documenting the intentional omission. -
modules/SpatialOmics/domains/spatial_panel.yaml—BacterialandViraltarget types have no conditional rule for taxonomic identifiers (biological)BacterialandViralare added as permissibleTARGET_TYPEvalues, but unlikeOtherthey do not trigger a requirement forOTHER_TARGET_DESCRIPTIONor any equivalent taxonomic identifier (e.g., NCBI Taxonomy ID, GenBank accession). A bacterial or viral probe record could be submitted with onlyTARGET_NAMEand no further annotation, which is insufficient for reproducibility in spatial metagenomics/viromics workflows. Consider extending theOtherconditional rule to coverBacterialandViral, or adding a dedicated taxonomic identifier slot with appropriate conditionality. -
modules/SpatialOmics/domains/spatial_panel.yaml/level_2.yaml/multiplex_microscopy_channel_metadata.yaml/level_3.yaml—HTAN_PANEL_IDpattern duplicated across four files (biological + coverage)The regex pattern
^(?=.{1,50}$)(HTA2[0-2][0-9])_(0000|EXT[0-9]{1,18}|[0-9]{1,21})_(P[0-9]{1,20})$is copy-pasted into all four files independently. If the pattern ever changes in one location it will silently diverge elsewhere. Consider definingHTAN_PANEL_IDonce as a shared slot (e.g., inCoreFileor an appropriate base schema, consistent with howHTAN_DATA_FILE_IDis handled incore.yaml) and importing it into each module. -
modules/MultiplexMicroscopy/tests/test_multiplex_microscopy.py—test_htan_panel_id_missing_level2_raisesuses a hand-rolled validator instead of the dataclass constructor (coverage)The test for a missing
HTAN_PANEL_IDinMultiplexMicroscopyLevel2implements a customvalidate_panel_idfunction rather than instantiating the generatedMultiplexMicroscopyLevel2dataclass. This validates schema metadata but does not confirm that the runtime dataclass enforces the constraint — unlike the paralleltest_invalid_channel_metadata_missing_htan_panel_idwhich correctly uses the dataclass constructor. Add a test that callsMultiplexMicroscopyLevel2(...)withoutHTAN_PANEL_IDand asserts aValueErroris raised. -
modules/SpatialOmics/tests/test_spatial.py— No valid-instance load test forSpatialPanelusing the generated dataclass (coverage)test_spatial_panel_conditional_rules_instancesuses a hand-rolled validator rather than loading a realSpatialPanelinstance through the generated dataclass or alinkml-runtimeloader. No test constructs an actualSpatialPanelobject and verifies it parses cleanly end-to-end, which means enum range enforcement and pattern validation vialinkml-validateare not exercised. Add at least one test that instantiates aSpatialPanelrecord via the generated dataclass or JSON schema validator.
Informational
-
modules/SpatialOmics/domains/spatial_panel.yaml— Schema version field absent despite breaking changes (structural + coverage)spatial_panel.yamlhas noversion:field (before or after this PR), yet the changes are substantial and breaking: three required slots removed, three new slots added, one slot renamed with broadened semantics, a new enum added, and three conditional rules introduced. Thelevel_2.yamlandchannel_metadata.yamlfiles correctly carry version bumps;spatial_panel.yamlshould follow suit (e.g.,version: "2.0.0"). No action required to merge, but worth tracking. -
modules/SpatialOmics/domains/level_3.yaml—PANEL_SYNAPSE_IDrename may affect already-submitted data (biological)Renaming
PANEL_SYNAPSE_IDtoHTAN_PANEL_IDis a breaking change for anySpatialLevel3records previously submitted with the old slot name. If any HTAN centers have live submissions using this field, a migration script or backward-compatibility alias will be needed. This is a coordination concern rather than a schema correctness issue — flagged for awareness. -
modules/SpatialOmics/domains/spatial_panel.yaml—SpatialPanelhas no provenance back-link to a biospecimen or assay record (biological)SpatialPanelholdsHTAN_PANEL_IDas its identifier but carries noHTAN_PARENT_IDorHTAN_BIOSPECIMEN_ID. The join key is held on the assay side (SpatialLevel3.HTAN_PANEL_ID), which appears intentional — panel as a lookup table. IfSpatialPanelis ever submitted as a standalone RecordSet without a correspondingSpatialLevel3row, provenance would be severed. Worth confirming this join-from-assay direction is the authoritative design. -
modules/MultiplexMicroscopy/domains/level_2.yaml/multiplex_microscopy_channel_metadata.yaml—HTAN_PANEL_IDcorrectly omitsidentifier: true(coverage)In
SpatialPanel,HTAN_PANEL_IDcarriesidentifier: true. InMultiplexMicroscopyLevel2andChannelMetadata, the slot is a foreign-key reference (noidentifier: true), which is correct. Noted for completeness — no action required.
Verdict
REQUEST_CHANGES
Rules defined in CLAUDE.md · To update review rules, edit CLAUDE.md and open a PR to main
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
…/github.com/ncihtan/htan2-data-model into feat/issue-177-179-panel-id-and-target-type
Superseded by updated review on commit 9633a4b
There was a problem hiding this comment.
HTAN Schema Review — ✅ Approved
Automated review by htan-claude · commit
9633a4ba54309f20c3850d1859fe0140a8bea686
Solid PR with clear intent — there are a handful of warnings worth addressing before merge, but no blocking issues preventing it.
Warning
4 warnings to address (no blocking issues)
Files Changed
modules/SpatialOmics/domains/spatial_panel.yamlmodules/SpatialOmics/domains/level_3.yamlmodules/MultiplexMicroscopy/domains/level_2.yamlmodules/MultiplexMicroscopy/domains/multiplex_microscopy_channel_metadata.yamlmodules/SpatialOmics/tests/test_spatial.pymodules/MultiplexMicroscopy/tests/test_multiplex_microscopy.py
Checklist Results
| Check | Result | Notes |
|---|---|---|
| Inheritance correctness | PASS | No changes to inheritance chains; existing chains unaffected |
| Inlining of nested objects | PASS | HTAN_PANEL_ID slots in level files use range: string, not range: SpatialPanel; no inlining violation |
| Slot completeness (range, title, description) | PASS | All new slots carry range, title, description, and pattern where applicable |
| Enum integrity (alphabetical, descriptions) | PASS | TargetTypeEnum values are alphabetically ordered; all 7 values have descriptions |
| Generated artifacts | N/A | Generated Python and JSON files excluded from diff per scope rules |
Findings
Warnings
-
modules/SpatialOmics/domains/spatial_panel.yaml—Human Proteinhas no conditional identifier rule (structural + biological + coverage)
TargetTypeEnumincludesHuman Protein(relevant for antibody-based spatial proteomics such as CODEX/PhenoCycler), but norulesblock is triggered whenTARGET_TYPEisHuman Protein. This means only the unconditionally requiredTARGET_NAMEstring is enforced, with no stable identifier (e.g., HGNC symbol, UniProt accession, or Ensembl gene ID) required. If this is intentional — e.g., becauseTARGET_NAMEis expected to follow a controlled vocabulary for proteins — please add a comment orannotationsentry to the enum value or the schema documenting that decision, so future maintainers don't interpret it as an accidental omission. If it is not intentional, add a conditional rule requiring at least one identifier slot forHuman Protein. -
modules/SpatialOmics/domains/spatial_panel.yaml—SpatialPanelhas no biospecimen provenance anchor (biological)
SpatialPanel(identified byHTAN_PANEL_ID) contains noHTAN_BIOSPECIMEN_ID,HTAN_PARENT_ID, or equivalent centre-level provenance slot. Without a provenance anchor it is impossible to trace which atlas centre or biospecimen collection a panel record belongs to across RecordSets, which is inconsistent with the HTAN cross-RecordSet traceability pattern. Consider whether at minimum anHTAN_CENTER_IDorHTAN_PARENT_BIOSPECIMEN_IDslot is appropriate here, or explicitly document in the schema whySpatialPanelis intentionally provenance-free (e.g., because panels are centre-level shared resources referenced by experiment-scoped records). -
modules/SpatialOmics/tests/test_spatial.py— No generated-dataclass instance test forSpatialPanel(coverage)
AllSpatialPaneltests useSchemaViewstructural assertions or hand-rolled validators; none construct aSpatialPanelinstance via the generated dataclass (e.g.,from htan_spatialomics.datamodel.spatial_panel import SpatialPanel) to confirm a valid instance loads without error and an invalid one (e.g., missingTARGET_TYPE) raises the expected error. Per the coverage guidelines, at least one valid and one invalid generated-class instance test is required for each changed class. -
modules/MultiplexMicroscopy/tests/test_multiplex_microscopy.py— No generated-dataclass valid-instance test forMultiplexMicroscopyLevel2withHTAN_PANEL_ID(coverage)
test_htan_panel_id_missing_level2_raiseshand-rolls its own validation rather than exercising the generatedMultiplexMicroscopyLevel2dataclass. TheChannelMetadatatests correctly use the generated class, butMultiplexMicroscopyLevel2does not have a corresponding valid-instance test confirming the new requiredHTAN_PANEL_IDslot is accepted end-to-end. Add at least one test that constructs a validMultiplexMicroscopyLevel2instance withHTAN_PANEL_IDpresent using the generated class.
Informational
-
modules/SpatialOmics/domains/spatial_panel.yaml— No organism/database identifier slots forBacterial,Viral, orControl Probetarget types (biological)
ForBacterialandViraltarget types, onlyTARGET_NAME(free text) is required — there are no slots for NCBI Taxonomy IDs, GenBank accessions, or pathogen database references. This may be intentional for an initial implementation supporting microbiome/viral spatial panels. No action required now, but worth tracking as a future enhancement if stricter provenance is needed for non-human targets. -
modules/SpatialOmics/domains/spatial_panel.yaml— Schema version not bumped despite breaking changes (structural + coverage)
The changes tospatial_panel.yamlare substantial: three fields removed (GENE_SYMBOL,GENE_ID,USER_GENE_NAME), multiple fields added, and a new enum introduced. If the schema header carries aversion:key, this level of restructuring warrants an increment to signal a breaking change to downstream consumers. No action required if versioning is managed at the module or repository level rather than per-file. -
PR description contains minor inaccuracies relative to the diff (structural)
The PR description references slot names (OTHER_TARGET_TYPE,USER_GENE_NAME,GENE_SYMBOL,GENE_ID) and a limited enum set (Human Gene,Human Transcript,Other) that do not match the actual schema changes. The YAML itself is internally consistent — this is purely a documentation issue. No schema changes needed, but updating the PR description would help reviewers and historians of the change.
Verdict
APPROVE
Rules defined in CLAUDE.md · To update review rules, edit CLAUDE.md and open a PR to main
|
Disregarding previous claude review, plan to keep this recordset going for all of HTAN Phase 2, similar to clinical and biospecimen. |
Auto-generated Python classes from LinkML schema updates. **Auto-generated by GitHub Actions workflow** [skip ci]
|
✅ Python classes auto-updated! modules/Clinical/src/htan_clinical/datamodel/clinical.py |
Summary
HTAN_PANEL_IDas a required field toMultiplexMicroscopyLevel2andChannelMetadataschemas, enabling panel-level grouping and join keys across RecordSets (fixes Add HTAN_PANEL_ID as required field to MultiplexMicroscopy Level 2 and ChannelMetadata schemas #177)TARGET_TYPEenum and conditional field requirements toSpatialPanelto support non-human probes (e.g. microbiome, viral targets) that lack Ensembl/HGNC identifiers (fixes Add TARGET_TYPE field and conditional requirements for non-human probes #179, closes Support non-human (e.g. microbial) probes in panel metadata — GENE_ID / gene symbol validation too strict #175)PHYSICAL_SIZE_Zoptional inMultiplexMicroscopyLevel2, conditionally required only whenSIZE_Z > 1, to accommodate 2D acquisitions like CODEX (fixes Make PHYSICAL_SIZE_Z optional in MultiplexMicroscopy Level 2 #174)Changes
modules/MultiplexMicroscopy/domains/level_2.yamlHTAN_PANEL_ID(required, with HTAN P-prefix pattern) beforeCHANNEL_METADATA_IDPHYSICAL_SIZE_Zfromrequired: truetorequired: false; added arulesblock making it required whenSIZE_Z >= 2modules/MultiplexMicroscopy/domains/multiplex_microscopy_channel_metadata.yamlHTAN_PANEL_ID(required, with HTAN P-prefix pattern) beforeCHANNEL_IDmodules/SpatialOmics/domains/spatial_panel.yamlTargetTypeEnumwith valuesHuman Gene,Human Transcript,OtherTARGET_TYPEslotOTHER_TARGET_TYPEslot (free-text, conditionally required)GENE_SYMBOL,HGNC_VERSION,GENE_IDfromrequired: truetorequired: falserulesblocks:GENE_SYMBOL/HGNC_VERSION/GENE_IDrequired whenTARGET_TYPEisHuman GeneorHuman Transcript;USER_GENE_NAMEandOTHER_TARGET_TYPErequired whenTARGET_TYPEisOtherTests
test_spatial_panel_classto reflect conditional rather than unconditional requirementstest_physical_size_z_conditionalasserting the rule structure forPHYSICAL_SIZE_ZTest Plan
poetry run pytest modules/MultiplexMicroscopy/— 17 passedpoetry run pytest modules/SpatialOmics/— 11 passed