
Feat/issue 177 179 panel id and target type #180

Merged
aditigopalan merged 35 commits into main from feat/issue-177-179-panel-id-and-target-type
May 1, 2026

Conversation

@aditigopalan
Collaborator

Summary

Changes

modules/MultiplexMicroscopy/domains/level_2.yaml

  • Added HTAN_PANEL_ID (required, with HTAN P-prefix pattern) before CHANNEL_METADATA_ID
  • Changed PHYSICAL_SIZE_Z from required: true to required: false; added a rules block making it required when SIZE_Z >= 2

modules/MultiplexMicroscopy/domains/multiplex_microscopy_channel_metadata.yaml

  • Added HTAN_PANEL_ID (required, with HTAN P-prefix pattern) before CHANNEL_ID

modules/SpatialOmics/domains/spatial_panel.yaml

  • Added TargetTypeEnum with values Human Gene, Human Transcript, Other
  • Added required TARGET_TYPE slot
  • Added OTHER_TARGET_TYPE slot (free-text, conditionally required)
  • Changed GENE_SYMBOL, HGNC_VERSION, GENE_ID from required: true to required: false
  • Added two rules blocks: GENE_SYMBOL/HGNC_VERSION/GENE_ID required when TARGET_TYPE is Human Gene or Human Transcript; USER_GENE_NAME and OTHER_TARGET_TYPE required when TARGET_TYPE is Other

Tests

  • Updated test_spatial_panel_class to reflect conditional rather than unconditional requirements
  • Added test_physical_size_z_conditional asserting the rule structure for PHYSICAL_SIZE_Z

Test Plan

  • poetry run pytest modules/MultiplexMicroscopy/ — 17 passed
  • poetry run pytest modules/SpatialOmics/ — 11 passed

Note: Issue #175 is a duplicate of #179 and can be closed without separate changes.
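
As a rough illustration of the PHYSICAL_SIZE_Z rule described in the Changes section, the condition can be mirrored in plain Python. This is only a sketch: the function name `physical_size_z_errors` and the dict-based record are hypothetical, and treating a missing SIZE_Z as "rule does not fire" is an assumption. The schema itself expresses the rule as a LinkML rules block, which is not reproduced here.

```python
from typing import Any, Dict, List


def physical_size_z_errors(record: Dict[str, Any]) -> List[str]:
    """Return errors for the PHYSICAL_SIZE_Z conditional requirement.

    Hypothetical helper: PHYSICAL_SIZE_Z is only required when SIZE_Z >= 2.
    """
    size_z = record.get("SIZE_Z")
    # Assumption: a missing SIZE_Z, or a value below 2, is a 2D acquisition
    # and does not trigger the requirement.
    if size_z is None or size_z < 2:
        return []
    if record.get("PHYSICAL_SIZE_Z") is None:
        return ["PHYSICAL_SIZE_Z is required when SIZE_Z >= 2"]
    return []


# A 2D record passes without PHYSICAL_SIZE_Z; a 3D record without it does not.
print(physical_size_z_errors({"SIZE_Z": 1}))
print(physical_size_z_errors({"SIZE_Z": 3}))
print(physical_size_z_errors({"SIZE_Z": 3, "PHYSICAL_SIZE_Z": 0.5}))
```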

aditigopalan and others added 3 commits April 29, 2026 13:25
…Z conditional

Add HTAN_PANEL_ID as a required field to MultiplexMicroscopyLevel2 and
ChannelMetadata to enable panel-level grouping and join keys across RecordSets.

Make PHYSICAL_SIZE_Z optional at the class level, conditionally required via
a rule when SIZE_Z >= 2, to accommodate 2D acquisitions (e.g. CODEX) where
no z-stack is collected.

Fixes #177. Fixes #174.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s for non-human probes

Add TargetTypeEnum (Human Gene, Human Transcript, Other) and a required
TARGET_TYPE slot to SpatialPanel to support probes targeting non-human genes
(e.g. microbiome, viral targets) that lack Ensembl/HGNC identifiers.

GENE_SYMBOL, HGNC_VERSION, and GENE_ID are now conditionally required via
rules when TARGET_TYPE is Human Gene or Human Transcript. USER_GENE_NAME and
the new OTHER_TARGET_TYPE (free-text) are conditionally required when
TARGET_TYPE is Other.

Fixes #179. Closes #175.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github-actions[bot]

This comment was marked as outdated.

Auto-generated Python classes from LinkML schema updates.

**Auto-generated by GitHub Actions workflow**
[skip ci]
@github-actions
Contributor

Python classes auto-updated!

The Python classes have been automatically regenerated and committed to this PR branch.

**Updated files:**
```
modules/Biospecimen/src/htan_biospecimen/datamodel/biospecimen.py

modules/Clinical/src/htan_clinical/datamodel/clinical.py
modules/DigitalPathology/src/htan_digitalpathology/datamodel/digital_pathology.py
modules/Imaging/src/htan_imaging/datamodel/imaging.py
modules/MultiplexMicroscopy/src/htan_multiplexmicroscopy/datamodel/multiplex_microscopy.py
modules/Sequencing/src/htan_sequencing/datamodel/sequencing.py
modules/SpatialOmics/src/htan_spatial/datamodel/spatial.py
modules/WES/src/htan_wes/datamodel/wes.py
modules/scRNA-seq/src/htan_scrna_seq/datamodel/scrna_seq.py
```

The generated classes are now up to date with the schema changes.

Rename GENE_ID → ENSEMBL_ID with updated pattern (ENSG|ENST) to correctly
support both gene and transcript identifiers; drop the undocumented bare
integer branch.

Replace GENE_SYMBOL and USER_GENE_NAME with a single required TARGET_NAME
field, universal across all target types. Rename OTHER_TARGET_TYPE →
OTHER_TARGET_DESCRIPTION to clarify it is a free-text descriptor, not a
type selector.

Scope HGNC_VERSION to Human Gene only (not Human Transcript, since
transcript versioning is handled via Ensembl version suffixes). Split the
single Human Gene/Transcript rule into two separate rules accordingly.

Expand TargetTypeEnum with Bacterial, Control Probe, Human Protein, and
Viral to support current and near-term panel use cases and reserve enum
slots for future identifier model work.
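
The conditional requirements described in this commit message can be summarised as a small lookup table. The sketch below is a hand-rolled check written for illustration, not the generated dataclass or the LinkML validator; the function name `spatial_panel_errors` and the sample values are hypothetical.

```python
from typing import Any, Dict, List

# Permissible TargetTypeEnum values named in this PR.
VALID_TARGET_TYPES = {
    "Bacterial", "Control Probe", "Human Gene", "Human Protein",
    "Human Transcript", "Other", "Viral",
}

# Slots required for every record.
ALWAYS_REQUIRED = ["TARGET_NAME", "TARGET_TYPE"]

# Slots required only for specific TARGET_TYPE values.
CONDITIONALLY_REQUIRED: Dict[str, List[str]] = {
    "Human Gene": ["ENSEMBL_ID", "HGNC_VERSION"],
    "Human Transcript": ["ENSEMBL_ID"],
    "Other": ["OTHER_TARGET_DESCRIPTION"],
}


def spatial_panel_errors(record: Dict[str, Any]) -> List[str]:
    """Return human-readable errors for a dict-shaped SpatialPanel record."""
    errors = [f"{slot} is required" for slot in ALWAYS_REQUIRED if not record.get(slot)]
    target_type = record.get("TARGET_TYPE")
    if target_type and target_type not in VALID_TARGET_TYPES:
        errors.append(f"{target_type!r} is not a valid TargetTypeEnum value")
    # Types without an entry (e.g. Viral) add no extra requirements.
    for slot in CONDITIONALLY_REQUIRED.get(target_type, []):
        if not record.get(slot):
            errors.append(f"{slot} is required when TARGET_TYPE is {target_type}")
    return errors


print(spatial_panel_errors({"TARGET_TYPE": "Human Gene", "TARGET_NAME": "EXAMPLE"}))
```

Keeping the per-type requirements in one mapping makes it easy to see that Bacterial, Viral, Control Probe, and Human Protein need only TARGET_NAME.
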
@github-actions github-actions Bot dismissed their stale review April 30, 2026 16:30

Superseded by updated review on commit 80b479a

github-actions[bot]

This comment was marked as outdated.

aditigopalan and others added 3 commits April 30, 2026 12:39
…icroscopy

Clarify enum value descriptions for Bacterial, Viral, Control Probe, and
Human Protein to document why they have no conditional identifier rules.
Update ENSEMBL_ID description to specify ENSG-prefixed IDs for Human Gene
and ENST-prefixed IDs for Human Transcript.

Add version fields to all three modified schemas: spatial_panel.yaml (2.0.0),
multiplex_microscopy_channel_metadata.yaml (1.5.0), level_2.yaml (1.1.0).

Add tests:
- test_htan_panel_id_required and test_htan_panel_id_required_channel_metadata
  covering required status and pattern validity for both MultiplexMicroscopy classes
- test_target_type_invalid_value asserting "Fungal" is not a valid TargetTypeEnum value
- test_spatial_panel_conditional_rules_instances covering valid Human Gene,
  invalid Human Gene (missing ENSEMBL_ID), valid Human Transcript (no HGNC_VERSION),
  invalid Other (missing OTHER_TARGET_DESCRIPTION), and invalid TARGET_TYPE

Update test_valid_channel_metadata_instance to include the now-required
HTAN_PANEL_ID field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot dismissed their stale review April 30, 2026 16:41

Superseded by updated review on commit 61789de

github-actions[bot]
github-actions Bot previously approved these changes Apr 30, 2026
Contributor

@github-actions github-actions Bot left a comment

HTAN Schema Review — ✅ Approved

Automated review by htan-claude · commit 61789de1a56dedf38735698e589f64f09e41c974

The PR is well-structured and the schema changes are solid — a few test coverage gaps and one meaningful biological modelling concern need attention before merge.

Caution

1 blocking issue must be resolved before merge

Files Changed

  • modules/MultiplexMicroscopy/domains/level_2.yaml
  • modules/MultiplexMicroscopy/domains/multiplex_microscopy_channel_metadata.yaml
  • modules/SpatialOmics/domains/spatial_panel.yaml
  • modules/MultiplexMicroscopy/tests/test_multiplex_microscopy.py
  • modules/SpatialOmics/tests/test_spatial.py

Checklist Results

| Check | Result | Notes |
| --- | --- | --- |
| Inheritance correctness | PASS | No inheritance chains modified |
| Inlining of nested objects | N/A | No inlined objects introduced |
| Slot completeness (range, title, description) | PASS | All new/changed slots have range, title, and description |
| Enum integrity (alphabetical, descriptions) | PASS | TargetTypeEnum is alphabetically ordered; all values have descriptions |
| Generated artifacts | N/A | Explicitly excluded from this PR per PR description |

Findings

Blocking

  • spatial_panel.yaml: ENSEMBL_ID pattern accepts both ENSG and ENST prefixes with no per-TARGET_TYPE enforcement (biological)
    The slot-level pattern ^(ENSG\\d+|ENST\\d+)$ permits both gene-level (ENSG) and transcript-level (ENST) Ensembl identifiers in the same field. The Human Gene conditional rule requires ENSEMBL_ID but does not constrain it to ENSG-prefixed values, meaning a submitter could satisfy the Human Gene rule with a transcript ID and vice versa — a scientifically incorrect submission that the model would accept. The description text gives correct guidance but the model does not enforce it. The cleanest fix is to split into two slots (ENSEMBL_GENE_ID restricted to ^ENSG\\d+$ and ENSEMBL_TRANSCRIPT_ID restricted to ^ENST\\d+$) and reference each in the appropriate conditional rule; alternatively, add postcondition pattern constraints to the existing rules blocks if splitting slots is too disruptive.

Warnings

  • test_multiplex_microscopy.py — No invalid-instance test for HTAN_PANEL_ID in MultiplexMicroscopyLevel2 (coverage)
    There is no test that attempts to instantiate a MultiplexMicroscopyLevel2 object without HTAN_PANEL_ID and asserts a ValueError is raised. Coverage guidelines require at least one invalid-instance path for each new required slot on a class. Add a test analogous to the invalid-instance pattern used for ChannelMetadata, omitting HTAN_PANEL_ID from a MultiplexMicroscopyLevel2 instance.

  • test_multiplex_microscopy.py — No instance-level tests for the PHYSICAL_SIZE_Z conditional rule (coverage)
    test_physical_size_z_conditional inspects schema structure only; it does not exercise the rule through actual data instances. Following the pattern used in test_spatial_panel_conditional_rules_instances, at least one valid 2D instance (SIZE_Z=1, no PHYSICAL_SIZE_Z) and one invalid 3D instance (SIZE_Z=3, PHYSICAL_SIZE_Z absent) should be validated against the LinkML runtime or a validator to confirm the rule fires correctly.

  • test_spatial.py — No valid-instance coverage for Bacterial, Viral, Control Probe, and Human Protein enum values (coverage)
    test_spatial_panel_conditional_rules_instances exercises Human Gene, Human Transcript, Other, and an invalid type, but the four remaining TargetTypeEnum values have no instance-level test path. A short parametrized test supplying only TARGET_NAME (and omitting identifier fields) for each of these four values would close the gap and confirm the absence of unintended required-slot violations.

  • test_spatial.py — "Dropped fields" assertion checks OTHER_TARGET_TYPE (a slot that never existed) instead of guarding the correct slot (coverage)
    The test iterates over ["GENE_SYMBOL", "GENE_ID", "USER_GENE_NAME", "OTHER_TARGET_TYPE"] to assert these names are absent from SpatialPanel. Because OTHER_TARGET_TYPE was never defined in the schema, this assertion passes vacuously and provides no protection against accidentally dropping OTHER_TARGET_DESCRIPTION. The assertion should either be removed or replaced with a positive check that OTHER_TARGET_DESCRIPTION is present (which the conditional-slots loop already covers). The PR description also references OTHER_TARGET_TYPE in several places, confirming this is a naming artefact from development.

  • spatial_panel.yaml: Human Protein enum value has no conditional rule and no enforced identifier (biological)
    TargetTypeEnum includes Human Protein but no rules block enforces any identifier requirements for this type, unlike Human Gene and Human Transcript which both require ENSEMBL_ID. A submitter selecting Human Protein receives no model-level constraint distinguishing it from Other, and there is no free-text descriptor required either. If protein-level identifiers are intentionally deferred, this should be explicitly documented in the slot or enum description as a known gap; otherwise, consider adding a rule requiring a protein identifier (e.g., UniProt accession) or folding Human Protein into Other until identifiers are standardised.

  • spatial_panel.yaml: Viral and Bacterial target types have no required descriptor field (biological)
    OTHER_TARGET_DESCRIPTION is only conditionally required when TARGET_TYPE = Other, but the descriptions for Viral and Bacterial state that no standardised identifier is mandated. This means viral and bacterial probe targets are identified solely by TARGET_NAME with no additional free-text description enforced. Consider either extending the OTHER_TARGET_DESCRIPTION rule (or a renamed NON_HUMAN_TARGET_DESCRIPTION) to cover Viral and Bacterial, or clarifying in the model description that TARGET_NAME alone is the intended and sufficient identifier for these categories.

  • spatial_panel.yaml: USER_GENE_NAME removed without a deprecation path (biological)
    USER_GENE_NAME was a previously defined slot in SpatialPanel and has been dropped entirely in this PR (replaced in function by OTHER_TARGET_DESCRIPTION). Any existing data submissions that populated USER_GENE_NAME will silently lose that field upon re-validation. The 2.0.0 version bump signals a breaking change, but a changelog entry or migration note in the schema-level description would reduce re-ingestion risk. Consider adding a comment or schema annotation mapping the old slot name to its replacement.

Informational

  • spatial_panel.yaml — PR description references OTHER_TARGET_TYPE; schema consistently uses OTHER_TARGET_DESCRIPTION (structural + coverage)
    The PR description's "Changes" section states "USER_GENE_NAME and OTHER_TARGET_TYPE required when TARGET_TYPE is Other", but the actual slot name throughout the YAML and (corrected) tests is OTHER_TARGET_DESCRIPTION. The YAML itself is internally consistent — this is a prose artefact in the PR description only. No schema fix needed, but updating the PR description improves traceability.

  • level_2.yaml / multiplex_microscopy_channel_metadata.yaml / spatial_panel.yaml — Version bumps are proportionate (coverage)
    spatial_panel.yaml bumps to 2.0.0 (appropriate given removal of previously required slots), level_2.yaml to 1.1.0, and multiplex_microscopy_channel_metadata.yaml to 1.5.0. All increments are consistent with the scope of their respective changes.

  • spatial_panel.yaml: SpatialPanel has no HTAN_BIOSPECIMEN_ID provenance link (biological)
    SpatialPanel represents a reagent/probe panel definition rather than a biospecimen-derived record, so the absence of HTAN_BIOSPECIMEN_ID is scientifically defensible — panels are analogous to reagent manifests and are linked to biospecimen provenance indirectly via HTAN_PANEL_ID in image-level records. No action required given the current join-key design, but worth noting if SpatialPanel records are ever ingested as standalone RecordSets.

Verdict

REQUEST_CHANGES


Rules defined in CLAUDE.md · To update review rules, edit CLAUDE.md and open a PR to main

aditigopalan and others added 3 commits April 30, 2026 12:55
…overage

Add pattern constraints to ENSEMBL_ID postconditions: ENSG-prefixed required
for Human Gene, ENST-prefixed required for Human Transcript. This closes the
blocking issue where the slot-level pattern accepted both prefixes regardless
of TARGET_TYPE.

Add deprecation migration note to spatial_panel.yaml description mapping
removed slots (GENE_SYMBOL, GENE_ID, USER_GENE_NAME) to their replacements.

Remove vacuous OTHER_TARGET_TYPE assertion from dropped-fields test (slot
never existed in the schema).

Expand test coverage:
- ENSG/ENST prefix enforcement in test_spatial_panel_conditional_rules_instances
- Valid instances for Bacterial, Viral, Control Probe, Human Protein
- test_htan_panel_id_missing_level2_raises for MultiplexMicroscopyLevel2
- test_physical_size_z_rule_instances for 2D and 3D acquisition cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot dismissed their stale review April 30, 2026 17:09

Superseded by updated review on commit 7a5bb77

github-actions[bot]

This comment was marked as outdated.

Collaborator

@adamjtaylor adamjtaylor left a comment

A few things to discuss, but nearly there I think

Comment thread modules/MultiplexMicroscopy/domains/level_2.yaml Outdated
Comment thread modules/SpatialOmics/domains/spatial_panel.yaml Outdated
Comment thread modules/SpatialOmics/domains/spatial_panel.yaml Outdated
Comment thread modules/SpatialOmics/domains/spatial_panel.yaml Outdated
Comment thread modules/scRNA-seq/src/htan_scrna_seq/datamodel/scrna_seq.py
Comment thread modules/SpatialOmics/domains/spatial_panel.yaml Outdated
Comment thread modules/SpatialOmics/domains/spatial_panel.yaml Outdated
Comment thread modules/SpatialOmics/domains/spatial_panel.yaml
Auto-generated Python classes from LinkML schema updates.

**Auto-generated by GitHub Actions workflow**
[skip ci]
@github-actions
Contributor

github-actions Bot commented May 1, 2026

Python classes auto-updated!

The Python classes have been automatically regenerated and committed to this PR branch.

**Updated files:**
```
modules/Biospecimen/src/htan_biospecimen/datamodel/biospecimen.py

modules/Clinical/src/htan_clinical/datamodel/clinical.py
modules/DigitalPathology/src/htan_digitalpathology/datamodel/digital_pathology.py
modules/Imaging/src/htan_imaging/datamodel/imaging.py
modules/MultiplexMicroscopy/src/htan_multiplexmicroscopy/datamodel/multiplex_microscopy.py
modules/Sequencing/src/htan_sequencing/datamodel/sequencing.py
modules/SpatialOmics/src/htan_spatial/datamodel/spatial.py
modules/WES/src/htan_wes/datamodel/wes.py
modules/scRNA-seq/src/htan_scrna_seq/datamodel/scrna_seq.py
```

The generated classes are now up to date with the schema changes.

…overage

Update ENSEMBL_ID pattern at slot level and in postconditions to accept
optional Ensembl version suffixes (e.g., ENSG00000136997.20, ENST00000621592.7):
  slot-level:           ^(ENSG|ENST)\d+(\.\d+)?$
  Human Gene rule:      ^ENSG\d+(\.\d+)?$
  Human Transcript rule: ^ENST\d+(\.\d+)?$
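
For a quick check of how these three patterns interact, they can be exercised with Python's `re` module. The patterns and the two example IDs are copied from this commit message; the helper name `ensembl_id_ok` is hypothetical, and applying the slot-level pattern before the per-type pattern is an assumption about evaluation order, not a description of the LinkML validator.

```python
import re

# Patterns copied from the commit message above.
SLOT_PATTERN = re.compile(r"^(ENSG|ENST)\d+(\.\d+)?$")
RULE_PATTERNS = {
    "Human Gene": re.compile(r"^ENSG\d+(\.\d+)?$"),
    "Human Transcript": re.compile(r"^ENST\d+(\.\d+)?$"),
}


def ensembl_id_ok(target_type: str, ensembl_id: str) -> bool:
    """Check the slot-level pattern, then the per-type pattern if one exists."""
    if not SLOT_PATTERN.match(ensembl_id):
        return False
    rule = RULE_PATTERNS.get(target_type)
    return rule is None or bool(rule.match(ensembl_id))


# Versioned and unversioned IDs pass; a transcript ID on a gene record does not.
print(ensembl_id_ok("Human Gene", "ENSG00000136997.20"))
print(ensembl_id_ok("Human Transcript", "ENST00000621592.7"))
print(ensembl_id_ok("Human Gene", "ENST00000621592.7"))
```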

Strip operational slot-requirement language from TargetTypeEnum permissible
value descriptions — rules blocks are the authoritative source; duplicate
prose creates a maintenance hazard.

Add HTAN_PANEL_ID to the required_attrs regression sweep in
test_required_attributes_level2.

Add test_invalid_channel_metadata_missing_htan_panel_id to confirm that
constructing ChannelMetadata without HTAN_PANEL_ID raises ValueError.

Also remove CHANNEL_METADATA_ID from level_2.yaml (replaced by HTAN_PANEL_ID
as the join key to ChannelMetadata) and update descriptions and tests
accordingly.
@github-actions github-actions Bot dismissed their stale review May 1, 2026 16:35

Superseded by updated review on commit 62ec5a6

github-actions[bot]

This comment was marked as outdated.

Auto-generated Python classes from LinkML schema updates.

**Auto-generated by GitHub Actions workflow**
[skip ci]
@github-actions
Contributor

github-actions Bot commented May 1, 2026

Python classes auto-updated!

The Python classes have been automatically regenerated and committed to this PR branch.

**Updated files:**
```
modules/Biospecimen/src/htan_biospecimen/datamodel/biospecimen.py

modules/Clinical/src/htan_clinical/datamodel/clinical.py
modules/DigitalPathology/src/htan_digitalpathology/datamodel/digital_pathology.py
modules/Imaging/src/htan_imaging/datamodel/imaging.py
modules/MultiplexMicroscopy/src/htan_multiplexmicroscopy/datamodel/multiplex_microscopy.py
modules/Sequencing/src/htan_sequencing/datamodel/sequencing.py
modules/SpatialOmics/src/htan_spatial/datamodel/spatial.py
modules/WES/src/htan_wes/datamodel/wes.py
modules/scRNA-seq/src/htan_scrna_seq/datamodel/scrna_seq.py
```

The generated classes are now up to date with the schema changes.

Add missing terminal period to TARGET_TYPE description.

Update OTHER_TARGET_DESCRIPTION example to remove viral gene name reference
(Viral is its own TargetTypeEnum value and should not map to Other).
@aditigopalan
Collaborator Author

Fixed:
TARGET_TYPE missing terminal period ✅
OTHER_TARGET_DESCRIPTION example removed viral gene reference ✅

Intentionally not fixed:

  • No conditional rules for Bacterial/Viral/Control Probe/Human Protein: deliberate, only TARGET_NAME required for those
  • TARGET_TYPE name kept as-is (not renamed to SPATIAL_TARGET_TYPE etc.)
  • CHANNEL_METADATA_ID deprecation notice not needed
  • HTAN_PANEL_ID inlining concern — the structural issue is fine (no class has range: SpatialPanel without inlined), but the blocking item asked us to document in the HTAN_PANEL_ID description that it serves as the cross-RecordSet join key; we haven't added that text, which can be skipped here imo

Comment thread modules/SpatialOmics/domains/spatial_panel.yaml Outdated
Replace PANEL_SYNAPSE_ID with HTAN_PANEL_ID using the standard HTAN
P-prefix pattern, consistent with SpatialPanel, MultiplexMicroscopyLevel2,
and ChannelMetadata.

Also fix PANEL_NAME description (was incorrectly copied from
PANEL_SIZE_TOTAL_TARGETS).
@github-actions github-actions Bot dismissed their stale review May 1, 2026 18:35

Superseded by updated review on commit 1d0abd4

github-actions[bot]
github-actions Bot previously approved these changes May 1, 2026
Contributor

@github-actions github-actions Bot left a comment

HTAN Schema Review — ✅ Approved

Automated review by htan-claude · commit 1d0abd4756a56ef167de1772ba6a33442322dcb0

Solid PR overall — the conditional logic for TARGET_TYPE and PHYSICAL_SIZE_Z is well-structured, but there's one blocking provenance issue and several warnings worth addressing before merge.

Caution

1 blocking issue must be resolved before merge

Files Changed

  • modules/MultiplexMicroscopy/domains/level_2.yaml
  • modules/MultiplexMicroscopy/domains/multiplex_microscopy_channel_metadata.yaml
  • modules/SpatialOmics/domains/spatial_panel.yaml
  • modules/SpatialOmics/domains/level_3.yaml
  • modules/MultiplexMicroscopy/tests/test_multiplex_microscopy.py
  • modules/SpatialOmics/tests/test_spatial.py

Checklist Results

| Check | Result | Notes |
| --- | --- | --- |
| Inheritance correctness | PASS | No inheritance changes introduced |
| Inlining of nested objects | PASS | All new slots are range: string; no inlining required |
| Slot completeness (range, title, description) | PASS | All new/changed slots have range, title, and description |
| Enum integrity (alphabetical, descriptions) | PASS | TargetTypeEnum is alphabetical, all 7 values have descriptions |
| Generated artifacts | N/A | Generated .py and .json files excluded from diff per PR note |

Findings

Blocking

  • modules/MultiplexMicroscopy/domains/level_2.yaml: CHANNEL_METADATA_ID removed with no 1:1 replacement link to ChannelMetadata RecordSet (biological)

    The old CHANNEL_METADATA_ID (Synapse ID, ^syn\d+$ pattern) was the explicit foreign key pointing a single MultiplexMicroscopyLevel2 row at its exact ChannelMetadata RecordSet. Its replacement, HTAN_PANEL_ID, is a panel-level grouping key shared across many image rows and many channel rows — the join it creates is one-to-many, not one-to-one. After this change there is no unambiguous way to resolve which specific ChannelMetadata RecordSet belongs to a given image file record, breaking the provenance chain. Either retain CHANNEL_METADATA_ID (or an equivalent unique RecordSet pointer) alongside HTAN_PANEL_ID, or introduce a new slot such as CHANNEL_METADATA_RECORDSET_ID that uniquely identifies the ChannelMetadata RecordSet for each image.
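
The ambiguity described in this finding can be made concrete with a few rows of toy data. Everything below is hypothetical: the `RECORDSET` and `FILENAME` keys, the row values, and the helper name are invented for illustration, and the sketch only shows what the reviewer's concern would look like if two ChannelMetadata RecordSets shared one HTAN_PANEL_ID.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical ChannelMetadata rows from two RecordSets that share a panel ID.
channel_rows: List[Dict[str, str]] = [
    {"RECORDSET": "recordset_1", "CHANNEL_ID": "Channel:0", "HTAN_PANEL_ID": "HTA200_0000_P1"},
    {"RECORDSET": "recordset_1", "CHANNEL_ID": "Channel:1", "HTAN_PANEL_ID": "HTA200_0000_P1"},
    {"RECORDSET": "recordset_2", "CHANNEL_ID": "Channel:0", "HTAN_PANEL_ID": "HTA200_0000_P1"},
]


def recordsets_by_panel(rows: List[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Group RecordSet names by the HTAN_PANEL_ID their rows carry."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        index[row["HTAN_PANEL_ID"]].add(row["RECORDSET"])
    return dict(index)


# A hypothetical image row that carries only the panel-level key.
image_row = {"FILENAME": "image_a.ome.tiff", "HTAN_PANEL_ID": "HTA200_0000_P1"}
candidates = recordsets_by_panel(channel_rows)[image_row["HTAN_PANEL_ID"]]

# More than one candidate means the panel ID alone cannot pick a RecordSet.
print(sorted(candidates))
```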

Warnings

  • modules/SpatialOmics/domains/spatial_panel.yaml: Human Protein target type has no conditional rule for identifier slots (biological)

    TargetTypeEnum includes Human Protein alongside Human Gene and Human Transcript, but only the gene/transcript values have conditional rules requiring ENSEMBL_ID and HGNC_VERSION. Human protein targets used in antibody-based spatial proteomics (e.g., CODEX, MIBI) have stable identifiers (UniProt, HGNC protein IDs) and deserve the same treatment. As written, a Human Protein record needs only TARGET_NAME, leaving it as under-annotated as a Bacterial probe. Either add a rule requiring HGNC_VERSION (or a new UNIPROT_ID slot) when TARGET_TYPE is Human Protein, or add an explicit note in the slot description documenting the intentional omission.

  • modules/SpatialOmics/domains/spatial_panel.yaml: Bacterial and Viral target types have no conditional rule for taxonomic identifiers (biological)

    Bacterial and Viral are added as permissible TARGET_TYPE values, but unlike Other they do not trigger a requirement for OTHER_TARGET_DESCRIPTION or any equivalent taxonomic identifier (e.g., NCBI Taxonomy ID, GenBank accession). A bacterial or viral probe record could be submitted with only TARGET_NAME and no further annotation, which is insufficient for reproducibility in spatial metagenomics/viromics workflows. Consider extending the Other conditional rule to cover Bacterial and Viral, or adding a dedicated taxonomic identifier slot with appropriate conditionality.

  • modules/SpatialOmics/domains/spatial_panel.yaml / level_2.yaml / multiplex_microscopy_channel_metadata.yaml / level_3.yaml: HTAN_PANEL_ID pattern duplicated across four files (biological + coverage)

    The regex pattern ^(?=.{1,50}$)(HTA2[0-2][0-9])_(0000|EXT[0-9]{1,18}|[0-9]{1,21})_(P[0-9]{1,20})$ is copy-pasted into all four files independently. If the pattern ever changes in one location it will silently diverge elsewhere. Consider defining HTAN_PANEL_ID once as a shared slot (e.g., in CoreFile or an appropriate base schema, consistent with how HTAN_DATA_FILE_ID is handled in core.yaml) and importing it into each module.

  • modules/MultiplexMicroscopy/tests/test_multiplex_microscopy.py: test_htan_panel_id_missing_level2_raises uses a hand-rolled validator instead of the dataclass constructor (coverage)

    The test for a missing HTAN_PANEL_ID in MultiplexMicroscopyLevel2 implements a custom validate_panel_id function rather than instantiating the generated MultiplexMicroscopyLevel2 dataclass. This validates schema metadata but does not confirm that the runtime dataclass enforces the constraint — unlike the parallel test_invalid_channel_metadata_missing_htan_panel_id which correctly uses the dataclass constructor. Add a test that calls MultiplexMicroscopyLevel2(...) without HTAN_PANEL_ID and asserts a ValueError is raised.

  • modules/SpatialOmics/tests/test_spatial.py — No valid-instance load test for SpatialPanel using the generated dataclass (coverage)

    test_spatial_panel_conditional_rules_instances uses a hand-rolled validator rather than loading a real SpatialPanel instance through the generated dataclass or a linkml-runtime loader. No test constructs an actual SpatialPanel object and verifies it parses cleanly end-to-end, which means enum range enforcement and pattern validation via linkml-validate are not exercised. Add at least one test that instantiates a SpatialPanel record via the generated dataclass or JSON schema validator.
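
The HTAN_PANEL_ID pattern quoted in the warnings above can be checked in isolation, and holding it in one constant mirrors the "define once" suggestion. This is a Python sketch only: the pattern is copied from the finding, while the constant name, the helper name, and the sample IDs are hypothetical. In the schema the equivalent would be a shared slot, which is not shown here.

```python
import re

# Pattern copied from the review finding; defined once so all callers share it.
HTAN_PANEL_ID_PATTERN = re.compile(
    r"^(?=.{1,50}$)(HTA2[0-2][0-9])_(0000|EXT[0-9]{1,18}|[0-9]{1,21})_(P[0-9]{1,20})$"
)


def is_valid_panel_id(value: str) -> bool:
    """Return True when the value matches the HTAN P-prefix panel ID pattern."""
    return HTAN_PANEL_ID_PATTERN.match(value) is not None


# Made-up IDs: the first two follow the pattern, the last two do not.
for candidate in ["HTA200_0000_P1", "HTA200_EXT12_P3", "HTA200_0000_B1", "syn12345"]:
    print(candidate, is_valid_panel_id(candidate))
```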

Informational

  • modules/SpatialOmics/domains/spatial_panel.yaml — Schema version field absent despite breaking changes (structural + coverage)

    spatial_panel.yaml has no version: field (before or after this PR), yet the changes are substantial and breaking: three required slots removed, three new slots added, one slot renamed with broadened semantics, a new enum added, and three conditional rules introduced. The level_2.yaml and channel_metadata.yaml files correctly carry version bumps; spatial_panel.yaml should follow suit (e.g., version: "2.0.0"). No action required to merge, but worth tracking.

  • modules/SpatialOmics/domains/level_3.yaml: PANEL_SYNAPSE_ID rename may affect already-submitted data (biological)

    Renaming PANEL_SYNAPSE_ID to HTAN_PANEL_ID is a breaking change for any SpatialLevel3 records previously submitted with the old slot name. If any HTAN centers have live submissions using this field, a migration script or backward-compatibility alias will be needed. This is a coordination concern rather than a schema correctness issue — flagged for awareness.

  • modules/SpatialOmics/domains/spatial_panel.yaml: SpatialPanel has no provenance back-link to a biospecimen or assay record (biological)

    SpatialPanel holds HTAN_PANEL_ID as its identifier but carries no HTAN_PARENT_ID or HTAN_BIOSPECIMEN_ID. The join key is held on the assay side (SpatialLevel3.HTAN_PANEL_ID), which appears intentional — panel as a lookup table. If SpatialPanel is ever submitted as a standalone RecordSet without a corresponding SpatialLevel3 row, provenance would be severed. Worth confirming this join-from-assay direction is the authoritative design.

  • modules/MultiplexMicroscopy/domains/level_2.yaml / multiplex_microscopy_channel_metadata.yaml: HTAN_PANEL_ID correctly omits identifier: true (coverage)

    In SpatialPanel, HTAN_PANEL_ID carries identifier: true. In MultiplexMicroscopyLevel2 and ChannelMetadata, the slot is a foreign-key reference (no identifier: true), which is correct. Noted for completeness — no action required.


Verdict

REQUEST_CHANGES


Rules defined in CLAUDE.md · To update review rules, edit CLAUDE.md and open a PR to main

Auto-generated Python classes from LinkML schema updates.

**Auto-generated by GitHub Actions workflow**
[skip ci]
@github-actions
Contributor

github-actions Bot commented May 1, 2026

Python classes auto-updated!

The Python classes have been automatically regenerated and committed to this PR branch.

**Updated files:**
```
modules/Biospecimen/src/htan_biospecimen/datamodel/biospecimen.py
modules/Clinical/src/htan_clinical/datamodel/clinical.py
modules/DigitalPathology/src/htan_digitalpathology/datamodel/digital_pathology.py
modules/Imaging/src/htan_imaging/datamodel/imaging.py
modules/MultiplexMicroscopy/src/htan_multiplexmicroscopy/datamodel/multiplex_microscopy.py
modules/Sequencing/src/htan_sequencing/datamodel/sequencing.py
modules/SpatialOmics/src/htan_spatial/datamodel/spatial.py
modules/WES/src/htan_wes/datamodel/wes.py
modules/scRNA-seq/src/htan_scrna_seq/datamodel/scrna_seq.py
```

The generated classes are now up to date with the schema changes.

@github-actions github-actions Bot dismissed their stale review May 1, 2026 18:46

Superseded by updated review on commit 9633a4b

Contributor

@github-actions github-actions Bot left a comment

HTAN Schema Review — ✅ Approved

Automated review by htan-claude · commit 9633a4ba54309f20c3850d1859fe0140a8bea686

Solid PR with clear intent — there are a handful of warnings worth addressing before merge, but no blocking issues preventing it.

Warning

4 warnings to address (no blocking issues)

Files Changed

  • modules/SpatialOmics/domains/spatial_panel.yaml
  • modules/SpatialOmics/domains/level_3.yaml
  • modules/MultiplexMicroscopy/domains/level_2.yaml
  • modules/MultiplexMicroscopy/domains/multiplex_microscopy_channel_metadata.yaml
  • modules/SpatialOmics/tests/test_spatial.py
  • modules/MultiplexMicroscopy/tests/test_multiplex_microscopy.py

Checklist Results

| Check | Result | Notes |
| --- | --- | --- |
| Inheritance correctness | PASS | No changes to inheritance chains; existing chains unaffected |
| Inlining of nested objects | PASS | HTAN_PANEL_ID slots in level files use range: string, not range: SpatialPanel; no inlining violation |
| Slot completeness (range, title, description) | PASS | All new slots carry range, title, description, and pattern where applicable |
| Enum integrity (alphabetical, descriptions) | PASS | TargetTypeEnum values are alphabetically ordered; all 7 values have descriptions |
| Generated artifacts | N/A | Generated Python and JSON files excluded from diff per scope rules |

Findings

Warnings

  • modules/SpatialOmics/domains/spatial_panel.yaml — Human Protein has no conditional identifier rule (structural + biological + coverage)
    TargetTypeEnum includes Human Protein (relevant for antibody-based spatial proteomics such as CODEX/PhenoCycler), but no rules block is triggered when TARGET_TYPE is Human Protein. This means only the unconditionally required TARGET_NAME string is enforced, with no stable identifier (e.g., HGNC symbol, UniProt accession, or Ensembl gene ID) required. If this is intentional — e.g., because TARGET_NAME is expected to follow a controlled vocabulary for proteins — please add a comment or annotations entry to the enum value or the schema documenting that decision, so future maintainers don't interpret it as an accidental omission. If it is not intentional, add a conditional rule requiring at least one identifier slot for Human Protein.

  • modules/SpatialOmics/domains/spatial_panel.yaml — SpatialPanel has no biospecimen provenance anchor (biological)
    SpatialPanel (identified by HTAN_PANEL_ID) contains no HTAN_BIOSPECIMEN_ID, HTAN_PARENT_ID, or equivalent centre-level provenance slot. Without a provenance anchor it is impossible to trace which atlas centre or biospecimen collection a panel record belongs to across RecordSets, which is inconsistent with the HTAN cross-RecordSet traceability pattern. Consider whether at minimum an HTAN_CENTER_ID or HTAN_PARENT_BIOSPECIMEN_ID slot is appropriate here, or explicitly document in the schema why SpatialPanel is intentionally provenance-free (e.g., because panels are centre-level shared resources referenced by experiment-scoped records).

  • modules/SpatialOmics/tests/test_spatial.py — No generated-dataclass instance test for SpatialPanel (coverage)
    All SpatialPanel tests use SchemaView structural assertions or hand-rolled validators; none construct a SpatialPanel instance via the generated dataclass (e.g., from htan_spatial.datamodel.spatial import SpatialPanel) to confirm that a valid instance loads without error and that an invalid one (e.g., missing TARGET_TYPE) raises the expected error. Per the coverage guidelines, at least one valid and one invalid generated-class instance test is required for each changed class.

  • modules/MultiplexMicroscopy/tests/test_multiplex_microscopy.py — No generated-dataclass valid-instance test for MultiplexMicroscopyLevel2 with HTAN_PANEL_ID (coverage)
    test_htan_panel_id_missing_level2_raises hand-rolls its own validation rather than exercising the generated MultiplexMicroscopyLevel2 dataclass. The ChannelMetadata tests correctly use the generated class, but MultiplexMicroscopyLevel2 does not have a corresponding valid-instance test confirming the new required HTAN_PANEL_ID slot is accepted end-to-end. Add at least one test that constructs a valid MultiplexMicroscopyLevel2 instance with HTAN_PANEL_ID present using the generated class.
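
The two coverage warnings above ask for tests that go through the generated classes rather than hand-rolled validators. A rough sketch of the shape such tests could take follows. To stay self-contained it uses `PanelStandIn`, a hypothetical stand-in dataclass; a real test would import the generated class instead. The assumption that a missing required slot raises `ValueError` at construction time should be checked against the generated code, and the ID and target values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PanelStandIn:
    """Hypothetical stand-in for a generated class; not the real SpatialPanel."""

    HTAN_PANEL_ID: Optional[str] = None
    TARGET_TYPE: Optional[str] = None
    TARGET_NAME: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed behaviour: a missing required slot raises ValueError.
        for slot in ("HTAN_PANEL_ID", "TARGET_TYPE", "TARGET_NAME"):
            if getattr(self, slot) is None:
                raise ValueError(f"{slot} must be supplied")


def test_valid_instance_loads() -> None:
    panel = PanelStandIn(
        HTAN_PANEL_ID="example-panel-id",  # placeholder, not a real HTAN ID
        TARGET_TYPE="Human Protein",
        TARGET_NAME="CD45",
    )
    assert panel.TARGET_TYPE == "Human Protein"


def test_missing_target_type_raises() -> None:
    try:
        PanelStandIn(HTAN_PANEL_ID="example-panel-id", TARGET_NAME="CD45")
    except ValueError as err:
        assert "TARGET_TYPE" in str(err)
    else:
        raise AssertionError("expected ValueError for missing TARGET_TYPE")


# Run directly so the sketch works without a test runner.
test_valid_instance_loads()
test_missing_target_type_raises()
```

The same valid/invalid pair would apply to MultiplexMicroscopyLevel2, with HTAN_PANEL_ID as the slot that is present in one test and omitted in the other.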

Informational

  • modules/SpatialOmics/domains/spatial_panel.yaml — No organism/database identifier slots for Bacterial, Viral, or Control Probe target types (biological)
    For Bacterial and Viral target types, only TARGET_NAME (free text) is required — there are no slots for NCBI Taxonomy IDs, GenBank accessions, or pathogen database references. This may be intentional for an initial implementation supporting microbiome/viral spatial panels. No action required now, but worth tracking as a future enhancement if stricter provenance is needed for non-human targets.

  • modules/SpatialOmics/domains/spatial_panel.yaml — Schema version not bumped despite breaking changes (structural + coverage)
    The changes to spatial_panel.yaml are substantial: three fields removed (GENE_SYMBOL, GENE_ID, USER_GENE_NAME), multiple fields added, and a new enum introduced. If the schema header carries a version: key, this level of restructuring warrants an increment to signal a breaking change to downstream consumers. No action required if versioning is managed at the module or repository level rather than per-file.

  • PR description contains minor inaccuracies relative to the diff (structural)
    The PR description references slot names (OTHER_TARGET_TYPE, USER_GENE_NAME, GENE_SYMBOL, GENE_ID) and a limited enum set (Human Gene, Human Transcript, Other) that do not match the actual schema changes. The YAML itself is internally consistent — this is purely a documentation issue. No schema changes needed, but updating the PR description would help reviewers and historians of the change.
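
The identifier gaps noted above for Human Protein, Bacterial, and Viral targets can be pictured as a lookup from TARGET_TYPE to the slots that become required for that type. The sketch below is illustrative only: the `EXAMPLE_*` slot names and the `missing_slots` helper are hypothetical and do not exist in spatial_panel.yaml, and the mapping is an assumption about what a stricter rule set might look like, not a description of the merged schema.

```python
from typing import Mapping, Optional

# Slots the review describes as required regardless of target type.
ALWAYS_REQUIRED = ("TARGET_TYPE", "TARGET_NAME")

# Hypothetical mapping: the EXAMPLE_* slot names are placeholders for
# illustration and are not defined in spatial_panel.yaml.
CONDITIONALLY_REQUIRED = {
    "Human Protein": ("EXAMPLE_PROTEIN_ACCESSION",),
    "Bacterial": ("EXAMPLE_TAXONOMY_ID",),
    "Viral": ("EXAMPLE_TAXONOMY_ID",),
}


def missing_slots(row: Mapping[str, Optional[str]]) -> list[str]:
    """List required slots that are absent or empty for this row's TARGET_TYPE."""
    target_type = row.get("TARGET_TYPE")
    required = ALWAYS_REQUIRED + CONDITIONALLY_REQUIRED.get(target_type, ())
    return [slot for slot in required if not row.get(slot)]


row = {"TARGET_TYPE": "Human Protein", "TARGET_NAME": "CD45"}
print(missing_slots(row))  # ['EXAMPLE_PROTEIN_ACCESSION']
```

Target types without an entry in the mapping fall back to the unconditional slots only, which mirrors the behaviour the review describes for Human Protein today.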

Verdict

APPROVE



@aditigopalan
Collaborator Author

Disregarding previous claude review, plan to keep this recordset going for all of HTAN Phase 2, similar to clinical and biospecimen.

Auto-generated Python classes from LinkML schema updates.

**Auto-generated by GitHub Actions workflow**
[skip ci]
@github-actions
Contributor

github-actions Bot commented May 1, 2026

Python classes auto-updated!

The Python classes have been automatically regenerated and committed to this PR branch.

**Updated files:**
```
modules/Biospecimen/src/htan_biospecimen/datamodel/biospecimen.py
modules/Clinical/src/htan_clinical/datamodel/clinical.py
modules/DigitalPathology/src/htan_digitalpathology/datamodel/digital_pathology.py
modules/Imaging/src/htan_imaging/datamodel/imaging.py
modules/MultiplexMicroscopy/src/htan_multiplexmicroscopy/datamodel/multiplex_microscopy.py
modules/Sequencing/src/htan_sequencing/datamodel/sequencing.py
modules/SpatialOmics/src/htan_spatial/datamodel/spatial.py
modules/WES/src/htan_wes/datamodel/wes.py
modules/scRNA-seq/src/htan_scrna_seq/datamodel/scrna_seq.py
```

The generated classes are now up to date with the schema changes.

@aditigopalan aditigopalan merged commit a98a83f into main May 1, 2026
@aditigopalan aditigopalan deleted the feat/issue-177-179-panel-id-and-target-type branch May 1, 2026 19:01