feat: support omop emb 1.0.0 #9

Merged
nicoloesch merged 46 commits into main from 8-support-omop-emb-0.5.0
May 25, 2026

@nicoloesch
Collaborator

@nicoloesch nicoloesch commented May 11, 2026

Motivation

omop-emb introduces a new storage and DB concept with significant, breaking changes to support a local-first, backend-agnostic storage solution for the embeddings. To include these changes and the fixes that come with a successful PR, we need to prepare omop-graph for the new incoming interfaces.

Closes #8

To prevent additional versioning updates, this PR also absorbed the following issues:

@gkennos gkennos self-requested a review May 12, 2026 23:53
@gkennos
Member

gkennos commented May 13, 2026

in README it mentions ClassIDEnum.HIERARCHICAL, but it should be ClassIDEnum.HIERARCHY

also stale refs: rank_paths, kg.find_shortest_paths, and kg.rank_paths

@gkennos
Member

gkennos commented May 13, 2026

in paths.py:

reconstruct_paths makes all nodes standard=False --> path metadata is wrong

@gkennos
Member

gkennos commented May 13, 2026

EdgeView.from_query() depends on positional column order rather than row names
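
A minimal sketch of name-based row mapping, using the standard-library `sqlite3.Row` as a stand-in for whatever row type the project actually uses. `EdgeRow`, `from_row` and `load_edges` are hypothetical names for illustration, not the real `EdgeView` API; the column names follow the `concept_relationship` table and the relationship value is made up.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeRow:
    # Hypothetical stand-in for EdgeView; field names are assumptions.
    subject_id: int
    predicate_id: str
    object_id: int

    @classmethod
    def from_row(cls, row: sqlite3.Row) -> "EdgeRow":
        # Read by column name so the SELECT column order no longer matters.
        return cls(
            subject_id=row["concept_id_1"],
            predicate_id=row["relationship_id"],
            object_id=row["concept_id_2"],
        )


def load_edges() -> list:
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # enables access by column name
    conn.execute(
        "CREATE TABLE concept_relationship "
        "(concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT)"
    )
    # Illustrative row only; the relationship value is invented.
    conn.execute("INSERT INTO concept_relationship VALUES (8481, 8600, 'rel_a')")
    # Columns are deliberately selected in a different order than the fields.
    rows = conn.execute(
        "SELECT relationship_id, concept_id_2, concept_id_1 "
        "FROM concept_relationship"
    ).fetchall()
    edges = [EdgeRow.from_row(r) for r in rows]
    conn.close()
    return edges
```

With this shape, reordering or adding columns in the query does not silently shift values into the wrong fields.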

@gkennos
Member

gkennos commented May 13, 2026

if you add synonym (bool) field to LabelMatch and populate from KnowledgeGraph.concept_lookup(), then LabelMatchGroupView can faithfully return direct/synonym --> at the moment it's only returning exact/partial/fulltext/embedding
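
As a rough illustration of the suggestion, the sketch below adds a `synonym` flag to a simplified match record and groups by both match kind and label source. The field names other than `synonym`, and the `group_matches` helper, are assumptions and do not reflect the actual `LabelMatch` or `LabelMatchGroupView` definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelMatch:
    # Simplified, hypothetical version of the real class.
    concept_id: int
    label: str
    match_kind: str  # "exact", "partial", "fulltext" or "embedding"
    synonym: bool = False  # proposed field: True if the label is a synonym


def group_matches(matches):
    """Group matches by (match_kind, 'direct' | 'synonym')."""
    groups = defaultdict(list)
    for m in matches:
        source = "synonym" if m.synonym else "direct"
        groups[(m.match_kind, source)].append(m)
    return dict(groups)
```

Defaulting `synonym` to `False` keeps existing call sites working while the lookup is updated to populate it.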

@gkennos
Member

gkennos commented May 13, 2026

LabelMatchGroupView.from_matches assumes pre-sorting rather than explicit rule (or - stop using ms[0] as 'best' by definition)
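
One possible explicit rule, sketched under assumptions: the priority order of match kinds, the `score` field and the `best_match` helper are all hypothetical and only illustrate picking the best match independently of input order.

```python
from dataclasses import dataclass

# Assumed priority: lower number wins. The real ordering is a project decision.
MATCH_KIND_PRIORITY = {"exact": 0, "partial": 1, "fulltext": 2, "embedding": 3}


@dataclass(frozen=True)
class ScoredMatch:
    concept_id: int
    match_kind: str
    score: float


def best_match(matches):
    """Return the best match by kind priority, then score, then concept_id."""
    if not matches:
        raise ValueError("best_match() requires at least one match")
    return min(
        matches,
        key=lambda m: (MATCH_KIND_PRIORITY[m.match_kind], -m.score, m.concept_id),
    )
```

Because the key is total, the result is the same regardless of how the caller ordered the list, so `ms[0]` no longer carries hidden meaning.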

@gkennos
Member

gkennos commented May 13, 2026

in paths.py

should we rename 'num_hops' in find_standard_paths to max_standard_hops or something? this is very specifically only traversing standard-standard relationships

alternatively: could add in param allow_non_standard_intermediates: bool = False

reason is that for MCP exploration/view use case, users will likely expect broader graph navigation
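
A toy sketch of how such a flag could behave, on an in-memory edge list rather than the real graph. `find_paths`, its parameters other than `allow_non_standard_intermediates`, and the rule that the target is always admissible are assumptions for illustration.

```python
from collections import deque


def find_paths(edges, standard, source, target, max_hops,
               allow_non_standard_intermediates=False):
    """Return simple paths from source to target using at most max_hops edges.

    edges: iterable of (subject_id, object_id) pairs.
    standard: set of concept ids treated as standard concepts.
    """
    adjacency = {}
    for subject_id, object_id in edges:
        adjacency.setdefault(subject_id, []).append(object_id)

    paths = []
    queue = deque([(source,)])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue
        for nxt in adjacency.get(node, []):
            if nxt in path:
                continue  # keep paths simple (no revisits)
            is_intermediate = nxt != target
            if (is_intermediate and nxt not in standard
                    and not allow_non_standard_intermediates):
                continue  # default: standard-to-standard traversal only
            queue.append(path + (nxt,))
    return paths
```

Keeping the default at `False` preserves the current standard-only behaviour, while exploration-style callers can opt in to broader navigation.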

@gkennos
Member

gkennos commented May 13, 2026

find_standard_paths drops edges when multiple outgoing edges point to the same ID

this is an issue for concepts joined by >1 valid relationship

select * from (
    select concept_id_1, concept_id_2, count(distinct(relationship_id)) as c
    from concept_relationship
    where concept_id_1 != concept_id_2
    group by concept_id_1, concept_id_2
) as d
where c > 1
limit 10;

 concept_id_1 | concept_id_2 | c
--------------+--------------+---
          327 |       732315 | 2
         8481 |         8600 | 2
         8482 |         8569 | 2
         8486 |         9222 | 2
         8487 |         8545 | 2
         8495 |         8548 | 2
         8500 |         8609 | 2
         8501 |         8611 | 2
         8502 |         8610 | 2
         8519 |         9540 | 2

  concept_views() returns one row per concept, while edges can contain multiple rows for the same object_id
  
  Build a lookup map by concept_id, then iterate all edges independently?
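
The lookup-map idea could look roughly like this. `Edge`, `Concept` and `attach_concepts` are hypothetical simplified types, and the relationship ids are placeholders, not real vocabulary values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    subject_id: int
    relationship_id: str
    object_id: int


@dataclass(frozen=True)
class Concept:
    concept_id: int
    name: str


def attach_concepts(edges, concepts):
    """Pair every edge with its target concept without dropping parallel edges."""
    # One entry per concept, keyed by id.
    by_id = {c.concept_id: c for c in concepts}
    # Iterate edges independently, so two edges to the same object_id both survive.
    return [(e, by_id[e.object_id]) for e in edges if e.object_id in by_id]
```

Pairing edges and concepts positionally (for example with `zip`) is what loses rows when one concept is the target of several edges; keying by `concept_id` avoids that.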

@gkennos
Member

gkennos commented May 13, 2026

PathProfile.from_path() cannot handle valid empty-path case returned when source == target

Profiling or explaining a zero-hop path will fail at runtime instead of returning a sensible self-match profile
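
A sketch of a zero-hop guard, assuming a path can be represented as a sequence of predicate ids. `PathSummary` and `summarize_path` are hypothetical and far simpler than the real `PathProfile`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathSummary:
    hops: int
    predicates: tuple
    is_self_match: bool


def summarize_path(predicates):
    """Summarize a path; an empty path is treated as a valid self-match."""
    steps = tuple(predicates)
    if not steps:
        # source == target: return a sensible zero-hop profile instead of failing.
        return PathSummary(hops=0, predicates=(), is_self_match=True)
    return PathSummary(hops=len(steps), predicates=steps, is_self_match=False)
```

Handling the empty case up front means later code that indexes the first or last step never sees an empty sequence.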

@gkennos
Member

gkennos commented May 13, 2026

ranked_concepts = [
    _score_standard_concept(
        text=text,
        kg=kg,
        standard_concept=sc,
        num_ancestors=num_ancestors.get(sc.concept_id, 0),
        similarity_score=nearest_concept_matches_dict_for_single_query.get(sc.concept_id, None),
    )
    for sc in standard_concepts
]

_score_standard_concept just scores and does not rank --> confusing to downstream consumers
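
One way to make the distinction explicit is to keep scoring and ordering as separate, clearly named steps. The sketch below assumes a simple `ScoredConcept` record and a `rank_concepts` helper, both hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredConcept:
    concept_id: int
    score: float


def rank_concepts(scored):
    """Order scored concepts best first; ties are broken by concept_id."""
    return sorted(scored, key=lambda c: (-c.score, c.concept_id))
```

A list named `scored_concepts` would then hold the comprehension output, and only the result of the sort would be called `ranked_concepts`.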

@gkennos
Member

gkennos commented May 13, 2026

traverse()

  • During expansion, every discovered outgoing edge is appended to edges_out immediately.
  • Neighbor nodes are only added to visited later when popped from the queue.
  • If traversal terminates early because max_nodes is reached, some queued neighbor nodes may never become visited.
  • This breaks the usual expectation that a subgraph’s edge set is closed over its node set.
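
A minimal sketch of one way to restore closure: collect candidate edges during expansion, then keep only those whose target was actually visited. The adjacency format and this standalone `traverse` function are assumptions, not the project's implementation.

```python
from collections import deque


def traverse(adjacency, start, max_nodes):
    """Breadth-first traversal returning (visited_nodes, edges).

    adjacency maps node -> list of (relationship_id, neighbor).
    Every returned edge has both endpoints in visited_nodes.
    """
    visited = []
    seen = {start}
    queue = deque([start])
    candidate_edges = []
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        visited.append(node)
        for relationship_id, neighbor in adjacency.get(node, []):
            candidate_edges.append((node, relationship_id, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    kept = set(visited)
    # Sources are always visited; drop edges whose target never left the queue.
    edges = [e for e in candidate_edges if e[2] in kept]
    return visited, edges
```

The alternative is to add a neighbor to the node set at discovery time, which also keeps the subgraph closed but changes what `max_nodes` counts.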

@gkennos
Member

gkennos commented May 13, 2026

Scoring docs say relevance is composite, but _score_standard_concept() uses embedding similarity alone whenever it is available

I think this is intentional but perhaps having some kind of parameter to override this is worthwhile?
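
Such an override could be as small as a weight parameter. The sketch below is an assumption about shape only: `relevance`, `lexical_score` and `embedding_weight` are hypothetical names, and the default reproduces the behaviour described above (embedding similarity alone when it is available).

```python
def relevance(lexical_score, similarity_score=None, embedding_weight=1.0):
    """Blend a lexical score with an optional embedding similarity.

    embedding_weight=1.0 uses the embedding similarity alone when present;
    lower values mix in the lexical score.
    """
    if not 0.0 <= embedding_weight <= 1.0:
        raise ValueError("embedding_weight must be between 0 and 1")
    if similarity_score is None:
        # No embedding available: fall back to the lexical score.
        return lexical_score
    return (embedding_weight * similarity_score
            + (1.0 - embedding_weight) * lexical_score)
```

Callers that want a composite score would pass something like `embedding_weight=0.5`, while existing callers see no change.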

Comment thread src/omop_graph/graph/kg.py
Comment thread pyproject.toml Outdated
Comment thread src/omop_graph/cli.py
Comment thread src/omop_graph/cli.py
Comment thread src/omop_graph/oaklib_interface/omop_implementation.py Outdated
Comment thread src/omop_graph/graph/queries.py Outdated
Comment thread src/omop_graph/graph/queries.py
Comment thread src/omop_graph/graph/queries.py
Comment thread src/omop_graph/graph/kg.py Outdated
Comment thread src/omop_graph/graph/kg.py Outdated
@nicoloesch nicoloesch changed the title from "feat: support omop emb 0.5.0" to "feat: support omop emb 1.0.0" May 19, 2026
@nicoloesch nicoloesch marked this pull request as ready for review May 21, 2026 03:49
@gkennos gkennos self-requested a review May 21, 2026 03:49
Member

@gkennos gkennos left a comment

Great! Confirmed that all identified issues are now closed, with only very minor points outstanding. I am commenting rather than approving only because I think we need to release omop-alchemy 0.6.3 before we merge and release this update 👍

Comment thread src/omop_graph/cli.py
Comment thread docs/usage/cli.md
Comment thread src/omop_graph/cli.py
@gkennos gkennos self-requested a review May 25, 2026 04:57
@nicoloesch nicoloesch merged commit 8fa726b into main May 25, 2026
4 checks passed
github-actions Bot pushed a commit that referenced this pull request May 25, 2026
# [1.1.0](v1.0.4...v1.1.0) (2026-05-25)

### Features

* support omop emb 1.0.0 ([#9](#9)) ([8fa726b](8fa726b))

Development

Successfully merging this pull request may close these issues.

Support orm-loader >0.4.0
Remove application-level lru_cache from KnowledgeGraph
Support omop-emb v0.5.0

2 participants