
apply: poor ranking on borderline lexical matches — relevant atoms buried under unrelated ones sharing a common term #22

@0b00101111

Symptom

The hybrid retrieval (BM25 + vector + RRF) is letting unrelated atoms outrank highly relevant ones when the query and a wrong-topic atom share a common term that dominates the lexical signal.

Concrete case

Query: `A newborn can focus on objects 8 to 12 inches away`

Top 5 results from `metis apply --top-k 3` (returned 5 due to a separate bug, see #N+1):

| Rank | Source | Content (truncated) |
| --- | --- | --- |
| 1 | aap-bright-futures-pocket | "Pincer grasp means Picking up small objects with 2 fingers, a 12-month fine motor milestone" |
| 2 | aap-bright-futures-pocket | "pincer grasp means picking up small object with 2 fingers" |
| 3 | bc-pediatric-nutrition-guidelines | "infant between 8-12 months has the property: may prefer to feed self with fingers or spoon" |
| 4 | toddler-s-first-steps | "When starting when your toddler is about 9 months old, focus on…" |
| 5 | baby-s-best-chance | "new baby can ... briefly focus on things 18 to 45 cm (7 to 18 inches) away" ← actual answer |

The semantically correct atom (about newborn vision focal distance) is ranked last, behind four atoms about pincer grasp and feeding that match only on the surface tokens "8", "12", "focus", or "objects".

Hypothesis

The BM25 channel is over-weighting the literal token overlap between "8 to 12" / "objects" / "focus" and unrelated atoms whose surface text contains those tokens ("8-12 months", "small objects", "focus on"). The vector channel, which should rescue this, is either weighted too low in RRF or produces an embedding for the correct atom that isn't close enough to the query embedding.
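To make the two branches of the hypothesis concrete, here is a minimal sketch of weighted reciprocal rank fusion. It is not the Metis implementation: the `Channel` type, the `rrf` function, the per-channel `weight`, the `k = 60` constant, and the atom ids are all assumptions made for illustration. The toy rankings only loosely mirror the case above.

```typescript
// Hypothetical sketch of weighted RRF; names and defaults are assumptions,
// not taken from the Metis codebase.
interface Channel {
  name: string;
  weight: number;    // assumed per-channel multiplier
  ranking: string[]; // atom ids, best first
}

// score(atom) = sum over channels of weight / (k + rank), rank starting at 1.
function rrf(channels: Channel[], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const ch of channels) {
    ch.ranking.forEach((id, i) => {
      const rank = i + 1;
      scores.set(id, (scores.get(id) ?? 0) + ch.weight / (k + rank));
    });
  }
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Toy rankings with made-up ids: lexical buries the vision atom,
// the vector channel puts it first.
const lexical = ["pincer-a", "pincer-b", "feeding", "toddler", "vision"];
const vector = ["vision", "toddler", "pincer-a", "feeding", "pincer-b"];

const equalWeights = rrf([
  { name: "bm25", weight: 1, ranking: lexical },
  { name: "vector", weight: 1, ranking: vector },
]);

const vectorHeavy = rrf([
  { name: "bm25", weight: 1, ranking: lexical },
  { name: "vector", weight: 3, ranking: vector },
]);

console.log(equalWeights.map(([id]) => id));
console.log(vectorHeavy.map(([id]) => id));
```

In this toy setup the vision atom wins only when the vector channel is weighted up, even though the vector channel already ranks it first. If the real vector channel ranks the correct atom low as well, no weight change will help, which is the embedding-quality branch of the hypothesis.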

Other observations from the same audit

Similar pattern on:

  • `C7` (AOA recommends 6-12mo eye exam): toddler feeding atoms outrank an actual eye-exam atom about "first eye check at 6 months"
  • `C9` (USPSTF 3-5 screening): general age-stage atoms rank above vision-screening-specific atoms

The pattern: when the query mentions a numeric range ("8 to 12", "3 to 5", "6 to 12"), atoms that happen to contain that same numeric range — for any topic — outrank topical matches.
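One way to check this pattern on a given result is to split the tokens a query shares with an atom into numeric and non-numeric matches. The sketch below does that with a deliberately naive tokenizer. The stopword list, the function names, and the `numericWeight` default are illustrative assumptions and do not reflect how Metis tokenizes or scores.

```typescript
// Illustrative stopword list; an assumption, not Metis's tokenizer config.
const STOPWORDS = new Set(["a", "an", "the", "to", "on", "can", "and", "or", "with"]);

// Naive tokenizer: lowercase words and numbers, stopwords dropped.
function tokenize(text: string): string[] {
  const tokens: string[] = text.toLowerCase().match(/[a-z]+|\d+(?:\.\d+)?/g) ?? [];
  return tokens.filter((t) => !STOPWORDS.has(t));
}

interface MatchBreakdown {
  numeric: string[]; // shared tokens that are numbers
  lexical: string[]; // shared tokens that are words
}

// Which query tokens also appear in the atom, split by kind.
function matchBreakdown(query: string, atom: string): MatchBreakdown {
  const atomTokens = new Set(tokenize(atom));
  const shared = Array.from(new Set(tokenize(query))).filter((t) => atomTokens.has(t));
  return {
    numeric: shared.filter((t) => /^\d/.test(t)),
    lexical: shared.filter((t) => !/^\d/.test(t)),
  };
}

// Simple overlap count where numeric matches count for less (hypothetical weight).
function downweightedOverlap(query: string, atom: string, numericWeight = 0.2): number {
  const m = matchBreakdown(query, atom);
  return m.lexical.length + numericWeight * m.numeric.length;
}

const query = "A newborn can focus on objects 8 to 12 inches away";
const feedingAtom =
  "infant between 8-12 months has the property: may prefer to feed self with fingers or spoon";
const visionAtom =
  "new baby can ... briefly focus on things 18 to 45 cm (7 to 18 inches) away";

console.log(matchBreakdown(query, feedingAtom));
console.log(matchBreakdown(query, visionAtom));
```

An atom whose shared tokens are all numeric, as with the feeding atom here, is a candidate for the numeric-range bias. This is a plain overlap count, not BM25, so it only flags suspicious matches and does not reproduce the actual ranking.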

Why it matters

For PGB integration (parentguidebook#109), editors querying Metis for citations on a specific claim need the right atom in the top 1–3. Burying it at rank 5+ defeats the workflow — editors will conclude "no atom exists" when in fact one does, just under noise.

Suggested investigation

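  1. Inspect RRF weights. What's the current vector:lexical weight ratio? Per design docs, RRF is supposed to balance the channels. If the fusion is standard rank-based RRF, raw score magnitude should not matter, so a dominant lexical channel would point to unequal channel weights or to the vector channel ranking the right atom low. If the fusion is score-based, a channel with stronger scores can dominate.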
  2. Test embedding quality on this case. Pull the embeddings for the correct atom and the query, and compute their cosine similarity. If they're close, RRF tuning is the fix. If they're far, the embedding model or the atom's content phrasing is the issue.
  3. Consider domain filtering as a re-ranker. Atoms tagged with domains like "vision", "infant development" should re-rank above atoms tagged "nutrition", "fine motor" when the query is clearly about vision. The frame-type registry already has this metadata.
  4. Numeric-range token bias. Specifically downweight token matches on isolated numbers and unit ranges ("8", "12", "6-12") which are noisy signals across a broad library.
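Steps 2 and 3 can be sketched as two small helpers. This is a sketch under assumptions, not Metis code: the `Scored` shape, the `boostByDomain` function, and the `boost` factor are hypothetical. How the query's domains would be inferred, and how the frame-type registry exposes domain tags, is left open here.

```typescript
// Cosine similarity between two embedding vectors (step 2).
function cosine(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as unrelated
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical shape of a fused result carrying domain tags (step 3).
interface Scored {
  id: string;
  score: number;
  domains: string[];
}

// Multiply the score of atoms sharing a domain with the query, then re-sort.
// The boost factor is an arbitrary illustrative value.
function boostByDomain(results: Scored[], queryDomains: string[], boost = 1.5): Scored[] {
  const wanted = new Set(queryDomains);
  return results
    .map((r) => ({
      ...r,
      score: r.domains.some((d) => wanted.has(d)) ? r.score * boost : r.score,
    }))
    .sort((x, y) => y.score - x.score);
}

// Toy fused scores with made-up ids; domain tags taken from the examples above.
const fused: Scored[] = [
  { id: "pincer-a", score: 0.0323, domains: ["fine motor"] },
  { id: "feeding", score: 0.0315, domains: ["nutrition"] },
  { id: "vision", score: 0.0318, domains: ["vision", "infant development"] },
];

console.log(boostByDomain(fused, ["vision"]).map((r) => r.id));
```

A multiplicative boost keeps the re-ranker soft: an off-domain atom with a much stronger fused score can still win, which may be preferable to a hard filter when domain tags are incomplete.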

Repro

```bash
cd ~/garage/metis/engine
export OPENAI_API_KEY=...
export KIMI_API_KEY=...
bun run src/cli.ts apply ~/garage/metis-library/parentguidebook \
"A newborn can focus on objects 8 to 12 inches away" \
--top-k 5 --format kx --output /tmp/repro.kx.json
jq '.units[] | {rank: .id, content: .content[0:80], source: .source.ref}' /tmp/repro.kx.json
```

The correct atom is in baby-s-best-chance, content starts with "new baby can tell light from dark, see shapes and patterns and briefly focus on things 18 to 45 cm".

Context

Found during the editorial audit for parentguidebook#109. The audit found 15/15 queries returned at least one on-topic atom somewhere in their top-K, but ranking was reliable enough for editorial use only on 4/15 queries.
