Symptom
The hybrid retrieval (BM25 + vector + RRF) is letting unrelated atoms outrank highly relevant ones when the query and a wrong-topic atom share a common term that dominates the lexical signal.
Concrete case
Query: `A newborn can focus on objects 8 to 12 inches away`
Top 5 results from `metis apply --top-k 3` (returned 5 due to a separate bug, see #N+1):
| Rank | Source | Content (truncated) |
| --- | --- | --- |
| 1 | aap-bright-futures-pocket | "Pincer grasp means Picking up small objects with 2 fingers, a 12-month fine motor milestone" |
| 2 | aap-bright-futures-pocket | "pincer grasp means picking up small object with 2 fingers" |
| 3 | bc-pediatric-nutrition-guidelines | "infant between 8-12 months has the property: may prefer to feed self with fingers or spoon" |
| 4 | toddler-s-first-steps | "When starting when your toddler is about 9 months old, focus on…" |
| 5 | baby-s-best-chance | "new baby can ... briefly focus on things 18 to 45 cm (7 to 18 inches) away" ← actual answer |
The semantically correct atom (about newborn vision focal distance) is ranked last, behind four atoms about pincer grasp and feeding that match on the surface tokens "8", "12", "focus", or "objects".
Hypothesis
The BM25 channel is over-weighting the literal token overlap between "8 to 12" / "objects" / "focus" and unrelated atoms whose surface text contains those tokens ("8-12 months", "small objects", "focus on"). The vector channel — which should rescue this — is either too low-weighted in RRF or the atom embedding for the correct unit isn't close enough to the query embedding.
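To make the hypothesis concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion on toy data. It is not Metis's actual fusion code: the function name `rrfFuse`, the constant `k = 60`, the per-channel `weight` field, and the two toy rankings are all assumptions for illustration. It only shows that when the lexical channel puts a wrong-topic atom first and the correct atom last, equal weights can leave the wrong atom on top even though the vector channel ranks the correct atom first.

```typescript
// Hypothetical sketch of weighted Reciprocal Rank Fusion; not Metis's actual code.
type Ranking = string[]; // atom ids, best first

function rrfFuse(
  channels: { ranking: Ranking; weight: number }[],
  k = 60, // conventional RRF constant; the value Metis uses is an unknown here
): [string, number][] {
  const scores = new Map<string, number>();
  for (const { ranking, weight } of channels) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Toy rankings (made up): the lexical channel prefers the pincer-grasp atom,
// the vector channel prefers the vision atom.
const bm25: Ranking = ["pincer", "feeding", "toddler", "vision"];
const vector: Ranking = ["vision", "pincer", "feeding", "toddler"];

const equal = rrfFuse([
  { ranking: bm25, weight: 1 },
  { ranking: vector, weight: 1 },
]);
const vectorHeavy = rrfFuse([
  { ranking: bm25, weight: 1 },
  { ranking: vector, weight: 3 },
]);

console.log("equal weights:", equal.map(([id]) => id));
console.log("vector x3:    ", vectorHeavy.map(([id]) => id));
```

In this toy data the pincer atom stays first under equal weights, and tripling the vector weight moves the vision atom to the top. Whether the real pipeline behaves this way depends on the actual per-channel ranks, which is what the investigation below should establish.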
Other observations from the same audit
Similar pattern on:
- `C7` (AOA recommends 6-12mo eye exam): toddler feeding atoms outrank an actual eye-exam atom about "first eye check at 6 months"
- `C9` (USPSTF 3-5 screening): general age-stage atoms rank above vision-screening-specific atoms
The pattern: when the query mentions a numeric range ("8 to 12", "3 to 5", "6 to 12"), atoms that happen to contain that same numeric range — for any topic — outrank topical matches.
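The overlap can be made visible with a small token comparison. This is a sketch under stated assumptions: `tokenize` is a hypothetical stand-in (lowercase, split into letter runs and digit runs) and may differ from the analyzer the BM25 channel really uses. The atom strings are the truncated contents from the table above.

```typescript
// Hypothetical tokenizer; the real BM25 analyzer in Metis may differ.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z]+|\d+/g) ?? [];
}

const isNumeric = (token: string): boolean => /^\d+$/.test(token);

// Splits the tokens shared by query and atom into numeric and word tokens.
function sharedTokens(
  query: string,
  atom: string,
): { numeric: string[]; words: string[] } {
  const atomTokens = new Set(tokenize(atom));
  const shared = Array.from(new Set(tokenize(query))).filter((t) =>
    atomTokens.has(t),
  );
  return {
    numeric: shared.filter(isNumeric),
    words: shared.filter((t) => !isNumeric(t)),
  };
}

const query = "A newborn can focus on objects 8 to 12 inches away";
const feedingAtom =
  "infant between 8-12 months has the property: may prefer to feed self with fingers or spoon";
const visionAtom =
  "new baby can ... briefly focus on things 18 to 45 cm (7 to 18 inches) away";

console.log("feeding atom:", sharedTokens(query, feedingAtom));
console.log("vision atom: ", sharedTokens(query, visionAtom));
```

Under this tokenizer, the only tokens the feeding atom shares with the query are "8", "12", and "to", while the vision atom shares no numeric tokens at all. That is consistent with the numeric range carrying most of the lexical match for the wrong-topic atom.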
Why it matters
For PGB integration (parentguidebook#109), editors querying Metis for citations on a specific claim need the right atom in the top 1–3. Burying it at rank 5+ defeats the workflow — editors will conclude "no atom exists" when in fact one does, just under noise.
Suggested investigation
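- Inspect RRF weights. What's the current vector:lexical weight ratio? Per design docs, RRF is supposed to balance the two channels, but standard RRF fuses ranks rather than raw scores, so one channel can still dominate if it carries a higher weight or if the other channel ranks the correct atom far down its list.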
- Test embedding quality on this case. Pull the embeddings for the correct atom and the query and compute their cosine similarity. If they're close, RRF tuning is the fix. If they're far apart, the embedding model or the atom's content phrasing is the issue.
- Consider domain filtering as a re-ranker. Atoms tagged with domains like "vision", "infant development" should re-rank above atoms tagged "nutrition", "fine motor" when the query is clearly about vision. The frame-type registry already has this metadata.
- Numeric-range token bias. Specifically downweight token matches on isolated numbers and unit ranges ("8", "12", "6-12") which are noisy signals across a broad library.
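For the embedding check, a cosine helper along these lines may be enough. It is a sketch only: how to pull the stored atom embedding and the query embedding out of Metis is not shown, and the three-dimensional vectors below are made-up placeholders, not real embeddings.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Placeholder vectors (made up); substitute the real query and atom embeddings.
const queryEmbedding = [0.2, 0.7, 0.1];
const correctAtomEmbedding = [0.25, 0.65, 0.05];
const rankOneAtomEmbedding = [0.9, 0.1, 0.3];

console.log("query vs correct atom:", cosineSimilarity(queryEmbedding, correctAtomEmbedding));
console.log("query vs rank-1 atom: ", cosineSimilarity(queryEmbedding, rankOneAtomEmbedding));
```

Comparing the query against both the correct atom and the current rank-1 atom gives a relative reading, which is easier to interpret than a single absolute cosine value.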
Repro
```bash
cd ~/garage/metis/engine
export OPENAI_API_KEY=...
export KIMI_API_KEY=...
bun run src/cli.ts apply ~/garage/metis-library/parentguidebook \
"A newborn can focus on objects 8 to 12 inches away" \
--top-k 5 --format kx --output /tmp/repro.kx.json
jq '.units[] | {rank: .id, content: .content[0:80], source: .source.ref}' /tmp/repro.kx.json
```
The correct atom is in baby-s-best-chance, content starts with "new baby can tell light from dark, see shapes and patterns and briefly focus on things 18 to 45 cm".
Context
Found during the editorial audit for parentguidebook#109. The audit found 15/15 queries returned at least one on-topic atom somewhere in their top-K, but ranking was reliable enough for editorial use only on 4/15 queries.