apply: poor ranking on borderline lexical matches — relevant atoms buried under unrelated ones sharing a common term

## Symptom

The hybrid retrieval (BM25 + vector + RRF) is letting unrelated atoms outrank highly relevant ones when the query and a wrong-topic atom share a common term that dominates the lexical signal.

## Concrete case

Query: `A newborn can focus on objects 8 to 12 inches away`

Top 5 results from `metis apply --top-k 3` (returned 5 due to a separate bug, see #N+1):

| Rank | Source | Content (truncated) |
|---|---|---|
| 1 | aap-bright-futures-pocket | "Pincer grasp means Picking up small objects with 2 fingers, **a 12-month** fine motor milestone" |
| 2 | aap-bright-futures-pocket | "pincer grasp means picking up small object with 2 fingers" |
| 3 | bc-pediatric-nutrition-guidelines | "infant between **8-12 months** has the property: may prefer to feed self with fingers or spoon" |
| 4 | toddler-s-first-steps | "When starting when your toddler is about 9 months old, focus on…" |
| **5** | **baby-s-best-chance** | **"new baby can ... briefly focus on things 18 to 45 cm (7 to 18 inches) away"** ← actual answer |

The semantically correct atom (about newborn vision focal distance) is ranked **dead last**, behind four atoms about pincer grasp and feeding that match on the surface tokens "8", "12", "focus", or "objects".

## Hypothesis

The BM25 channel is over-weighting the literal token overlap between "8 to 12" / "objects" / "focus" and unrelated atoms whose surface text contains those tokens ("8-12 months", "small objects", "focus on"). Either the vector channel, which should rescue this, is weighted too low in RRF, or the embedding of the correct atom is not close enough to the query embedding.
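
The weighting half of this hypothesis can be made concrete with a small sketch of weighted RRF. This is a generic illustration in Python, not Metis's implementation: the function name `rrf_fuse`, the per-channel `weights`, the constant `k=60`, and the toy rankings are all assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights=None, k=60):
    """Weighted Reciprocal Rank Fusion (generic sketch).

    rankings: dict mapping channel name -> list of atom ids, best first.
    weights:  dict mapping channel name -> float; missing channels default to 1.0.
    k:        smoothing constant; 60 is a commonly used default.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for channel, ranked_ids in rankings.items():
        w = weights.get(channel, 1.0)
        for rank, atom_id in enumerate(ranked_ids, start=1):
            # Only the rank matters here; raw channel scores are never used.
            scores[atom_id] += w / (k + rank)
    return sorted(scores, key=lambda atom_id: scores[atom_id], reverse=True)


# Toy rankings invented for illustration; these are NOT Metis output.
rankings = {
    "bm25": ["pincer-1", "pincer-2", "feeding", "toddler", "vision"],
    "vector": ["vision", "toddler", "pincer-1", "feeding", "pincer-2"],
}

lexical_heavy = rrf_fuse(rankings, weights={"bm25": 3.0, "vector": 1.0})
equal = rrf_fuse(rankings)
vector_heavy = rrf_fuse(rankings, weights={"bm25": 1.0, "vector": 3.0})

print("lexical-heavy:", lexical_heavy)
print("equal weights:", equal)
print("vector-heavy: ", vector_heavy)
```

With these toy rankings, a lexical-heavy weighting puts the relevant atom last (the same shape as the table above), equal weights lift it to second, and a vector-heavy weighting puts it first. If the real channel rankings look similar, weight tuning alone could move the correct atom into the top 3; if the vector channel itself ranks the correct atom low, no weighting will.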

## Other observations from the same audit

Similar pattern on:
- `C7` (AOA recommends an eye exam at 6-12 months): toddler-feeding atoms outrank an actual eye-exam atom about "first eye check at 6 months"
- `C9` (USPSTF 3-5 screening): general age-stage atoms rank above vision-screening-specific atoms

The pattern: when the query mentions a numeric range ("8 to 12", "3 to 5", "6 to 12"), atoms that happen to contain the same numbers or range, on any topic, outrank topical matches.
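
To see how much of a match comes from bare numbers, a rough token-overlap sketch can help. It is deliberately not BM25 (no IDF, no length normalization), and the tokenizer, the `numeric_weight` parameter, and the function names are assumptions for illustration; the two atom strings are the truncated contents from the table above.

```python
import re

NUMERIC = re.compile(r"^\d+$")


def tokenize(text):
    # Lowercase and split on non-alphanumerics, so "8-12" yields "8" and "12".
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, atom, numeric_weight=1.0):
    """Sum a weight for each distinct query token that also occurs in the atom.

    Purely numeric tokens contribute numeric_weight; all others contribute 1.0.
    """
    atom_tokens = set(tokenize(atom))
    score = 0.0
    for token in set(tokenize(query)):
        if token in atom_tokens:
            score += numeric_weight if NUMERIC.match(token) else 1.0
    return score


query = "A newborn can focus on objects 8 to 12 inches away"
feeding = ("infant between 8-12 months has the property: "
           "may prefer to feed self with fingers or spoon")
vision = ("new baby can ... briefly focus on things "
          "18 to 45 cm (7 to 18 inches) away")

for name, atom in [("feeding", feeding), ("vision", vision)]:
    full = overlap_score(query, atom)
    damped = overlap_score(query, atom, numeric_weight=0.25)
    print(f"{name}: numeric_weight=1.0 -> {full}, numeric_weight=0.25 -> {damped}")
```

In this sketch the feeding atom's overlap with the query is almost entirely "8" and "12", so damping numeric tokens halves its score, while the vision atom's score does not change because it shares no numbers with the query. Whether the real BM25 channel behaves the same way depends on its tokenizer and IDF statistics, which this sketch does not model.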

## Why it matters

For PGB integration ([parentguidebook#109](https://github.com/yangyang-how/parentguidebook/issues/109)), editors querying Metis for citations on a specific claim need the right atom in the top 1–3. Burying it at rank 5+ defeats the workflow: editors will conclude "no atom exists" when one does exist and is merely buried under noise.

## Suggested investigation

1. **Inspect RRF weights.** What's the current vector:lexical weight ratio? Per design docs, RRF is supposed to balance the two channels. Standard RRF fuses ranks rather than raw scores, so if one channel dominates, the likely causes are per-channel weights, the smoothing constant, or a score-based variant of the fusion.
2. **Test embedding quality on this case.** Pull the embeddings for the correct atom and the query and compute their cosine similarity. If they're close, RRF tuning is the fix. If they're far apart, the embedding model or the atom's phrasing is the issue.
3. **Consider domain filtering as a re-ranker.** Atoms tagged with domains like "vision" or "infant development" should rank above atoms tagged "nutrition" or "fine motor" when the query is clearly about vision. The frame-type registry already has this metadata.
4. **Numeric-range token bias.** Specifically downweight token matches on isolated numbers and unit ranges ("8", "12", "6-12"), which are noisy signals across a broad library.
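
For step 2, a raw cosine value is hard to judge in isolation, so it may be more useful to look at where the correct atom ranks among the competing atoms in the vector channel. The sketch below assumes the embeddings have already been exported as plain lists of floats; the helper names and the two-dimensional toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def vector_rank(query_vec, atom_vecs, target_id):
    """1-based rank of target_id when atoms are sorted by cosine to the query."""
    sims = {
        atom_id: cosine_similarity(query_vec, vec)
        for atom_id, vec in atom_vecs.items()
    }
    ordered = sorted(sims, key=lambda atom_id: sims[atom_id], reverse=True)
    return ordered.index(target_id) + 1


# Toy 2-d vectors for illustration only; real embeddings are much larger.
query_vec = [1.0, 0.2]
atom_vecs = {
    "vision": [0.9, 0.3],
    "pincer": [0.1, 1.0],
    "feeding": [0.4, 0.8],
}
print("vision rank in vector channel:", vector_rank(query_vec, atom_vecs, "vision"))
```

If the correct atom already ranks first or second by cosine among the atoms that outranked it, the vector channel is doing its job and the fusion step is the place to look. If it ranks low here too, the embedding model or the atom's phrasing is the more likely culprit.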

## Repro

```bash
cd ~/garage/metis/engine
export OPENAI_API_KEY=...
export KIMI_API_KEY=...
bun run src/cli.ts apply ~/garage/metis-library/parentguidebook \
  "A newborn can focus on objects 8 to 12 inches away" \
  --top-k 5 --format kx --output /tmp/repro.kx.json
jq '.units[] | {rank: .id, content: .content[0:80], source: .source.ref}' /tmp/repro.kx.json
```

The correct atom is in `baby-s-best-chance`; its content starts with "new baby can tell light from dark, see shapes and patterns and briefly focus on things 18 to 45 cm".

## Context

Found during the editorial audit for parentguidebook#109. The audit found that all 15 queries returned at least one on-topic atom *somewhere* in their top-K, but ranking was reliable enough for editorial use on only 4 of the 15.
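
The gap between "an on-topic atom appears somewhere" and "ranking is usable" can be tracked with two simple numbers per audit run. The sketch below assumes that "reliable" means the first relevant atom lands in the top 3, which is an interpretation rather than the audit's stated criterion; the function names and the toy ranks are made up for illustration and are not the audit data.

```python
def hit_rate_at_k(first_relevant_ranks, k):
    """Fraction of queries whose first relevant atom is at rank <= k.

    Each entry is the 1-based rank of the first relevant atom for one query,
    or None if no relevant atom was returned at all.
    """
    if not first_relevant_ranks:
        raise ValueError("need at least one query")
    hits = sum(1 for r in first_relevant_ranks if r is not None and r <= k)
    return hits / len(first_relevant_ranks)


def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1/rank over queries; a missing relevant atom counts as 0."""
    if not first_relevant_ranks:
        raise ValueError("need at least one query")
    total = sum(0.0 if r is None else 1.0 / r for r in first_relevant_ranks)
    return total / len(first_relevant_ranks)


# Toy ranks for five hypothetical queries (not the audit results).
toy_ranks = [1, 5, 4, 2, None]
print("hit@5:", hit_rate_at_k(toy_ranks, 5))
print("hit@3:", hit_rate_at_k(toy_ranks, 3))
print("MRR:  ", mean_reciprocal_rank(toy_ranks))
```

A high hit rate at the full top-K combined with a low hit rate at 3 is the same situation described here: the atoms exist but are ranked too low for editors to find. Re-running these numbers after any ranking change would show whether it helped.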
