Search with recompute: seconds-level latency for code RAG #177

@ASuresh0524

What happened?

Title: Search with recompute takes ~15 s even after warmup

Environment

  • macOS (darwin 24.6.0)
  • Python 3.10.18
  • LEANN branch: feature/colqwen-integration

Steps to Reproduce

  1. Create a tiny code repo:

    mkdir -p /tmp/quick_test_code
    cat <<'PY' > /tmp/quick_test_code/test.py
    def hello():
        return "world"
    
    class Test:
        pass
    PY
  2. Build an index:

    leann build quick-test-index \
      --docs /tmp/quick_test_code \
      --use-ast-chunking \
      --embedding-model facebook/contriever \
      --embedding-mode sentence-transformers
  3. Run searches (measure wall time):

    time leann search quick-test-index "hello" --top-k 3
    time leann search quick-test-index "Test" --top-k 3
    time leann search quick-test-index "function" --top-k 3

Observed

  • Build completes (~15 s).
  • Each search takes 13–19 s, even after multiple runs.
  • Warm and cold runs differ by only ~1.2 s, so most of the time appears to go to recomputing embeddings for each query (see the timing sketch below).
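
To see where the 13–19 s goes, a breakdown of one-time searcher/model loading versus per-query search time could look like the sketch below. It assumes a LeannSearcher Python API with a search(query, top_k=...) method; the import path and index location are placeholders and may differ on this branch.

    # timing_sketch.py -- hypothetical load vs. per-query timing breakdown
    import time

    from leann import LeannSearcher  # assumed import; adjust to the actual API

    INDEX_PATH = "quick-test-index"  # placeholder; point at the real index

    t0 = time.perf_counter()
    searcher = LeannSearcher(INDEX_PATH)  # index + embedding model load assumed to happen here
    print(f"searcher init: {time.perf_counter() - t0:.2f} s")

    for query in ["hello", "Test", "function"]:
        start = time.perf_counter()
        results = searcher.search(query, top_k=3)  # recompute path: embeds the query on the fly
        print(f"{query!r}: {time.perf_counter() - start:.2f} s, {len(results)} hits")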

Expected

  • Searches with recompute=True may be slower, but not 13–19 s for a trivial index.
  • I'd like to understand whether there is a way to avoid the full model reload, or to make on-the-fly query embedding faster; a sketch of the kind of long-lived process I have in mind follows.
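
For comparison, a long-lived process that loads the model once and then answers many queries would amortize the reload cost. A minimal sketch under the same assumed LeannSearcher API:

    # query_loop.py -- hypothetical persistent searcher; the model loads once per process
    import sys
    import time

    from leann import LeannSearcher  # assumed import; adjust to the actual API

    searcher = LeannSearcher("quick-test-index")  # placeholder index path

    for line in sys.stdin:  # feed queries one per line, e.g. `echo hello | python query_loop.py`
        query = line.strip()
        if not query:
            continue
        start = time.perf_counter()
        hits = searcher.search(query, top_k=3)
        print(f"{query!r} -> {len(hits)} hits in {time.perf_counter() - start:.2f} s")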

How to reproduce

Follow the steps above.

Error message

LEANN Version

0.1.0

Operating System

macOS

Labels

bug, help wanted
