@majiayu000

Summary

  • Fixes the issue where leann list scans all files in large directories like $HOME
  • Adds _find_meta_files_limited() method with configurable max_depth (default: 3)
  • Skips common large directories (node_modules, .venv, .git, __pycache__, etc.)
  • Applies limited search in both CLI and registry modules
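
For readers who want to see the idea behind the depth-limited search, here is a minimal sketch and not the code from this PR. The function name `find_meta_files_limited`, the `SKIP_DIRS` set, and the depth convention are assumptions for illustration: depth is counted as the number of path components of the file relative to the search root, which matches the reviewer's example below but may differ from the actual implementation.

```python
import os
from pathlib import Path
from typing import Iterator

# Hypothetical skip list, modeled on the directories named in the summary.
SKIP_DIRS = {"node_modules", ".venv", ".git", "__pycache__"}


def find_meta_files_limited(root: Path, max_depth: int = 3) -> Iterator[Path]:
    """Yield *.leann.meta.json files whose path relative to root has at most
    max_depth components (assumed convention, see the lead-in)."""
    root = Path(root)
    for dirpath, dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        depth = len(current.relative_to(root).parts)  # root itself is depth 0
        if depth >= max_depth:
            # Only reachable when max_depth <= 0: nothing qualifies.
            dirnames.clear()
            continue
        if depth + 1 >= max_depth:
            # Files in child directories would exceed the limit, so stop here.
            dirnames.clear()
        else:
            # Prune in place so os.walk never enters the skipped directories.
            dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if name.endswith(".leann.meta.json"):
                yield current / name
```

Pruning `dirnames` in place is what avoids the cost of `rglob()`: skipped or too-deep subtrees are never listed at all, rather than being walked and filtered afterwards.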

Fixes #122

Test plan

  • Added 8 new tests in tests/test_cli_list_performance.py
  • Tests verify depth limiting works correctly
  • Tests verify skip directories are properly excluded
  • All new tests pass locally

Fixes yichuan-w#122

The `leann list` command was scanning entire directory trees using
`rglob()`, causing extremely slow performance when run in large
directories like $HOME.

Changes:
- Add `_find_meta_files_limited()` method with max_depth parameter
- Skip common large directories (node_modules, .venv, .git, etc.)
- Apply limited search in `_discover_indexes_in_project()` and
  `_find_all_matching_indexes()`
- Add `_has_app_indexes_limited()` in registry.py for faster checks
- Add comprehensive tests for the new functionality

Signed-off-by: majiayu000 <[email protected]>
@andylizf
Collaborator

Thanks for working on this! A few suggestions:

1. Add --max-depth CLI option

Instead of hardcoding max_depth=3, consider adding a CLI option with a reasonable default:

leann list --max-depth 5

3 levels might be too shallow for some project structures (e.g., project/data/experiments/v1/index.leann would be missed).

2. Consider a design improvement: Full registry instead of scanning

Currently the code scans for "Apps format" indexes (*.leann.meta.json) because they can be anywhere in the project. But this is the root cause of the performance issue.

A better approach might be to register all indexes (both CLI and App-created) in the global registry with their full paths:

// ~/.leann/indexes.json
{
  "my-docs": "/path/to/project/.leann/indexes/my-docs",
  "app-index": "/path/to/project/data/app-index.leann"
}

This way leann list becomes effectively O(1) with respect to directory size: it just reads the registry, with no scanning needed at all.

This could be a follow-up issue/PR, but worth considering as the long-term solution.
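
To make the registry idea concrete, here is a rough sketch of how such a file could be read and updated. The helper names (`load_registry`, `register_index`, `list_indexes`) are hypothetical and the flat name-to-path layout is simply taken from the JSON example above; the real registry in LEANN may be structured differently.

```python
import json
from pathlib import Path

# Assumed registry location, taken from the comment in the JSON example above.
REGISTRY_PATH = Path.home() / ".leann" / "indexes.json"


def load_registry(path: Path = REGISTRY_PATH) -> dict[str, str]:
    """Return the name -> index path mapping, or an empty dict if none exists."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def register_index(name: str, index_path: str, path: Path = REGISTRY_PATH) -> None:
    """Record an index under its full path so it can be listed without scanning."""
    registry = load_registry(path)
    registry[name] = str(Path(index_path).resolve())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(registry, indent=2), encoding="utf-8")


def list_indexes(path: Path = REGISTRY_PATH) -> list[tuple[str, str, bool]]:
    """Return (name, location, still_exists) tuples without walking any tree."""
    return [
        (name, location, Path(location).exists())
        for name, location in sorted(load_registry(path).items())
    ]
```

The cost of `list_indexes` grows with the number of registered indexes rather than with the size of the directory tree. The existence check is included because a registry can go stale when an index is moved or deleted outside the tool.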

Address reviewer feedback by making the directory scan depth configurable
instead of hardcoding it to 3. Users with deeply nested project structures
can now increase the depth limit as needed.

- Add --max-depth argument to list command (default: 3)
- Update list_indexes() and _discover_indexes_in_project() to accept max_depth
- Add tests for the new CLI option and custom depth behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@majiayu000
Author

Thanks for the feedback @andylizf!

I've added the --max-depth CLI option as suggested. Users can now customize the scan depth:

# Default depth (3)
leann list

# Scan deeper directories
leann list --max-depth 5
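
For reference, an option like this could be wired up roughly as follows. This is a sketch that assumes the CLI is built on `argparse`; the `build_parser` function and the help text are illustrative and are not copied from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny stand-in for the leann CLI with only the list subcommand."""
    parser = argparse.ArgumentParser(prog="leann")
    subparsers = parser.add_subparsers(dest="command", required=True)

    list_parser = subparsers.add_parser("list", help="List available indexes")
    list_parser.add_argument(
        "--max-depth",
        type=int,
        default=3,  # default stated in this PR
        help="Maximum directory depth to scan for indexes (default: 3)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Scanning with max_depth={args.max_depth}")
```

argparse exposes `--max-depth` as `args.max_depth`, which can then be passed through to `list_indexes()` and `_discover_indexes_in_project()` as described in the commit message.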

For the global registry suggestion (making leann list O(1)), I agree it's a great long-term improvement. I'll create a follow-up issue to track that enhancement.

@majiayu000
Author

Update: I've also implemented the long-term global registry solution in a separate PR #199.

This PR can be merged independently. PR #199 builds on top of this one.

Development

Successfully merging this pull request may close these issues.

leann list scans all the things
