@mark-hingston

What does this PR do?

Summary

  • Adds a new leann update CLI command that allows users to incrementally add new documents to existing HNSW indices without rebuilding from scratch
  • Includes validation to ensure only non-compact HNSW indices can be updated, with clear error messages guiding users when prerequisites aren't met

Changes

  • CLI Command: New update subcommand with support for multiple document paths, chunking options, file type filters, and AST chunking
  • Validation: Checks that target index exists, uses HNSW backend, and is non-compact before allowing updates
  • Metadata Preservation: Reads existing index configuration (embedding model, graph parameters, chunking settings) to ensure consistency
  • Documentation: Updated README with complete update command reference and examples
  • Tests: Added comprehensive test suite (test_cli_update.py) covering argument parsing, chunking options, file filters, and default values
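The validation step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name `check_updatable`, the metadata file layout, and the keys `backend_name`, `backend_kwargs`, and `is_compact` are all assumptions made for the example.

```python
import json
from pathlib import Path


def check_updatable(meta_path: Path) -> dict:
    """Return the parsed metadata if the index can be updated, else raise ValueError."""
    # 1. The target index must exist.
    if not meta_path.exists():
        raise ValueError(f"Index metadata not found: {meta_path}")
    meta = json.loads(meta_path.read_text(encoding="utf-8"))

    # 2. The index must use the HNSW backend.
    #    (Key names here are hypothetical; the real schema may differ.)
    backend = meta.get("backend_name")
    if backend != "hnsw":
        raise ValueError(f"Update requires the HNSW backend, found {backend!r}")

    # 3. The index must be non-compact. A missing flag is treated as compact,
    #    which is the conservative choice for this sketch.
    if meta.get("backend_kwargs", {}).get("is_compact", True):
        raise ValueError(
            "Index is compact; rebuild it with --no-compact to enable updates"
        )
    return meta
```

Raising a single exception type with a message that names the fix keeps the CLI layer simple: it can catch `ValueError`, print the message, and exit non-zero.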

Usage Example

Add new documents to existing index

leann update my-docs --docs ./new-documents

Add with a file type filter

leann update my-code --docs ./new-src --file-types .py,.js
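For readers who want to see how commands like the two above might be parsed, here is a small argparse sketch. It is a stand-in for illustration only: `build_parser` and `parse_file_types` are hypothetical helpers, and only the index name, `--docs`, and `--file-types` arguments shown in the examples are modelled.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a single `update` subcommand."""
    parser = argparse.ArgumentParser(prog="leann")
    sub = parser.add_subparsers(dest="command", required=True)
    update = sub.add_parser("update", help="Add documents to an existing index")
    update.add_argument("index_name")
    # One or more document paths may follow --docs.
    update.add_argument("--docs", nargs="+")
    # Comma-separated extensions, e.g. ".py,.js".
    update.add_argument("--file-types", default=None)
    return parser


def parse_file_types(value: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated extension list; None means no filter."""
    if not value:
        return None
    return [ext.strip() for ext in value.split(",") if ext.strip()]


args = build_parser().parse_args(
    ["update", "my-code", "--docs", "./new-src", "--file-types", ".py,.js"]
)
extensions = parse_file_types(args.file_types)
```

Keeping the extension splitting in its own function makes it easy to cover in unit tests without invoking the whole CLI.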

Notes

  • Only works with HNSW indices built with the --no-compact flag
  • Preserves original index configuration for consistency
  • Provides helpful error messages when update isn't possible
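One way to picture "preserves original index configuration" is a helper that derives the settings for an update purely from the stored metadata, so new documents are embedded and inserted the same way as the originals. The sketch below assumes hypothetical key names (`embedding_model`, `backend_kwargs`, `graph_degree`, `complexity`, `is_recompute`) and a hypothetical function name; it is not taken from the PR.

```python
from typing import Any, Dict


def settings_for_update(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Derive builder settings from stored index metadata (hypothetical schema)."""
    backend_kwargs = dict(meta.get("backend_kwargs", {}))
    return {
        # New chunks must be embedded with the same model as the existing ones.
        "embedding_model": meta["embedding_model"],
        # Graph parameters are reused so the HNSW structure stays consistent.
        "graph_degree": backend_kwargs.get("graph_degree"),
        "complexity": backend_kwargs.get("complexity"),
        # The recompute setting is carried over from the original build.
        "is_recompute": backend_kwargs.get("is_recompute"),
        # Updates only apply to non-compact indices.
        "is_compact": False,
    }


example_meta = {
    "embedding_model": "example-model",  # placeholder value
    "backend_kwargs": {"graph_degree": 32, "complexity": 64, "is_recompute": True},
}
settings = settings_for_update(example_meta)
```

Reading every setting from metadata, rather than from CLI defaults, avoids silently mixing vectors from different embedding models in one index.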

Related Issues

Fixes #

Checklist

  • Tests pass (uv run pytest)
  • Code formatted (ruff format and ruff check)
  • Pre-commit hooks pass (pre-commit run --all-files)

@yichuan-w
Owner

Thanks, this is a great PR; we will review it later. BTW, is this for recompute mode or not?
I guess it uses our update/add API, which we have not tested at production scale.

@yichuan-w
Owner

I guess there might be some risk in merging this, because this feature is still in meta-version. Have you tried it on a real workload? I am not sure whether it actually works.

@mark-hingston
Author

> I guess there might be some risk in merging this, because this feature is still in meta-version. Have you tried it on a real workload? I am not sure whether it actually works.

Thanks for the review. Yes, the update command respects the original index's is_recompute setting: it reads the value from the index metadata and passes it to the builder.

I haven't tested it on a real or production workload.

@ASuresh0524
Collaborator

@mark-hingston Would you be able to test it and let us know how it works? It would be good to see the workflow run.
