@robodev-r2d2
Owner

Summary

  • move embedder settings and LangChain wrappers into rag-core-lib and re-export them from rag-core-api for reuse across services
  • update the admin dependency container to resolve semantic chunker embeddings through the shared embedder classes and make the semantic chunker wrapper easier to override
  • refresh the semantic chunker unit tests and README guidance to reflect the new configuration options

Testing

  • `PYTHONPATH=src:../rag-core-lib/src poetry run pytest --override-ini=addopts= tests/semantic_text_chunker_test.py` (fails: `ModuleNotFoundError: No module named 'langchain_core'`)

https://chatgpt.com/codex/tasks/task_e_68f223af44b8832689386cfff962f6e7

@robodev-r2d2 changed the title from "Centralize semantic chunker embeddings in rag-core-lib" to "feat: centralize embeddings in rag-core-lib and add semantic chunker" on Oct 18, 2025

- Added optional max/min chunk size enforcement to SemanticTextChunker using RecursiveCharacterTextSplitter.
- Introduced new parameters: `breakpoint_threshold_amount`, `overlap`, and `recursive_text_splitter`.
- Implemented logic to rebalance chunks to meet minimum size requirements.
- Updated chunking logic to handle oversized chunks and ensure they are split appropriately.
- Documented the new features and parameters.
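
The size enforcement described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `split_oversized`, `rebalance_small`, and `fixed_width_splitter` are hypothetical, and a naive fixed-width splitter stands in for `RecursiveCharacterTextSplitter` so the example runs without LangChain installed.

```python
from typing import Callable, List


def split_oversized(chunks: List[str], max_size: int,
                    splitter: Callable[[str], List[str]]) -> List[str]:
    """Pass every chunk longer than max_size through a fallback splitter."""
    result: List[str] = []
    for chunk in chunks:
        if len(chunk) > max_size:
            result.extend(splitter(chunk))
        else:
            result.append(chunk)
    return result


def rebalance_small(chunks: List[str], min_size: int, max_size: int) -> List[str]:
    """Merge a chunk into its predecessor when either one is shorter than
    min_size and the merged text still fits within max_size."""
    result: List[str] = []
    for chunk in chunks:
        if result and (len(result[-1]) < min_size or len(chunk) < min_size):
            merged = result[-1] + " " + chunk
            if len(merged) <= max_size:
                result[-1] = merged
                continue
        result.append(chunk)
    return result


def fixed_width_splitter(max_size: int) -> Callable[[str], List[str]]:
    """Hypothetical stand-in for RecursiveCharacterTextSplitter: fixed-width slices."""
    def split(text: str) -> List[str]:
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]
    return split


if __name__ == "__main__":
    semantic_chunks = ["a" * 25, "bb", "cccccc"]
    sized = split_oversized(semantic_chunks, 10, fixed_width_splitter(10))
    print(rebalance_small(sized, min_size=5, max_size=10))
```

In this sketch a short chunk that cannot be merged without exceeding `max_size` is left as it is, so the maximum is treated as a hard limit and the minimum as best effort. Whether the actual implementation makes the same trade-off is not stated in the commit message.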

fix: Ensure related metadata is unique in PageSummaryEnhancer

- Modified PageSummaryEnhancer to ensure the "related" metadata list contains unique IDs.
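
One common way to make such a list unique while keeping the first-seen order is shown below. This is only a sketch: the function name `unique_related` and the plain-dict metadata shape are assumptions for illustration, not the actual `PageSummaryEnhancer` code.

```python
from typing import Any, Dict, List


def unique_related(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of metadata whose "related" list has no duplicate IDs."""
    related: List[str] = metadata.get("related", [])
    # dict.fromkeys keeps insertion order, so the first occurrence of each ID wins.
    deduped = list(dict.fromkeys(related))
    return {**metadata, "related": deduped}


if __name__ == "__main__":
    print(unique_related({"id": "summary-1", "related": ["a", "b", "a", "c", "b"]}))
```

Using `dict.fromkeys` instead of `set` keeps the order stable, which avoids spurious diffs when the metadata is serialized or compared in tests.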

refactor: Update ChunkerSettings to reflect new chunking parameters

- Removed deprecated parameters and added `breakpoint_threshold_amount`, `buffer_size`, and `min_size`.
- Adjusted validation logic to accommodate changes in chunking strategy.
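
To illustrate the kind of validation such settings might need, here is a hedged sketch using a plain dataclass. The class name `ChunkerSettingsSketch`, the `max_size` field, all default values, and the specific checks are assumptions chosen for the example; only `breakpoint_threshold_amount`, `buffer_size`, `min_size`, and `overlap` are named in the commit message, and the real `ChunkerSettings` may be structured differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkerSettingsSketch:
    """Illustrative stand-in for ChunkerSettings; defaults are made up."""
    max_size: int = 1000                      # assumed upper bound per chunk
    min_size: int = 200                       # lower bound used when rebalancing
    overlap: int = 100                        # overlap applied by the fallback splitter
    buffer_size: int = 1                      # sentences grouped before embedding
    breakpoint_threshold_amount: float = 95.0

    def __post_init__(self) -> None:
        # Cross-field checks: fail fast on combinations that cannot be satisfied.
        if self.max_size <= 0:
            raise ValueError("max_size must be positive")
        if not 0 <= self.min_size <= self.max_size:
            raise ValueError("min_size must be between 0 and max_size")
        if not 0 <= self.overlap < self.max_size:
            raise ValueError("overlap must be non-negative and smaller than max_size")
        if self.buffer_size < 0:
            raise ValueError("buffer_size must be non-negative")


if __name__ == "__main__":
    print(ChunkerSettingsSketch(max_size=500, min_size=100, overlap=50))
```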

chore: Update dependencies and improve project structure

- Updated FastAPI, langchain, and other dependencies to their latest versions.
- Introduced ChunkerType enumeration for better chunker type management.
- Created ChunkerClassTypeSettings for environment-based configuration of chunker implementations.
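
A rough sketch of how an enumeration can drive environment-based selection follows. The member names and values, the environment variable name `CHUNKER_CLASS_TYPE`, the default, and the helper `chunker_type_from_env` are all assumptions for illustration; the actual `ChunkerType` and `ChunkerClassTypeSettings` in this PR may differ.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class ChunkerType(str, Enum):
    """Assumed members; the real enumeration may define other values."""
    RECURSIVE = "recursive"
    SEMANTIC = "semantic"


def chunker_type_from_env(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "CHUNKER_CLASS_TYPE",  # hypothetical variable name
    default: ChunkerType = ChunkerType.RECURSIVE,
) -> ChunkerType:
    """Resolve the chunker type from the environment, falling back to a default."""
    env = os.environ if env is None else env
    raw = env.get(var_name)
    if raw is None or not raw.strip():
        return default
    try:
        return ChunkerType(raw.strip().lower())
    except ValueError:
        valid = ", ".join(t.value for t in ChunkerType)
        raise ValueError(
            f"Unknown chunker type {raw!r}; expected one of: {valid}"
        ) from None


if __name__ == "__main__":
    print(chunker_type_from_env({"CHUNKER_CLASS_TYPE": "Semantic"}))
```

Passing the environment as a mapping keeps the lookup easy to test without mutating `os.environ`, and rejecting unknown values early surfaces typos at startup rather than at the first chunking call.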

test: Add comprehensive tests for chunking behavior

- Implemented tests to validate max/min chunk size enforcement and rebalance logic.
- Updated existing tests to reflect the changed parameter names and functionality.
3 participants