Problem / motivation
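The KB ingest endpoint (POST /kb/ingest) uses a fixed 800-character chunker with no overlap and no awareness of sentence or paragraph boundaries. A chunk can therefore end mid-sentence or even mid-word. This degrades to_tsvector full-text search relevance: PostgreSQL stems the truncated fragments into lexemes that may not match the original words, and a phrase that straddles a boundary is not fully contained in any single chunk.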
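To make the failure mode concrete, here is a minimal Python sketch of fixed-offset chunking. `fixed_chunks` and the sample text are hypothetical stand-ins for illustration, not the service's actual code.

```python
def fixed_chunks(text: str, size: int = 800) -> list[str]:
    """Cut text every `size` characters, ignoring word and sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical sample document: one 65-character sentence repeated 30 times.
doc = "Attackers exfiltrated credentials through a misconfigured proxy. " * 30
chunks = fixed_chunks(doc)

# The first chunk ends wherever character 800 falls, here in the middle of a word,
# and the second chunk starts with the rest of that word.
print(repr(chunks[0][-20:]), repr(chunks[1][:20]))
```

With this input the cut lands inside "exfiltrated", so neither chunk contains the whole word.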
Proposed solution
Add a configurable chunk overlap (e.g., 100–200 chars) so context spans chunk boundaries
Split on paragraph/sentence boundaries instead of hard character offsets (recursive chunking)
Future: Consider adding pgvector or leveraging the existing Qdrant instance for semantic vector search on KB documents (the services/threatintel service already uses Qdrant + BAAI/bge-small-en-v1.5)
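The first two items could be combined roughly as follows. This is a sketch under stated assumptions, not a drop-in replacement for the existing chunker: the names `split_units` and `chunk_text`, the regex-based sentence splitter, and the default `overlap=150` are all illustrative choices that do not come from the codebase.

```python
import re

# Assumption: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_units(text: str, max_chars: int) -> list[str]:
    """Break text into paragraphs, then sentences, then hard cuts as a last resort."""
    units: list[str] = []
    for paragraph in re.split(r"\n\s*\n", text):
        for sentence in _SENTENCE_END.split(paragraph.strip()):
            sentence = sentence.strip()
            # Only a sentence longer than max_chars falls back to character offsets.
            for start in range(0, len(sentence), max_chars):
                units.append(sentence[start:start + max_chars])
    return units


def chunk_text(text: str, max_chars: int = 800, overlap: int = 150) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters, repeating
    trailing sentences (up to `overlap` characters) at the start of the next chunk."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    chunks: list[str] = []
    current: list[str] = []
    for unit in split_units(text, max_chars):
        if current and len(" ".join(current + [unit])) > max_chars:
            chunks.append(" ".join(current))
            # Carry over trailing sentences that fit in the overlap budget.
            tail: list[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + tail)) > overlap:
                    break
                tail.insert(0, prev)
            current = tail
            # Drop carried-over sentences if they leave no room for the new one.
            while current and len(" ".join(current + [unit])) > max_chars:
                current.pop(0)
        current.append(unit)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = (
    "First sentence here. Second sentence follows. Third one ends the paragraph.\n\n"
    "A new paragraph starts. It has two sentences."
)
result = chunk_text(sample, max_chars=60, overlap=30)
for chunk in result:
    print(repr(chunk))
```

In this sketch the overlap is sentence-granular: a trailing sentence is repeated only if it fits in the overlap budget, so a chunk never starts mid-sentence, but a chunk whose last sentence is longer than `overlap` carries nothing over. Sentences are re-joined with single spaces, so paragraph breaks are not preserved inside a chunk.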
Alternatives considered
No response
Component area
Other
Checklist