fix for indexing issue with knowledge bases#274
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an Alembic migration to create HNSW and GIN indexes; refactors vector search to use HNSW candidate CTEs with explicit vector casts and candidate limiting; tightens token updates to NULL-only; and changes image retrieval to require both clip and dino embeddings and run sequential retrieval. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| # revision identifiers, used by Alembic. | ||
| revision: str = 'e8f2a1c3b5d9' | ||
| down_revision: Union[str, None] = 'c7a9e2f4b1d0' | ||
| branch_labels: Union[str, Sequence[str], None] = None |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@wavefront/server/modules/db_repo_module/db_repo_module/alembic/versions/2026_04_10_1000-e8f2a1c3b5d9_add_hnsw_index_on_embeddings.py`:
- Around line 21-69: The migration uses CREATE/DROP INDEX CONCURRENTLY inside
upgrade() and downgrade() via op.execute, which fails because CONCURRENTLY
cannot run inside a transaction; wrap each op.execute that creates or drops an
index in both upgrade() and downgrade() with an autocommit block using
op.get_context().autocommit_block() so each CREATE INDEX CONCURRENTLY / DROP
INDEX CONCURRENTLY runs outside the surrounding transaction (i.e., replace
direct op.execute(...) calls for ix_kbe_embedding_vector_hnsw_cosine,
ix_kbe_embedding_vector_hnsw_l2, ix_kbe_embedding_vector_1_hnsw_cosine, and
ix_kbe_token_gin with the same SQL executed inside with
op.get_context().autocommit_block():).
In
`@wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py`:
- Around line 92-123: The ANN candidate CTE hnsw_candidates is built across the
entire KnowledgeBaseEmbeddings table before applying the KB and metadata filters
in vector_results, so its LIMIT (:limit * 10) can drop all matches for the
requested KB; modify the query so the KB and metadata filters are applied inside
hnsw_candidates (e.g., join hnsw_candidates to KnowledgeBaseDocuments or
restrict by document_id IN (SELECT id FROM KnowledgeBaseDocuments WHERE
knowledge_base_id = :kb_id AND <metadata_filter_clause_inner>)) so the pre-LIMIT
candidate set is already scoped to the requested KB and
metadata_filter_clause_inner, then keep the subsequent vector_results CTE and
its final LIMIT :limit unchanged.
- Around line 98-103: The query uses raw columns (embedding_vector /
embedding_vector_1) in the distance expression and ORDER BY, so PostgreSQL can't
use the HNSW expression indexes; update the distance calculation and ORDER BY to
use the exact casted expressions used by the indexes (e.g.
(embedding_vector::vector(512)) and (embedding_vector_1::vector(1024))) wherever
KnowledgeBaseEmbeddings.embedding_vector or embedding_vector_1 are referenced in
generate_query.py so the planner can utilize the HNSW indexes and avoid seq
scans.
In
`@wavefront/server/modules/knowledge_base_module/knowledge_base_module/services/image_rag_retrieve.py`:
- Around line 47-59: The CLIP->DINO flow should not call image_retrieve_dino
when the CLIP stage produced no candidates; detect when clip_results is empty in
the block inside image_rag_retrieve (where clip_results is built) and
short-circuit (e.g., return an empty list/result or the appropriate no-match
response) instead of calling image_retrieve_dino, because image_retrieve_dino
(and get_image_embedding_dino) will drop the reference_ids filter when given an
empty reference_id_list and perform an unwanted full-KB DINO search.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b5efc6c6-4532-43af-9115-3815f5e884fe
📒 Files selected for processing (3)
wavefront/server/modules/db_repo_module/db_repo_module/alembic/versions/2026_04_10_1000-e8f2a1c3b5d9_add_hnsw_index_on_embeddings.pywavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.pywavefront/server/modules/knowledge_base_module/knowledge_base_module/services/image_rag_retrieve.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py (2)
252-272:⚠️ Potential issue | 🟠 MajorScale the DINO ANN window with the requested page.
hnsw_candidatesonly pulls:limit * 10rows, then the outer query appliesOFFSET :offset. Once users page deeper, valid candidates are dropped before the final sort.image_rag_retrieve.py:47-63forwardsoffset/limitinto this query, so this will surface as incomplete later pages.💡 Suggested change
effective_limit = limit if limit is not None else int(params.get('top_k', 10)) reference_id_list: List[Any] = params.get('reference_id_list', []) effective_offset = offset if offset is not None else 0 + candidate_limit = (effective_offset + effective_limit) * 10 ... params = { 'query_embedding': query_embeddings, 'kb_id': kb_id, - 'top_k': effective_limit, 'reference_ids': processed_reference_ids, 'offset': effective_offset, 'limit': effective_limit, + 'candidate_limit': candidate_limit, } ... - LIMIT :limit * 10 + LIMIT :candidate_limitAlso applies to: 290-322
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py` around lines 252 - 272, The DINO ANN candidate window is fixed to ":limit * 10" causing valid results to be dropped when paginating; update generate_query.py to scale the ANN retrieval window by page so hnsw_candidates fetches enough rows for the requested offset. Concretely, compute an ann_window (e.g., ann_window = (effective_offset + effective_limit) * 10 or ann_window = (math.ceil((effective_offset + 1)/effective_limit) * effective_limit * 10)) using the existing effective_offset and effective_limit variables, pass that ann_window into the query parameters used by hnsw_candidates (instead of plain limit*10), and ensure the params dict (where 'top_k'/'reference_ids' are set) supplies this scaled window so deeper pages retain enough candidates before the outer OFFSET is applied.
53-70:⚠️ Potential issue | 🟠 MajorDon't page after trimming both rerank sources to one page.
vector_resultsandkeyword_resultsare capped at:limitbefore the finalOFFSET :offset, so page 2+ is computed from page 1 candidates only. The new ANN CTE has the same problem. At minimum, size those inner windows fromoffset + limitand oversample that value forhnsw_candidates.💡 Suggested change
effective_offset = offset or 0 + page_window = effective_offset + effective_limit # Prepare query parameters query_params = { 'query_embed': str(query_embeddings[0]), 'threshold': threshold, 'kb_id': kb_id, 'vector_weight': vector_weight, 'keyword_weight': keyword_weight, 'query': query, 'offset': effective_offset, 'limit': effective_limit, + 'page_window': page_window, + 'candidate_limit': page_window * 10, } ... - LIMIT :limit * 10 + LIMIT :candidate_limit ... - LIMIT :limit + LIMIT :page_window ... - LIMIT :limit + LIMIT :page_windowAlso applies to: 98-157
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py` around lines 53 - 70, The current query trims vector_results and keyword_results to limit before applying OFFSET, causing page N>1 to only consider page 1 candidates; update the logic in generate_query (references: vector_results, keyword_results, hnsw_candidates and the ANN CTE) to request/compute inner windows sized at least offset + limit (and preferably oversample, e.g. multiply by a small factor) so the subsequent OFFSET/:offset and final LIMIT/:limit operate on a superset of candidates; adjust any places building query_params or CTE limits to use this expanded window size instead of just effective_limit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In
`@wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py`:
- Around line 252-272: The DINO ANN candidate window is fixed to ":limit * 10"
causing valid results to be dropped when paginating; update generate_query.py to
scale the ANN retrieval window by page so hnsw_candidates fetches enough rows
for the requested offset. Concretely, compute an ann_window (e.g., ann_window =
(effective_offset + effective_limit) * 10 or ann_window =
(math.ceil((effective_offset + 1)/effective_limit) * effective_limit * 10))
using the existing effective_offset and effective_limit variables, pass that
ann_window into the query parameters used by hnsw_candidates (instead of plain
limit*10), and ensure the params dict (where 'top_k'/'reference_ids' are set)
supplies this scaled window so deeper pages retain enough candidates before the
outer OFFSET is applied.
- Around line 53-70: The current query trims vector_results and keyword_results
to limit before applying OFFSET, causing page N>1 to only consider page 1
candidates; update the logic in generate_query (references: vector_results,
keyword_results, hnsw_candidates and the ANN CTE) to request/compute inner
windows sized at least offset + limit (and preferably oversample, e.g. multiply
by a small factor) so the subsequent OFFSET/:offset and final LIMIT/:limit
operate on a superset of candidates; adjust any places building query_params or
CTE limits to use this expanded window size instead of just effective_limit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7e83758b-d51a-49dd-b1de-1e44c90a7ce5
📒 Files selected for processing (2)
wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.pywavefront/server/modules/knowledge_base_module/knowledge_base_module/services/image_rag_retrieve.py
🚧 Files skipped from review as they are similar to previous changes (1)
- wavefront/server/modules/knowledge_base_module/knowledge_base_module/services/image_rag_retrieve.py
|
|
||
|
|
||
| # revision identifiers, used by Alembic. | ||
| revision: str = 'e8f2a1c3b5d9' |
| # revision identifiers, used by Alembic. | ||
| revision: str = 'e8f2a1c3b5d9' | ||
| down_revision: Union[str, None] = 'c7a9e2f4b1d0' | ||
| branch_labels: Union[str, Sequence[str], None] = None |
| revision: str = 'e8f2a1c3b5d9' | ||
| down_revision: Union[str, None] = 'c7a9e2f4b1d0' | ||
| branch_labels: Union[str, Sequence[str], None] = None | ||
| depends_on: Union[str, Sequence[str], None] = None |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py (1)
92-123:⚠️ Potential issue | 🟠 MajorReapply KB/metadata scoping inside
hnsw_candidates.This reintroduces the earlier multi-KB bug:
LIMIT :limit * 20is still taken from the whole embeddings table, so rows for the requested KB/filter can be dropped beforevector_resultsnarrows them.Possible fix
WITH hnsw_candidates AS ( SELECT id, document_id, chunk_text, chunk_index, (embedding_vector::vector(512)) <=> :query_embed ::vector(512) AS distance FROM {KnowledgeBaseEmbeddings.__tablename__} + WHERE + document_id IN ( + SELECT d.id + FROM {KnowledgeBaseDocuments.__tablename__} d + WHERE d.knowledge_base_id = :kb_id + {'AND (' + metadata_filter_clause_inner + ')' if metadata_filter_clause_inner else ''} + ) ORDER BY (embedding_vector::vector(512)) <=> :query_embed ::vector(512) LIMIT :limit * 20 ),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py` around lines 92 - 123, The hnsw_candidates CTE currently reads from KnowledgeBaseEmbeddings and applies LIMIT :limit * 20 before scoping by KB/metadata, causing relevant rows to be dropped; move the KB and metadata filtering and the join to KnowledgeBaseDocuments into the hnsw_candidates CTE (apply the same d.knowledge_base_id = :kb_id and the metadata_filter_clause_inner there and join on KnowledgeBaseDocuments.__tablename__), compute distance there, ORDER BY distance, and then LIMIT :limit * 20 so the prefiltered candidate set respects KB/metadata before vector_results selects the final :limit results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py`:
- Around line 196-224: The HNSW CTE (hnsw_candidates) in generate_query.py
builds a global ANN candidate set and applies the knowledge-base (KB) and
metadata filters only after LIMIT, which can truncate results incorrectly;
update the SQL generation so the WHERE clause (kb_id = :kb_id and the
metadata_filter_clause) is applied inside the hnsw_candidates CTE (i.e., add
"WHERE d.knowledge_base_id = :kb_id" and the metadata filter when selecting from
KnowledgeBaseEmbeddings or by joining KnowledgeBaseDocuments inside the CTE)
before the ORDER BY / LIMIT to ensure the top_k*20 candidates are scoped
correctly; also apply the same change to the analogous get_image_embedding()
query builder to fix the identical truncation bug.
- Around line 270-299: The hnsw_candidates CTE currently applies the d.id =
ANY(:reference_ids) filter only after LIMIT, causing referenced documents to be
excluded before reranking; modify the query in generate_query.py so the
reference_ids (when processed_reference_ids is truthy) and the
kb_id/metadata_filter_clause are applied inside the hnsw_candidates CTE (either
by joining KnowledgeBaseDocuments within the CTE or by filtering
KnowledgeBaseEmbeddings.document_id against :reference_ids and :kb_id) so the
LIMIT :limit * 20 is computed after narrowing candidates; update the CTE
selection logic that uses KnowledgeBaseEmbeddings, hnsw_candidates,
:reference_ids, :kb_id, and metadata_filter_clause accordingly.
---
Duplicate comments:
In
`@wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py`:
- Around line 92-123: The hnsw_candidates CTE currently reads from
KnowledgeBaseEmbeddings and applies LIMIT :limit * 20 before scoping by
KB/metadata, causing relevant rows to be dropped; move the KB and metadata
filtering and the join to KnowledgeBaseDocuments into the hnsw_candidates CTE
(apply the same d.knowledge_base_id = :kb_id and the
metadata_filter_clause_inner there and join on
KnowledgeBaseDocuments.__tablename__), compute distance there, ORDER BY
distance, and then LIMIT :limit * 20 so the prefiltered candidate set respects
KB/metadata before vector_results selects the final :limit results.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c302abfe-8c2e-4e27-b501-bd636c4a3946
📒 Files selected for processing (2)
wavefront/server/modules/db_repo_module/db_repo_module/alembic/versions/2026_04_10_1000-e8f2a1c3b5d9_add_hnsw_index_on_embeddings.pywavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py
🚧 Files skipped from review as they are similar to previous changes (1)
- wavefront/server/modules/db_repo_module/db_repo_module/alembic/versions/2026_04_10_1000-e8f2a1c3b5d9_add_hnsw_index_on_embeddings.py
| WITH hnsw_candidates AS ( | ||
| SELECT | ||
| e.id AS embedding_id, | ||
| e.chunk_text, | ||
| e.chunk_index, | ||
| d.id AS document_id, | ||
| d.file_path, | ||
| d.file_name, | ||
| d.knowledge_base_id, | ||
| d.metadata_value, | ||
| e.embedding_vector <-> :query_embedding ::vector AS distance | ||
| id, | ||
| document_id, | ||
| chunk_text, | ||
| chunk_index, | ||
| (embedding_vector::vector(512)) <=> :query_embedding ::vector(512) AS distance | ||
| FROM | ||
| {KnowledgeBaseEmbeddings.__tablename__} e | ||
| JOIN | ||
| {KnowledgeBaseDocuments.__tablename__} d ON e.document_id = d.id | ||
| WHERE | ||
| d.knowledge_base_id = :kb_id {'AND (' + metadata_filter_clause_final + ')' if metadata_filter_clause_final else ''} | ||
| ORDER BY distance ASC | ||
| {KnowledgeBaseEmbeddings.__tablename__} | ||
| ORDER BY | ||
| (embedding_vector::vector(512)) <=> :query_embedding ::vector(512) | ||
| LIMIT :top_k * 20 | ||
| ) | ||
| SELECT | ||
| * | ||
| FROM | ||
| ranked_embeddings | ||
| hc.id AS embedding_id, | ||
| hc.chunk_text, | ||
| hc.chunk_index, | ||
| d.id AS document_id, | ||
| d.file_path, | ||
| d.file_name, | ||
| d.knowledge_base_id, | ||
| d.metadata_value, | ||
| hc.distance | ||
| FROM hnsw_candidates hc | ||
| JOIN {KnowledgeBaseDocuments.__tablename__} d ON hc.document_id = d.id | ||
| WHERE d.knowledge_base_id = :kb_id | ||
| {'AND (' + metadata_filter_clause + ')' if metadata_filter_clause else ''} | ||
| ORDER BY hc.distance ASC | ||
| LIMIT :top_k |
There was a problem hiding this comment.
Scope the CLIP ANN candidate set before LIMIT.
get_image_embedding() has the same truncation problem: the HNSW CTE is global, and the KB/metadata filters only run after LIMIT :top_k * 20. In a multi-KB or selective-filter dataset this can silently return too few results.
Possible fix
WITH hnsw_candidates AS (
SELECT
id,
document_id,
chunk_text,
chunk_index,
(embedding_vector::vector(512)) <=> :query_embedding ::vector(512) AS distance
FROM
{KnowledgeBaseEmbeddings.__tablename__}
+ WHERE
+ document_id IN (
+ SELECT d.id
+ FROM {KnowledgeBaseDocuments.__tablename__} d
+ WHERE d.knowledge_base_id = :kb_id
+ {'AND (' + metadata_filter_clause + ')' if metadata_filter_clause else ''}
+ )
ORDER BY
(embedding_vector::vector(512)) <=> :query_embedding ::vector(512)
LIMIT :top_k * 20
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py`
around lines 196 - 224, The HNSW CTE (hnsw_candidates) in generate_query.py
builds a global ANN candidate set and applies the knowledge-base (KB) and
metadata filters only after LIMIT, which can truncate results incorrectly;
update the SQL generation so the WHERE clause (kb_id = :kb_id and the
metadata_filter_clause) is applied inside the hnsw_candidates CTE (i.e., add
"WHERE d.knowledge_base_id = :kb_id" and the metadata filter when selecting from
KnowledgeBaseEmbeddings or by joining KnowledgeBaseDocuments inside the CTE)
before the ORDER BY / LIMIT to ensure the top_k*20 candidates are scoped
correctly; also apply the same change to the analogous get_image_embedding()
query builder to fix the identical truncation bug.
| WITH hnsw_candidates AS ( | ||
| SELECT | ||
| e.id AS embedding_id, | ||
| e.chunk_text, | ||
| e.chunk_index, | ||
| d.id AS document_id, | ||
| d.file_path, | ||
| d.file_name, | ||
| d.knowledge_base_id, | ||
| d.metadata_value, | ||
| (1 - (e.embedding_vector_1 <=> :query_embedding ::vector)) AS similarity | ||
| FROM {KnowledgeBaseEmbeddings.__tablename__} e | ||
| JOIN {KnowledgeBaseDocuments.__tablename__} d ON e.document_id = d.id | ||
| WHERE | ||
| d.knowledge_base_id = :kb_id {reference_filter} {'AND (' + metadata_filter_clause_final + ')' if metadata_filter_clause_final else ''} | ||
| ORDER BY similarity DESC | ||
| id, | ||
| document_id, | ||
| chunk_text, | ||
| chunk_index, | ||
| (embedding_vector_1::vector(1024)) <=> :query_embedding ::vector(1024) AS distance | ||
| FROM | ||
| {KnowledgeBaseEmbeddings.__tablename__} | ||
| ORDER BY | ||
| (embedding_vector_1::vector(1024)) <=> :query_embedding ::vector(1024) | ||
| LIMIT :limit * 20 | ||
| ) | ||
| SELECT | ||
| * | ||
| FROM | ||
| ranked_embeddings | ||
| hc.id AS embedding_id, | ||
| hc.chunk_text, | ||
| hc.chunk_index, | ||
| d.id AS document_id, | ||
| d.file_path, | ||
| d.file_name, | ||
| d.knowledge_base_id, | ||
| d.metadata_value, | ||
| 1 - hc.distance AS similarity | ||
| FROM hnsw_candidates hc | ||
| JOIN {KnowledgeBaseDocuments.__tablename__} d ON hc.document_id = d.id | ||
| WHERE d.knowledge_base_id = :kb_id | ||
| {('AND d.id = ANY(:reference_ids)' if processed_reference_ids else '')} | ||
| {'AND (' + metadata_filter_clause + ')' if metadata_filter_clause else ''} | ||
| ORDER BY similarity DESC | ||
| LIMIT :limit OFFSET :offset |
There was a problem hiding this comment.
Filter reference_ids inside the DINO candidate CTE.
image_rag_retrieve.py uses this query to rerank the CLIP-selected reference_id_list, but d.id = ANY(:reference_ids) is applied only after LIMIT :limit * 20. That means the global DINO top-N can exclude every referenced document and return empty/incomplete results even when CLIP found good matches.
Possible fix
WITH hnsw_candidates AS (
SELECT
id,
document_id,
chunk_text,
chunk_index,
(embedding_vector_1::vector(1024)) <=> :query_embedding ::vector(1024) AS distance
FROM
{KnowledgeBaseEmbeddings.__tablename__}
+ WHERE
+ document_id IN (
+ SELECT d.id
+ FROM {KnowledgeBaseDocuments.__tablename__} d
+ WHERE d.knowledge_base_id = :kb_id
+ {('AND d.id = ANY(:reference_ids)' if processed_reference_ids else '')}
+ {'AND (' + metadata_filter_clause + ')' if metadata_filter_clause else ''}
+ )
ORDER BY
(embedding_vector_1::vector(1024)) <=> :query_embedding ::vector(1024)
LIMIT :limit * 20
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@wavefront/server/modules/knowledge_base_module/knowledge_base_module/queries/generate_query.py`
around lines 270 - 299, The hnsw_candidates CTE currently applies the d.id =
ANY(:reference_ids) filter only after LIMIT, causing referenced documents to be
excluded before reranking; modify the query in generate_query.py so the
reference_ids (when processed_reference_ids is truthy) and the
kb_id/metadata_filter_clause are applied inside the hnsw_candidates CTE (either
by joining KnowledgeBaseDocuments within the CTE or by filtering
KnowledgeBaseEmbeddings.document_id against :reference_ids and :kb_id) so the
LIMIT :limit * 20 is computed after narrowing candidates; update the CTE
selection logic that uses KnowledgeBaseEmbeddings, hnsw_candidates,
:reference_ids, :kb_id, and metadata_filter_clause accordingly.
* fix for migration * fix for migration * fix for migration * fix for migration * fix for embedding index creation * fix migration commit issue * fix migration commit issue
Summary by CodeRabbit
Performance Improvements
Bug Fixes
Improvements