feat: Optimize similarity search with vectorized cosine similarity (#634) by fennhelloworld · Pull Request #648 · ritesh-1918/HELPDESK.AI

fennhelloworld · 2026-05-29T19:29:26Z

Summary

Closes #634 — Optimizes the duplicate detection similarity search by replacing the per-ticket loop with vectorized batched cosine similarity.

Problem

DuplicateService.check_duplicate() previously iterated over every stored ticket embedding and called util.cos_sim() individually, resulting in O(n) separate tensor operations and kernel launches. Under load with many cached tickets, this caused significant latency.

Solution

All stored embeddings are now stacked into a single 2D tensor (_embedding_matrix) and compared against the query embedding in one batched matrix operation, then torch.argmax() identifies the best match.

Key changes

File	Change
`backend/services/duplicate_service.py`	Vectorized `check_duplicate()`, added `_rebuild_embedding_matrix()`, lazy matrix caching
`backend/services/benchmark_similarity.py`	New benchmark script comparing loop vs vectorized performance

Benchmark results

Tickets	Loop (ms)	Vectorized (ms)	Speedup
10	0.70	0.07	10x
100	2.90	0.09	33x
500	14.43	0.07	196x
1,000	29.52	0.07	394x
5,000	144.16	0.34	421x

Implementation details

Lazy rebuild: The embedding matrix is only rebuilt when _embedding_matrix_dirty is True (after add_ticket()), avoiding redundant computation.
Backward compatible: The public API (check_duplicate(), add_ticket(), is_available(), load()) is unchanged — same inputs, same outputs.
No new dependencies: Uses existing torch and sentence_transformers.util already in the project.

How to test

# Run the benchmark
python backend/services/benchmark_similarity.py

Checklist

Code follows existing style and conventions
No breaking changes to public API
Benchmark demonstrates measurable improvement
Closes [BOUNTY] [level:critical] Vectorize Sentence-Transformers Cosine Similarity Computations with NumPy and ONNX Runtime #634

Summary by CodeRabbit

Performance
- Improved duplicate detection: uses batched/vectorized similarity checks with caching to significantly boost throughput for large ticket volumes.
Chores
- Added an internal benchmarking tool to measure and compare duplicate-detection performance at scale.

…itesh-1918#634) Replace per-ticket loop in DuplicateService.check_duplicate() with vectorized batched cosine similarity computation. Instead of calling util.cos_sim() individually for each stored embedding (O(n) kernel launches), all stored embeddings are stacked into a single 2D tensor and compared against the query in one matrix operation. Key changes: - Add _embedding_matrix, _ticket_ids, and _embedding_matrix_dirty to DuplicateService for lazy-rebuild caching - Add _rebuild_embedding_matrix() to stack embeddings into 2D tensor - Rewrite check_duplicate() to use vectorized util.cos_sim() with the stacked matrix and torch.argmax() for best-match selection - Mark matrix dirty on add_ticket() for correctness - Add benchmark_similarity.py showing speedup results: n=10: 10x, n=100: 33x, n=500: 196x, n=1000: 394x, n=5000: 421x Closes ritesh-1918#634

vercel · 2026-05-29T19:29:31Z

Someone is attempting to deploy a commit to the ritesh Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-05-29T19:29:41Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f24d6cd0-e1b7-4cb6-8209-637a668db2c9

📥 Commits

Reviewing files that changed from the base of the PR and between 35a9990 and 963a2da.

📒 Files selected for processing (2)

backend/services/benchmark_similarity.py
backend/services/duplicate_service.py

🚧 Files skipped from review as they are similar to previous changes (2)

backend/services/duplicate_service.py
backend/services/benchmark_similarity.py

📝 Walkthrough

Walkthrough

DuplicateService now caches a stacked embedding matrix and performs batched cosine-similarity lookups (argmax) instead of per-ticket loops. A new benchmark script generates synthetic unit-normalized embeddings and measures loop vs. vectorized timings across multiple ticket counts, reporting average durations and speedup.

Changes

Vectorized Duplicate Detection

Layer / File(s)	Summary
DuplicateService vectorized implementation `backend/services/duplicate_service.py`	Adds `torch`/`numpy` imports, caches `_embedding_matrix` and `_ticket_ids` with `_embedding_matrix_dirty`, implements `_rebuild_embedding_matrix()` to stack embeddings, marks cache dirty in `add_ticket()`, and replaces per-ticket similarity loop in `check_duplicate()` with batched cosine-similarity and `torch.argmax`.
Benchmark comparison script `backend/services/benchmark_similarity.py`	Adds `EMBEDDING_DIM`, `_generate_synthetic_embeddings` to create unit-normalized random embeddings, `benchmark_loop` (per-embedding loop timing), `benchmark_vectorized` (batched timing), and `main()` to run experiments over multiple dataset sizes and print average timings and speedups.

Sequence Diagram(s)

sequenceDiagram
  participant check_duplicate as check_duplicate()
  participant rebuild as _rebuild_embedding_matrix()
  participant torch as torch
  check_duplicate->>rebuild: if _embedding_matrix_dirty or missing -> rebuild cache
  rebuild->>torch: stack per-ticket embeddings into matrix
  check_duplicate->>torch: compute batched cosine similarities (matrix @ query)
  torch->>check_duplicate: similarity scores tensor
  check_duplicate->>check_duplicate: torch.argmax to pick best ticket id and score
  check_duplicate->>check_duplicate: apply threshold -> return duplicate result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

#634 — Vectorize Sentence-Transformers Cosine Similarity Computations: Implements vectorized batched similarity (PyTorch) and adds benchmark validation; aligns with the objective to replace loop-based cosine similarity.
#438 — Benchmark for loop vs vectorized cosine-similarity: This PR's benchmark and vectorized duplicate detection directly address that issue's benchmarking and optimization goal.

Poem

🐰 I hopped through tensors in the night,
Stacked the vectors, kept them tight,
One argmax finds what once took hours,
Benchmarks hum — matrix-powered flowers,
Swift as carrots, small delight.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The PR implements vectorization and benchmarking (primary coding objectives from `#634`) but omits ONNX export, which was listed as a required implementation step.	Complete the ONNX export implementation or document why it was deferred, as issue `#634` explicitly requires an ONNX export script as a core technical step.
Docstring Coverage	⚠️ Warning	Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely summarizes the main change: vectorizing cosine similarity computations for improved performance in duplicate detection.
Out of Scope Changes check	✅ Passed	All changes (vectorized similarity, caching, benchmark script) directly support the core objectives of issue `#634` and remain within scope.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (3)

backend/services/duplicate_service.py (2)
125-125: ⚡ Quick win

Make the optional parameter explicit (float | None).

Ruff flags this as an implicit Optional (RUF013). Line 23 already uses | None syntax, so this is consistent with the file.
♻️ Proposed fix
-    def check_duplicate(self, text: str, threshold: float = None) -> dict:
+    def check_duplicate(self, text: str, threshold: float | None = None) -> dict:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/duplicate_service.py` at line 125, The function signature
for check_duplicate currently uses the implicit Optional pattern (threshold:
float = None); update the type annotation to be explicit by changing it to
threshold: float | None = None in the check_duplicate method so it matches the
file's use of `| None` and satisfies the RUF013 rule.
96-112: ⚡ Quick win

Fix potential state desync in _rebuild_embedding_matrix() by snapshotting _tickets

DuplicateService._rebuild_embedding_matrix() builds _ticket_ids and the stacked embeddings from two separate passes over self._tickets. add_ticket() appends to self._tickets and sets _embedding_matrix_dirty=True, while check_duplicate() may rebuild the matrix when dirty/stale, so concurrent mutation could desync _ticket_ids vs _embedding_matrix.

In backend/main.py, the call sites for duplicate_service.add_ticket(...) and duplicate_service.check_duplicate(...) are inside async def routes, but the service methods are synchronous and torch ops may release the GIL; if the app is running with multiple threads/workers within a process, this race is still plausible. Snapshotting avoids the mismatch without relying on deployment details.
-        self._ticket_ids = [tid for tid, _, _ in self._tickets]
-        embeddings = [emb for _, emb, _ in self._tickets]
-        self._embedding_matrix = torch.stack(embeddings)
+        tickets = list(self._tickets)  # consistent snapshot
+        self._ticket_ids = [tid for tid, _, _ in tickets]
+        embeddings = [emb for _, emb, _ in tickets]
+        self._embedding_matrix = torch.stack(embeddings)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/duplicate_service.py` around lines 96 - 112,
_rebuild_embedding_matrix currently iterates over self._tickets twice, which can
lead to _ticket_ids vs the stacked _embedding_matrix getting out of sync if
self._tickets is mutated concurrently (e.g., between add_ticket and
check_duplicate); fix by snapshotting tickets at the start of
_rebuild_embedding_matrix (e.g., local_tickets = list(self._tickets)) and then
build _ticket_ids and embeddings from that snapshot before calling torch.stack,
then set _embedding_matrix and _ticket_ids and clear _embedding_matrix_dirty;
this ensures atomic consistency without changing add_ticket or check_duplicate
signatures.
backend/services/benchmark_similarity.py (1)
26-45: ⚡ Quick win

Add an untimed warm-up before measuring.

The first timed round absorbs one-time allocation/kernel-init overhead, which can skew the reported averages (most visibly at small n). Since the PR's speedup claims rely on these numbers, a warm-up call makes them more representative.
♻️ Proposed fix
 def benchmark_loop(query: torch.Tensor, stored: list[torch.Tensor], rounds: int = 5) -> float:
     """Old approach: iterate and compute cos_sim one at a time."""
+    for emb in stored:  # warm-up
+        util.cos_sim(query, emb)
     times = []
 def benchmark_vectorized(query: torch.Tensor, matrix: torch.Tensor, rounds: int = 5) -> float:
     """New approach: single batched cos_sim call."""
     query_2d = query.unsqueeze(0)
+    util.cos_sim(query_2d, matrix)  # warm-up
     times = []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/benchmark_similarity.py` around lines 26 - 45, Both
benchmark_loop and benchmark_vectorized should perform an untimed warm-up call
to amortize one-time allocation/kernel-init overhead before starting the timed
rounds; update the functions (benchmark_loop and benchmark_vectorized) to run
the same computation once (e.g., one pass over stored in benchmark_loop and one
util.cos_sim call in benchmark_vectorized) prior to the for _ in range(rounds)
timing loop so the measured rounds exclude initialization costs.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@backend/services/benchmark_similarity.py`:
- Around line 26-45: Both benchmark_loop and benchmark_vectorized should perform
an untimed warm-up call to amortize one-time allocation/kernel-init overhead
before starting the timed rounds; update the functions (benchmark_loop and
benchmark_vectorized) to run the same computation once (e.g., one pass over
stored in benchmark_loop and one util.cos_sim call in benchmark_vectorized)
prior to the for _ in range(rounds) timing loop so the measured rounds exclude
initialization costs.

In `@backend/services/duplicate_service.py`:
- Line 125: The function signature for check_duplicate currently uses the
implicit Optional pattern (threshold: float = None); update the type annotation
to be explicit by changing it to threshold: float | None = None in the
check_duplicate method so it matches the file's use of `| None` and satisfies
the RUF013 rule.
- Around line 96-112: _rebuild_embedding_matrix currently iterates over
self._tickets twice, which can lead to _ticket_ids vs the stacked
_embedding_matrix getting out of sync if self._tickets is mutated concurrently
(e.g., between add_ticket and check_duplicate); fix by snapshotting tickets at
the start of _rebuild_embedding_matrix (e.g., local_tickets =
list(self._tickets)) and then build _ticket_ids and embeddings from that
snapshot before calling torch.stack, then set _embedding_matrix and _ticket_ids
and clear _embedding_matrix_dirty; this ensures atomic consistency without
changing add_ticket or check_duplicate signatures.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 42974083-3733-43bf-ad65-454075d2fccd

📥 Commits

Reviewing files that changed from the base of the PR and between da8faf2 and 35a9990.

📒 Files selected for processing (2)

backend/services/benchmark_similarity.py
backend/services/duplicate_service.py

…ts, benchmark warmup

fennhelloworld · 2026-05-30T09:20:14Z

Thanks for the review @coderabbitai! All three nitpick items have been addressed in the latest push:

Explicit Optional type (duplicate_service.py:125): Changed threshold: float = None → threshold: float | None = None
Snapshot _tickets for thread safety (duplicate_service.py:109-111): Added tickets = list(self._tickets) snapshot before iterating in _rebuild_embedding_matrix() to avoid potential state desync
Benchmark warm-up (benchmark_similarity.py): Added untimed warm-up calls in both benchmark_loop and benchmark_vectorized to avoid one-time allocation overhead skewing results

coderabbitai · 2026-05-30T09:20:33Z

Tip

For best results, initiate chat on the files or code changes.

@fennhelloworld Great work on all three fixes! Each one addresses exactly the right concern:

float | None = None — correct and idiomatic for Python 3.10+.
Thread-safety snapshot — snapshotting _tickets before iterating is the right pattern to guard against concurrent mutations.
Warm-up pass — eliminating first-call allocation bias from the benchmark numbers makes the reported speedups much more credible.

The implementation looks solid. Thanks for the thorough follow-through! 🐇✨

If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

ritesh-1918 · 2026-05-31T06:05:23Z

Superb implementation, @fennhelloworld! I've successfully resolved all conflicts in your PR and queued it for merging into gssoc.

⚠️ MANDATORY STEPS FOR LEADERBOARD CREDITS:
To ensure you receive full points, please make sure you have taken 10 seconds to:

Give us a Star ⭐: https://github.com/ritesh-1918/HELPDESK.AI (Mandatory)
Follow my GitHub profile 👤: https://github.com/ritesh-1918 (Mandatory)
Let's connect on LinkedIn 💼: https://www.linkedin.com/in/ritesh1908/ (Mandatory)

Keep up the outstanding work! Let's build together! 🔥

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

fix: address code review feedback - explicit Optional, snapshot ticke…

963a2da

…ts, benchmark warmup

ritesh-1918 changed the base branch from main to gssoc May 31, 2026 06:05

ritesh-1918 added gssoc GirlScript Summer of Code gssoc:approved GSSoC Approved PR level:advanced Advanced level difficulty quality:exceptional Exceptional code quality type:feature New feature labels May 31, 2026

ritesh-1918 merged commit 50cb86f into ritesh-1918:gssoc May 31, 2026
2 of 3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Optimize similarity search with vectorized cosine similarity (#634)#648

feat: Optimize similarity search with vectorized cosine similarity (#634)#648
ritesh-1918 merged 2 commits into
ritesh-1918:gssocfrom
fennhelloworld:feat/vectorized-cosine-similarity

fennhelloworld commented May 29, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

vercel Bot commented May 29, 2026

Uh oh!

coderabbitai Bot commented May 29, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

fennhelloworld commented May 30, 2026

Uh oh!

coderabbitai Bot commented May 30, 2026

Uh oh!

ritesh-1918 commented May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

fennhelloworld commented May 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Key changes

Benchmark results

Implementation details

How to test

Checklist

Summary by CodeRabbit

Uh oh!

vercel Bot commented May 29, 2026

Uh oh!

coderabbitai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

fennhelloworld commented May 30, 2026

Uh oh!

coderabbitai Bot commented May 30, 2026

Uh oh!

ritesh-1918 commented May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fennhelloworld commented May 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 29, 2026 •

edited

Loading