Feat/clustering by catebros · Pull Request #145 · yamirghofran/BookDB

catebros · 2026-03-10T19:36:56Z

Collect all books per user: ratings >= 4, shelved books, listed books
Cluster the embeddings
Recommend per cluster: compute each cluster's center and query with it
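
The first step above can be sketched as follows. This is a minimal illustration, not the PR's code: `collect_seed_scores`, `SHELF_WEIGHT`, and `LIST_WEIGHT` are hypothetical names, and the weighting of each signal is an assumption. Only the `ratings >= 4` threshold comes from the description.

```python
from typing import Dict, Iterable, Tuple

MIN_RATING = 4      # from the PR description: ratings >= 4
SHELF_WEIGHT = 1.0  # hypothetical weight for a shelved book
LIST_WEIGHT = 1.0   # hypothetical weight for a listed book


def collect_seed_scores(
    ratings: Iterable[Tuple[int, int]],
    shelved_ids: Iterable[int],
    listed_ids: Iterable[int],
) -> Dict[int, float]:
    """Merge a user's positive signals into one {book_id: score} map."""
    scores: Dict[int, float] = {}
    for book_id, rating in ratings:
        if rating >= MIN_RATING:  # low ratings are not seeds
            scores[book_id] = max(scores.get(book_id, 0.0), float(rating))
    for book_id in shelved_ids:
        scores[book_id] = max(scores.get(book_id, 0.0), SHELF_WEIGHT)
    for book_id in listed_ids:
        scores[book_id] = max(scores.get(book_id, 0.0), LIST_WEIGHT)
    return scores


seed_scores = collect_seed_scores(
    ratings=[(101, 5), (102, 3), (103, 4)],
    shelved_ids=[103, 104],
    listed_ids=[105],
)
```

Taking the maximum per book keeps a single score when the same book appears in several sources; summing the signals would be an equally plausible choice.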

cloudflare-workers-and-pages · 2026-03-10T19:37:00Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	bookdb-landing	`230e6aa`	Commit Preview URL Branch Preview URL	Mar 10 2026, 07:46 PM

gemini-code-assist · 2026-03-10T19:37:12Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the book recommendation system by introducing a clustering-based approach. Instead of generating recommendations from a single user profile vector, the system now clusters a user's interaction history into distinct interest groups. Recommendations are then generated for each cluster's centroid, aiming to provide more diverse and relevant suggestions by capturing multiple facets of a user's reading preferences. A fallback mechanism ensures that the previous recommendation method is used if clustering is not applicable or yields no results.
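
To make "recommendations for each cluster's centroid" concrete, here is a small in-memory sketch. It replaces the Qdrant query with a brute-force cosine search, so `centroid_of`, `nearest_to_centroid`, and the `catalog` dict are illustrative assumptions rather than the project's API.

```python
import math
from typing import Dict, List, Sequence, Set


def centroid_of(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_to_centroid(
    centroid: Sequence[float],
    catalog: Dict[int, Sequence[float]],
    top_k: int,
    exclude_ids: Set[int],
) -> List[int]:
    """Rank catalog books by similarity to the centroid, skipping excluded ids."""
    scored = [
        (cosine(centroid, vector), book_id)
        for book_id, vector in catalog.items()
        if book_id not in exclude_ids
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [book_id for _, book_id in scored[:top_k]]


catalog = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0], 4: [0.8, 0.3]}
cluster_seeds = [1, 2]  # books the user already interacted with
centroid = centroid_of([catalog[i] for i in cluster_seeds])
recommended = nearest_to_centroid(centroid, catalog, top_k=2, exclude_ids=set(cluster_seeds))
```

Excluding the seeds matters here because the books closest to a cluster's centroid are usually the seeds themselves.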

Highlights

Clustering-based Recommendations: Implemented a new book recommendation strategy that clusters user interaction history to provide more diverse and relevant suggestions.
Vector Search Enhancements: Modified the most_similar_by_vector function to allow excluding specific book IDs from similarity searches, improving recommendation precision.
Embedding Vector Retrieval: Added a new utility function, get_vectors_by_ids, to efficiently fetch embedding vectors for a given list of book IDs from the Qdrant vector database.
New Clustering Module: Introduced a dedicated module for K-means clustering of book embeddings, enabling the grouping of similar books based on their vector representations.
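
As a rough illustration of K-means with optional weighting, the sketch below uses plain Python. It is not the PR's `cluster_seeds_by_embedding`: the function name `weighted_kmeans`, the deterministic initialisation from the first `n_clusters` ids, and the choice to apply weights only in the centroid update are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(
    vectors: Dict[int, Sequence[float]],
    weights: Dict[int, float],
    n_clusters: int,
    n_iter: int = 20,
) -> Tuple[Dict[int, int], List[List[float]]]:
    """Return ({book_id: cluster index}, centroids) for the given seeds."""
    ids = sorted(vectors)
    if not 1 <= n_clusters <= len(ids):
        raise ValueError("n_clusters must be between 1 and the number of seeds")
    # Assumption: deterministic init from the first ids (not k-means++).
    centroids = [list(vectors[book_id]) for book_id in ids[:n_clusters]]
    assignment: Dict[int, int] = {}
    for _ in range(n_iter):
        # Assignment step: each seed joins its nearest centroid.
        updated = {}
        for book_id in ids:
            distances = [squared_distance(vectors[book_id], c) for c in centroids]
            updated[book_id] = distances.index(min(distances))
        if updated == assignment:
            break  # converged
        assignment = updated
        # Update step: weighted mean of each cluster's members.
        for cluster in range(n_clusters):
            members = [b for b in ids if assignment[b] == cluster]
            total = sum(weights.get(b, 1.0) for b in members)
            if total <= 0:
                continue  # empty cluster keeps its previous centroid
            centroids[cluster] = [
                sum(weights.get(b, 1.0) * vectors[b][d] for b in members) / total
                for d in range(len(centroids[cluster]))
            ]
    return assignment, centroids


vectors = {1: [0.0, 0.0], 2: [0.0, 1.0], 3: [10.0, 10.0], 4: [10.0, 11.0]}
weights = {1: 1.0, 2: 3.0, 3: 1.0, 4: 1.0}
assignment, centroids = weighted_kmeans(vectors, weights, n_clusters=2)
```

In this toy input, book 2 carries three times the weight of book 1, so the first centroid is pulled toward it.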

Changelog

apps/api/core/embeddings.py
- Modified most_similar_by_vector to accept an exclude_ids parameter, allowing specific book IDs to be excluded from similarity searches.
- Added a new function get_vectors_by_ids to retrieve embedding vectors for a list of goodreads IDs from Qdrant.
apps/api/routers/discovery.py
- Imported zip_longest for combining results from multiple clusters.
- Imported cluster_seeds_by_embedding from the new clustering module.
- Imported most_similar_by_vector and get_vectors_by_ids from the embeddings module.
- Defined MIN_SEEDS_FOR_CLUSTERING constant.
- Implemented _cluster_vector_recommendations to orchestrate the clustering process, including collecting user interaction seeds, fetching vectors, performing K-means clustering, and generating recommendations per cluster.
- Updated get_recommendations to prioritize the new _cluster_vector_recommendations method, falling back to the existing _interaction_vector_recommendations if clustering is not feasible or yields no results.
bookdb/vector_db/clustering.py
- Added a new file containing the cluster_seeds_by_embedding function, which performs K-means clustering on book embedding vectors with optional weighting.
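
The changelog mentions `zip_longest` for combining results from multiple clusters. One plausible way to do that is a round-robin merge, sketched below; `interleave_clusters` is a hypothetical helper, and the de-duplication and `limit` handling are assumptions rather than the PR's exact behaviour.

```python
from itertools import zip_longest
from typing import List, Sequence


def interleave_clusters(per_cluster_hits: Sequence[Sequence[int]], limit: int) -> List[int]:
    """Take one hit from each cluster in turn until `limit` unique ids are collected."""
    if limit <= 0:
        return []
    merged: List[int] = []
    seen = set()
    # zip_longest pads shorter clusters with None so longer ones are not truncated.
    for round_hits in zip_longest(*per_cluster_hits):
        for book_id in round_hits:
            if book_id is None or book_id in seen:
                continue
            seen.add(book_id)
            merged.append(book_id)
            if len(merged) == limit:
                return merged
    return merged


merged = interleave_clusters([[1, 2, 3], [4, 2], [5]], limit=5)
```

Interleaving this way keeps the top hit of every cluster near the front of the list, so a small interest group is not buried under a large one.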

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a new recommendation strategy based on clustering user interaction embeddings. The user's book interactions are clustered, and recommendations are generated from the center of each cluster. The changes include new functions for fetching vectors from Qdrant, a clustering utility, and the main recommendation logic that integrates this new strategy with a fallback to the previous method.

My review focuses on improving robustness and maintainability. The main points are:

Replacing broad except Exception blocks with more specific exception handling and logging to avoid silencing errors.
Refactoring duplicated code to improve readability.
Highlighting a potentially brittle approach to handling named vectors in Qdrant.

gemini-code-assist · 2026-03-10T19:41:07Z

apps/api/routers/discovery.py

+    try:
+        vector_map = get_vectors_by_ids(qdrant_client, list(seed_scores.keys()))
+    except Exception:
+        return []


Using a broad except Exception can mask important errors and make debugging difficult. It's better to catch more specific exceptions that you expect from get_vectors_by_ids. Consider also logging the exception to ensure failures are not silent.
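
A minimal sketch of what this suggestion could look like is shown below. The exception types in `EXPECTED_ERRORS` are placeholders: which exceptions `get_vectors_by_ids` actually raises depends on the Qdrant client and is not stated in this PR. `fetch_vectors_or_empty` is a hypothetical wrapper that takes the lookup as a callable so the example runs without Qdrant.

```python
import logging
from typing import Callable, Dict, List, Sequence

logger = logging.getLogger("discovery")

# Assumption: stand-ins for the errors the vector store client is expected to raise.
EXPECTED_ERRORS = (ConnectionError, TimeoutError)


def fetch_vectors_or_empty(
    fetch: Callable[[Sequence[int]], Dict[int, List[float]]],
    ids: Sequence[int],
) -> Dict[int, List[float]]:
    """Return vectors for `ids`, or {} when an expected lookup failure occurs."""
    try:
        return fetch(ids)
    except EXPECTED_ERRORS:
        # logger.exception records the traceback, so the failure is not silent.
        logger.exception("Vector lookup failed for %d seed ids", len(ids))
        return {}


def unreachable_store(ids: Sequence[int]) -> Dict[int, List[float]]:
    raise ConnectionError("vector store unreachable")


result = fetch_vectors_or_empty(unreachable_store, [1, 2, 3])
```

Anything outside `EXPECTED_ERRORS`, such as a `KeyError` caused by a programming mistake, still propagates instead of being turned into an empty result.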

gemini-code-assist · 2026-03-10T19:41:07Z

apps/api/routers/discovery.py

+    try:
+        clusters = cluster_seeds_by_embedding(valid_seeds, seed_scores, n_clusters)
+    except Exception:
+        return []


This try...except block also uses a broad except Exception, which can hide errors occurring during the clustering process. Please catch more specific exceptions and consider logging any errors for better observability.

gemini-code-assist · 2026-03-10T19:41:07Z

apps/api/routers/discovery.py

+        try:
+            hits = most_similar_by_vector(
+                qdrant_client,
+                query_vector=centroid.tolist(),
+                top_k=per_cluster_limits[cluster_idx],
+                exclude_ids=cluster_excluded,
+            )
+        except Exception:


This is another case of catching a broad Exception, which can silence errors from most_similar_by_vector. This makes it hard to diagnose issues with Qdrant queries. It's recommended to handle specific exceptions and log them.

gemini-code-assist · 2026-03-10T19:41:07Z

apps/api/core/embeddings.py

+        if isinstance(vector, dict):
+            vector = next(iter(vector.values()), None)


This logic to handle named vectors by picking the first one from the dictionary can be brittle. If multiple named vectors exist, the one chosen is arbitrary and depends on dictionary insertion order. It would be more robust to either expect a specific vector name or handle the case of multiple vectors more explicitly. If only one vector is ever expected, adding a comment to clarify this assumption would be helpful.
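
One way to make the assumption explicit is sketched here. `select_vector` and its `name` parameter are hypothetical; the vector names used in the example (`"text"`, `"cover"`) are made up for illustration and are not taken from the project's collection.

```python
from typing import Dict, List, Optional, Union

Vector = List[float]


def select_vector(
    raw: Union[Vector, Dict[str, Vector], None],
    name: Optional[str] = None,
) -> Optional[Vector]:
    """Return a single embedding from an unnamed vector or a named-vector dict."""
    if not isinstance(raw, dict):
        return raw  # already a plain vector, or None
    if name is not None:
        return raw.get(name)  # explicit choice; None if that name is missing
    if not raw:
        return None
    if len(raw) == 1:
        return next(iter(raw.values()))  # unambiguous: exactly one named vector
    # Several named vectors and no name given: refuse to guess.
    raise ValueError(f"ambiguous named vectors: {sorted(raw)}")


single = select_vector({"text": [1.0, 2.0]})
chosen = select_vector({"text": [1.0], "cover": [2.0]}, name="cover")
```

Raising on the ambiguous case surfaces a schema change immediately, whereas picking the first dictionary value would keep working with whichever vector happened to be inserted first.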

gemini-code-assist · 2026-03-10T19:41:07Z

apps/api/routers/discovery.py

+        interaction_goodreads_ids = _cluster_vector_recommendations(
            db,
            current_user.id,
            qdrant_client=qdrant,
            limit=max(limit * 4, 80),
            exclude_ids=set(bpr_goodreads_ids),
        )
+        if not interaction_goodreads_ids:
+            # Fall back
+            interaction_goodreads_ids = _interaction_vector_recommendations(
+                db,
+                current_user.id,
+                qdrant_client=qdrant,
+                limit=max(limit * 4, 80),
+                exclude_ids=set(bpr_goodreads_ids),
+            )


The arguments passed to _cluster_vector_recommendations and the fallback _interaction_vector_recommendations are identical. To improve readability and avoid repetition, you can define the arguments once in a dictionary and unpack it for both function calls.

reco_args = {
    "db": db,
    "user_id": current_user.id,
    "qdrant_client": qdrant,
    "limit": max(limit * 4, 80),
    "exclude_ids": set(bpr_goodreads_ids),
}
interaction_goodreads_ids = _cluster_vector_recommendations(**reco_args)
if not interaction_goodreads_ids:
    # Fall back
    interaction_goodreads_ids = _interaction_vector_recommendations(**reco_args)

fedecaselli

Very meaningful

yamirghofran

Great work

catebros added 3 commits March 10, 2026 17:33

feat(bookdb): add cluster_seeds_by_embedding to vector_db

ac8af8f

feat(embeddings): add get_vectors_by_ids and exclude_ids to most_similar_by_vector

3391123

feat(discovery): cluster user interaction history

1ea68e7

catebros requested review from fedecaselli, leaabj and yamirghofran March 10, 2026 19:37

gemini-code-assist bot reviewed Mar 10, 2026

fix: discovery tests

230e6aa

fedecaselli approved these changes Mar 11, 2026

yamirghofran approved these changes Mar 11, 2026

catebros merged commit b8ecb4a into dev Mar 11, 2026
4 checks passed

catebros merged 4 commits into dev from feat/clustering
