Conversation

Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | bookdb-landing | cf5486d | Commit Preview URL · Branch Preview URL | Mar 19 2026, 09:41 AM |
**Summary of Changes**

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly upgrades the book recommendation chatbot by integrating Named Entity Recognition (NER). It enables the system to intelligently identify specific book titles and author names mentioned in user queries, then use this information to generate highly context-aware and accurate recommendations. This moves the chatbot beyond generic responses to a more personalized and precise recommendation experience, gracefully handling cases where entities are not found.
Code Review
This pull request introduces a significant new feature: Named Entity Recognition (NER) for the book recommendation chatbot. The implementation uses the Groq LLM for entity extraction and fuzzy matching against the database to provide context-aware recommendations. The changes are well-structured, with a new module for entity extraction, updates to LLM prompts, and integration into the search pipeline. The addition of comprehensive documentation and tests is also a great step. My review focuses on improving configuration management, code clarity, performance, and test coverage to ensure the feature is robust and maintainable.
```python
# Entity Extraction with LLM
# ============================================================================

_ENTITY_EXTRACTION_MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"
```
Several values in this file are hardcoded instead of being read from the application settings, which makes configuration difficult.

- On this line, `_ENTITY_EXTRACTION_MODEL` should come from `settings.ENTITY_EXTRACTION_MODEL`.
- On line 160, the `ttl` for `_entity_lookup_cache` should use `settings.ENTITY_CACHE_TTL`.

To fix this, please import `settings` from `..core.config` and use these settings values.
```python
"""Tests for entity extraction functionality.

Tests are designed to work with the existing test environment
and pytest configuration.
"""

import pytest
import os

# Test imports that work with project structure
from apps.api.core.entity_extraction import (
    _string_similarity,
    get_cache_stats,
    clear_entity_cache,
)


# ============================================================================
# Unit Tests: String Similarity
# ============================================================================


def test_string_similarity_exact_match():
    """Test exact match returns 1.0."""
    score = _string_similarity("Harry Potter", "Harry Potter")
    assert score == pytest.approx(1.0, abs=0.01)


def test_string_similarity_case_insensitive():
    """Test case-insensitive matching."""
    score = _string_similarity("Harry Potter", "harry potter")
    assert score == pytest.approx(1.0, abs=0.01)


def test_string_similarity_partial_match():
    """Test partial matching."""
    score = _string_similarity("Harry Potter", "Harry")
    assert score > 0.5
    assert score < 1.0


def test_string_similarity_no_match():
    """Test no match returns low score."""
    score = _string_similarity("Harry Potter", "Lord of the Rings")
    assert score < 0.3


def test_string_similarity_typo_tolerance():
    """Test typo tolerance."""
    score = _string_similarity("Harry Potter", "Hary Potter")
    assert score > 0.8


# ============================================================================
# Unit Tests: Cache Management
# ============================================================================


def test_cache_stats():
    """Get cache statistics."""
    stats = get_cache_stats()

    assert "size" in stats
    assert "maxsize" in stats
    assert "ttl" in stats

    assert stats["size"] == 0  # Empty initially
    assert stats["maxsize"] == 1000
    assert stats["ttl"] == 3600


def test_clear_cache():
    """Clear entity cache."""
    # Cache should be empty initially
    stats_before = get_cache_stats()
    assert stats_before["size"] == 0

    # Clear cache (should work even if empty)
    clear_entity_cache()

    # Verify cache is still empty
    stats_after = get_cache_stats()
    assert stats_after["size"] == 0


# ============================================================================
# Tests: Edge Cases
# ============================================================================


def test_string_similarity_empty_strings():
    """Handle empty strings."""
    score1 = _string_similarity("", "Harry Potter")
    score2 = _string_similarity("Harry Potter", "")

    assert score1 < 0.5
    assert score2 < 0.5


def test_string_similarity_special_characters():
    """Handle special characters."""
    score = _string_similarity("Book & Test", "Book and Test")
    assert score > 0.8  # Should still match well


# ============================================================================
# LLM Integration Tests (Only run if GROQ_API_KEY is set)
# ============================================================================


@pytest.mark.skipif(
    "GROQ_API_KEY" not in os.environ,
    reason="LLM tests require GROQ_API_KEY environment variable",
)
def test_extract_book_entities_basic():
    """Test basic entity extraction (requires GROQ_API_KEY)."""
    from apps.api.core.entity_extraction import extract_book_entities
    from bookdb.models.chatbot_llm import create_groq_client_sync

    if "GROQ_API_KEY" not in os.environ:
        pytest.skip("GROQ_API_KEY not set")

    client = create_groq_client_sync()
    result = extract_book_entities("I love Harry Potter", client=client)

    assert "book_titles" in result
    assert "author_names" in result
    assert "confidence" in result


@pytest.mark.skipif(
    "GROQ_API_KEY" not in os.environ,
    reason="LLM tests require GROQ_API_KEY environment variable",
)
def test_extract_book_entities_empty_query():
    """Handle empty queries (requires GROQ_API_KEY)."""
    from apps.api.core.entity_extraction import extract_book_entities
    from bookdb.models.chatbot_llm import create_groq_client_sync

    if "GROQ_API_KEY" not in os.environ:
        pytest.skip("GROQ_API_KEY not set")

    client = create_groq_client_sync()
    result = extract_book_entities("", client=client)

    assert result.get("book_titles", []) == []
    assert result.get("author_names", []) == []
    # Low confidence for empty query
    assert result.get("confidence", 0) < 0.5


# ============================================================================
# Tests: Context Generation
# ============================================================================


def test_get_book_context_without_db_session():
    """Generate context without database session."""
    from apps.api.core.entity_extraction import get_book_context_string
    from bookdb.db.models import Book

    book = Book(
        id=1,
        goodreads_id=100,
        title="Test Book",
        description="Test description",
    )
    context = get_book_context_string(book, 0.8)

    assert "TITLE: Test Book" in context
    assert "DESCRIPTION: Test description" in context
```
The test coverage for this new feature is incomplete. While the existing tests are a good start, they don't cover critical database-dependent functionality like `find_books_by_title`, `find_authors_by_name`, or the main `resolve_entities` function. The documentation in `docs/entity-extraction.md` mentions a much more extensive test suite (41 tests), which suggests that more testing is intended. Please add tests for the fuzzy lookup and entity resolution logic to ensure the feature is robust and reliable.
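The missing fuzzy-lookup coverage could start from a sketch like the one below. Since `find_books_by_title` and `_string_similarity` are database-backed and not shown in this diff, both are stand-ins here: a `difflib`-based similarity function and an in-memory title list replace the real implementations, so only the matching logic itself is exercised.

```python
from difflib import SequenceMatcher


def _similarity(a: str, b: str) -> float:
    # Case-insensitive similarity in [0, 1]; a stand-in for the real
    # _string_similarity, whose implementation is not shown in this PR.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_books_by_title_stub(query: str, titles: list[str],
                             threshold: float = 0.8) -> list[str]:
    # Return titles whose similarity to the query clears the threshold,
    # best matches first. The real function would query the database.
    scored = [(t, _similarity(query, t)) for t in titles]
    return [t for t, s in sorted(scored, key=lambda x: -x[1]) if s >= threshold]


def test_fuzzy_lookup_tolerates_typos():
    titles = ["Harry Potter", "The Hobbit", "Dune"]
    assert find_books_by_title_stub("Hary Potter", titles) == ["Harry Potter"]


def test_fuzzy_lookup_rejects_unrelated_queries():
    titles = ["Harry Potter", "The Hobbit"]
    assert find_books_by_title_stub("quantum mechanics", titles) == []
```

Tests for `resolve_entities` itself would additionally need a database fixture (or a mocked session) so that both the "entity found" and "entity not found" paths are covered.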
```python
# ============================================================================

_ENTITY_EXTRACTION_MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"
_ENTITY_EXTRACTION_RETRIES = 2
```
```python
full_book = db.scalar(select(Book).where(Book.id == book.id))
if full_book:
    book = full_book
```
When fetching the full book object, you can eagerly load the `authors` and `tags` relationships to avoid potential N+1 queries later when accessing them. This is more efficient than the current approach, which may lead to separate database queries for authors and tags. This will require importing `selectinload` from `sqlalchemy.orm`.

```python
full_book = db.scalar(
    select(Book)
    .where(Book.id == book.id)
    .options(
        selectinload(Book.authors).selectinload(BookAuthor.author),
        selectinload(Book.tags).selectinload(BookTag.tag),
    )
)
if full_book:
    book = full_book
```

```python
# Choose prompt based on whether we have entity context
prompt = (
    BOOK_DESCRIPTION_WITH_CONTEXT_PROMPT
    if entity_context
    else BOOK_DESCRIPTION_PROMPT
)

# Build system message
system_content = prompt
if entity_context:
    system_content = system_content.format(entity_context=entity_context)
```
This block of code for preparing the system prompt is a duplicate of the logic in the async version `_rewrite_description` (lines 203-213). To improve maintainability and reduce redundancy, consider extracting this logic into a shared helper function.

For example, you could create a function like this:

```python
def _prepare_description_prompt(entity_context: Optional[str] = None) -> str:
    """Prepare the system prompt for description rewriting."""
    prompt = (
        BOOK_DESCRIPTION_WITH_CONTEXT_PROMPT
        if entity_context
        else BOOK_DESCRIPTION_PROMPT
    )
    if entity_context:
        return prompt.format(entity_context=entity_context)
    return prompt
```

Then, both `_rewrite_description` and `_rewrite_description_sync` could be simplified by replacing this block with `system_content = _prepare_description_prompt(entity_context)`.
You can merge.

Adds named entity recognition to the book recommendation chatbot. When users mention specific book titles or author names, the system now extracts those entities, resolves them against the database with fuzzy matching, and uses the matched books and authors as context for more accurate recommendations, gracefully handling the case where no entity is found.

Closes #138