Ner by leaabj · Pull Request #147 · yamirghofran/BookDB

leaabj · 2026-03-13T11:14:54Z

Adds named entity recognition (NER) to the book recommendation chatbot. When users mention specific book titles or author names, the system now:

Extracts book titles and author names from natural language using Groq LLM
Finds matching books/authors in the database with fuzzy matching (pg_trgm with Python fallback)
Generates context-aware recommendations based on real book metadata
Gracefully falls back to the original behavior when no entities are found
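
The four steps above can be sketched as one function. This is a minimal illustration rather than the code in this PR: `build_entity_context`, `extract`, and `lookup` are hypothetical names, and the toy extractor and catalog stand in for the Groq call and the database lookup.

```python
from typing import Callable, Optional


def build_entity_context(
    query: str,
    extract: Callable[[str], list[str]],
    lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a context string for resolved entities, or None to fall back."""
    try:
        names = extract(query)
    except Exception:
        # An extraction failure should not break the chatbot: fall back.
        return None
    # Keep only the names that resolved to something in the catalog.
    matches = [m for m in (lookup(name) for name in names) if m is not None]
    return "\n".join(matches) if matches else None


# Toy stand-ins (assumptions) for the LLM extractor and the database lookup.
CATALOG = {"dune": "TITLE: Dune"}


def toy_extract(query: str) -> list[str]:
    return [name for name in ("dune", "emma") if name in query.lower()]


def toy_lookup(name: str) -> Optional[str]:
    return CATALOG.get(name)
```

A caller would pass the returned string to the query rewriter when it is not `None`, and otherwise run the original rewriting path unchanged.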

Closes #138

cloudflare-workers-and-pages · 2026-03-13T11:15:01Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	bookdb-landing	`cf5486d`	Commit Preview URL Branch Preview URL	Mar 19 2026, 09:41 AM

gemini-code-assist · 2026-03-13T11:15:14Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request upgrades the book recommendation chatbot by integrating Named Entity Recognition (NER). The system identifies specific book titles and author names mentioned in user queries, then uses this information to generate context-aware recommendations. This moves the chatbot beyond generic responses to more personalized recommendations, and it falls back gracefully when no entities are found.

Highlights

Named Entity Recognition (NER) Implementation: Implemented Named Entity Recognition (NER) capabilities to automatically extract book titles and author names from user queries using a Groq Large Language Model (LLM).
Fuzzy Matching for Database Lookup: Introduced fuzzy matching logic, leveraging PostgreSQL's pg_trgm extension (with a Python difflib fallback), to accurately find corresponding books and authors in the database based on extracted entities.
Context-Aware Query Rewriting: Enhanced the chatbot's query rewriting mechanism to incorporate the extracted entity context, leading to more relevant and personalized book recommendations.
Configurable Entity Extraction Settings: Added new configuration settings to control various aspects of the entity extraction process, including enabling/disabling the feature, specifying the LLM model, setting similarity and confidence thresholds, and managing cache behavior.
Comprehensive Documentation and Testing: Provided detailed documentation for the new entity extraction feature and a dedicated test suite to ensure its correctness and robustness.
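
The difflib fallback mentioned in the highlights could look roughly like the sketch below. It assumes a case-insensitive `SequenceMatcher` ratio; the function names and the 0.6 threshold are illustrative, not taken from the PR. Note that this ratio is computed differently from pg_trgm's trigram similarity, so a threshold tuned for one may not transfer directly to the other.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(
    query: str, candidates: Iterable[str], threshold: float = 0.6
) -> Optional[str]:
    """Return the closest candidate, or None if nothing clears the threshold."""
    scored = [(string_similarity(query, c), c) for c in candidates]
    score, title = max(scored, default=(0.0, None))
    return title if score >= threshold else None
```

For example, `best_match("Hary Potter", ["Harry Potter", "Dune"])` selects "Harry Potter" despite the typo, while a query with no close candidate returns `None` so the caller can fall back.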

Changelog

apps/api/core/config.py
- Added new configuration parameters for controlling entity extraction, including enabling/disabling the feature, specifying the LLM model, setting similarity and confidence thresholds, and configuring cache behavior.
apps/api/core/entity_extraction.py
- Introduced a new module for LLM-based entity extraction, fuzzy database lookup for books and authors, context string generation, and entity resolution, complete with caching.
apps/api/routers/books.py
- Integrated the new entity extraction logic into the chatbot's search pipeline, allowing the system to resolve entities from user queries and pass this context to the query rewriter.
bookdb/models/chatbot_llm.py
- Updated LLM prompt templates and query rewriting functions to accept and utilize the extracted entity context, enabling more informed book descriptions.
context.md
- Removed a markdown file that contained old SQL error logs.
docs/entity-extraction.md
- Added detailed documentation for the new entity extraction feature, covering its architecture, components, usage examples, performance considerations, and testing.
tests/test_api/test_entity_extraction.py
- Added a new test suite to validate the functionality of the entity extraction module, including string similarity, caching, and LLM integration.
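
The caching mentioned for apps/api/core/entity_extraction.py can be illustrated with a small standard-library TTL cache. This is a sketch under assumptions: the `TTLCache` class below is hypothetical, and the PR may well use a library cache instead. The `size`, `maxsize`, and `ttl` keys and the 1000 / 3600 defaults mirror what the tests in this PR assert about `get_cache_stats()`.

```python
import time
from collections import OrderedDict
from typing import Any, Optional


class TTLCache:
    """Bounded cache whose entries expire `ttl` seconds after being set."""

    def __init__(self, maxsize: int = 1000, ttl: float = 3600.0) -> None:
        self.maxsize = maxsize
        self.ttl = ttl
        self._items: "OrderedDict[str, tuple[float, Any]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Drop stale entries lazily, on read.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        if key in self._items:
            del self._items[key]
        elif len(self._items) >= self.maxsize:
            self._items.popitem(last=False)  # evict the oldest insertion
        self._items[key] = (time.monotonic() + self.ttl, value)

    def clear(self) -> None:
        self._items.clear()

    def stats(self) -> dict:
        return {"size": len(self._items), "maxsize": self.maxsize, "ttl": self.ttl}
```

Keying such a cache on a normalized form of the extracted name (for instance, lowercased and stripped) would let repeated mentions of the same title skip the database lookup.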

Activity

The pull request was created by leaabj.
The description clearly outlines the intent and changes of the feature.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a significant new feature: Named Entity Recognition (NER) for the book recommendation chatbot. The implementation uses a Groq LLM for entity extraction and fuzzy matching against the database to provide context-aware recommendations. The changes are well-structured, with a new module for entity extraction, updates to LLM prompts, and integration into the search pipeline. The addition of comprehensive documentation and tests is also a great step. My review focuses on improving configuration management, code clarity, performance, and test coverage to ensure the feature is robust and maintainable.

gemini-code-assist · 2026-03-13T11:17:39Z

apps/api/core/entity_extraction.py

+# Entity Extraction with LLM
+# ============================================================================
+
+_ENTITY_EXTRACTION_MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"


Several values in this file are hardcoded instead of being read from the application settings, which makes configuration difficult.

On this line, _ENTITY_EXTRACTION_MODEL should come from settings.ENTITY_EXTRACTION_MODEL.

On line 160, the ttl for _entity_lookup_cache should use settings.ENTITY_CACHE_TTL.

To fix this, please import settings from ..core.config and use these settings values.
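
A minimal sketch of the idea, assuming the two setting names from this comment. The `EntitySettings` dataclass, the `load_entity_settings` helper, and the environment variable names are hypothetical stand-ins for whatever apps/api/core/config.py actually defines; the defaults reuse the model string from the diff and the 3600-second TTL asserted in the tests.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EntitySettings:
    ENTITY_EXTRACTION_MODEL: str
    ENTITY_CACHE_TTL: int


def load_entity_settings(env: Mapping[str, str] = os.environ) -> EntitySettings:
    """Read entity extraction settings, falling back to the current defaults."""
    return EntitySettings(
        ENTITY_EXTRACTION_MODEL=env.get(
            "ENTITY_EXTRACTION_MODEL",
            "meta-llama/llama-4-scout-17b-16e-instruct",
        ),
        ENTITY_CACHE_TTL=int(env.get("ENTITY_CACHE_TTL", "3600")),
    )


settings = load_entity_settings()

# The module would then read these values instead of hardcoding them.
model_name = settings.ENTITY_EXTRACTION_MODEL
cache_ttl = settings.ENTITY_CACHE_TTL
```

With this shape, changing the model or the cache TTL becomes a deployment setting rather than a code change.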

gemini-code-assist · 2026-03-13T11:17:39Z

tests/test_api/test_entity_extraction.py

+"""Tests for entity extraction functionality.
+
+Tests are designed to work with the existing test environment
+and pytest configuration.
+"""
+
+import pytest
+import os
+
+# Test imports that work with project structure
+from apps.api.core.entity_extraction import (
+    _string_similarity,
+    get_cache_stats,
+    clear_entity_cache,
+)
+
+
+# ============================================================================
+# Unit Tests: String Similarity
+# ============================================================================
+
+
+def test_string_similarity_exact_match():
+    """Test exact match returns 1.0."""
+    score = _string_similarity("Harry Potter", "Harry Potter")
+    assert score == pytest.approx(1.0, abs=0.01)
+
+
+def test_string_similarity_case_insensitive():
+    """Test case-insensitive matching."""
+    score = _string_similarity("Harry Potter", "harry potter")
+    assert score == pytest.approx(1.0, abs=0.01)
+
+
+def test_string_similarity_partial_match():
+    """Test partial matching."""
+    score = _string_similarity("Harry Potter", "Harry")
+    assert score > 0.5
+    assert score < 1.0
+
+
+def test_string_similarity_no_match():
+    """Test no match returns low score."""
+    score = _string_similarity("Harry Potter", "Lord of the Rings")
+    assert score < 0.3
+
+
+def test_string_similarity_typo_tolerance():
+    """Test typo tolerance."""
+    score = _string_similarity("Harry Potter", "Hary Potter")
+    assert score > 0.8
+
+
+# ============================================================================
+# Unit Tests: Cache Management
+# ============================================================================
+
+
+def test_cache_stats():
+    """Get cache statistics."""
+    stats = get_cache_stats()
+
+    assert "size" in stats
+    assert "maxsize" in stats
+    assert "ttl" in stats
+
+    assert stats["size"] == 0  # Empty initially
+    assert stats["maxsize"] == 1000
+    assert stats["ttl"] == 3600
+
+
+def test_clear_cache():
+    """Clear entity cache."""
+    # Cache should be empty initially
+    stats_before = get_cache_stats()
+    assert stats_before["size"] == 0
+
+    # Clear cache (should work even if empty)
+    clear_entity_cache()
+
+    # Verify cache is still empty
+    stats_after = get_cache_stats()
+    assert stats_after["size"] == 0
+
+
+# ============================================================================
+# Tests: Edge Cases
+# ============================================================================
+
+
+def test_string_similarity_empty_strings():
+    """Handle empty strings."""
+    score1 = _string_similarity("", "Harry Potter")
+    score2 = _string_similarity("Harry Potter", "")
+
+    assert score1 < 0.5
+    assert score2 < 0.5
+
+
+def test_string_similarity_special_characters():
+    """Handle special characters."""
+    score = _string_similarity("Book & Test", "Book and Test")
+    assert score > 0.8  # Should still match well
+
+
+# ============================================================================
+# LLM Integration Tests (Only run if GROQ_API_KEY is set)
+# ============================================================================
+
+
+@pytest.mark.skipif(
+    "GROQ_API_KEY" not in os.environ,
+    reason="LLM tests require GROQ_API_KEY environment variable",
+)
+def test_extract_book_entities_basic():
+    """Test basic entity extraction (requires GROQ_API_KEY)."""
+    from apps.api.core.entity_extraction import extract_book_entities
+    from bookdb.models.chatbot_llm import create_groq_client_sync
+
+    if "GROQ_API_KEY" not in os.environ:
+        pytest.skip("GROQ_API_KEY not set")
+
+    client = create_groq_client_sync()
+    result = extract_book_entities("I love Harry Potter", client=client)
+
+    assert "book_titles" in result
+    assert "author_names" in result
+    assert "confidence" in result
+
+
+@pytest.mark.skipif(
+    "GROQ_API_KEY" not in os.environ,
+    reason="LLM tests require GROQ_API_KEY environment variable",
+)
+def test_extract_book_entities_empty_query():
+    """Handle empty queries (requires GROQ_API_KEY)."""
+    from apps.api.core.entity_extraction import extract_book_entities
+    from bookdb.models.chatbot_llm import create_groq_client_sync
+
+    if "GROQ_API_KEY" not in os.environ:
+        pytest.skip("GROQ_API_KEY not set")
+
+    client = create_groq_client_sync()
+    result = extract_book_entities("", client=client)
+
+    assert result.get("book_titles", []) == []
+    assert result.get("author_names", []) == []
+    # Low confidence for empty query
+    assert result.get("confidence", 0) < 0.5
+
+
+# ============================================================================
+# Tests: Context Generation
+# ============================================================================
+
+
+def test_get_book_context_without_db_session():
+    """Generate context without database session."""
+    from apps.api.core.entity_extraction import get_book_context_string
+    from bookdb.db.models import Book
+
+    book = Book(
+        id=1,
+        goodreads_id=100,
+        title="Test Book",
+        description="Test description",
+    )
+    context = get_book_context_string(book, 0.8)
+
+    assert "TITLE: Test Book" in context
+    assert "DESCRIPTION: Test description" in context


The test coverage for this new feature is incomplete. While the existing tests are a good start, they don't cover critical database-dependent functionality like find_books_by_title, find_authors_by_name, or the main resolve_entities function. The documentation in docs/entity-extraction.md mentions a much more extensive test suite (41 tests), which suggests that more testing is intended. Please add tests for the fuzzy lookup and entity resolution logic to ensure the feature is robust and reliable.

gemini-code-assist · 2026-03-13T11:17:39Z

apps/api/core/entity_extraction.py

+# ============================================================================
+
+_ENTITY_EXTRACTION_MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"
+_ENTITY_EXTRACTION_RETRIES = 2


The constant _ENTITY_EXTRACTION_RETRIES is defined but never used. It seems the retry logic is handled elsewhere with a different setting. To avoid confusion and dead code, this line should be removed.

gemini-code-assist · 2026-03-13T11:17:39Z

apps/api/core/entity_extraction.py

+        full_book = db.scalar(select(Book).where(Book.id == book.id))
+        if full_book:
+            book = full_book


When fetching the full book object, you can eagerly load the authors and tags relationships to avoid potential N+1 queries later when accessing them. This is more efficient than the current approach which may lead to separate database queries for authors and tags. This will require importing selectinload from sqlalchemy.orm.

        full_book = db.scalar(
            select(Book)
            .where(Book.id == book.id)
            .options(
                selectinload(Book.authors).selectinload(BookAuthor.author),
                selectinload(Book.tags).selectinload(BookTag.tag),
            )
        )
        if full_book:
            book = full_book

gemini-code-assist · 2026-03-13T11:17:39Z

bookdb/models/chatbot_llm.py

+    # Choose prompt based on whether we have entity context
+    prompt = (
+        BOOK_DESCRIPTION_WITH_CONTEXT_PROMPT
+        if entity_context
+        else BOOK_DESCRIPTION_PROMPT
+    )
+
+    # Build system message
+    system_content = prompt
+    if entity_context:
+        system_content = system_content.format(entity_context=entity_context)


This block of code for preparing the system prompt is a duplicate of the logic in the async version _rewrite_description (lines 203-213). To improve maintainability and reduce redundancy, consider extracting this logic into a shared helper function.

For example, you could create a function like this:

def _prepare_description_prompt(entity_context: Optional[str] = None) -> str:
    """Prepare the system prompt for description rewriting."""
    prompt = (
        BOOK_DESCRIPTION_WITH_CONTEXT_PROMPT
        if entity_context
        else BOOK_DESCRIPTION_PROMPT
    )
    if entity_context:
        return prompt.format(entity_context=entity_context)
    return prompt

Then, both _rewrite_description and _rewrite_description_sync could be simplified by replacing this block with system_content = _prepare_description_prompt(entity_context).
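
A self-contained version of the same idea, for experimenting outside the repository. The two template strings are placeholders invented for this sketch; the real prompts live in bookdb/models/chatbot_llm.py and are not shown in this thread.

```python
from typing import Optional

# Placeholder templates (assumptions), not the project's actual prompts.
BOOK_DESCRIPTION_PROMPT = "Rewrite the user's request as a book description."
BOOK_DESCRIPTION_WITH_CONTEXT_PROMPT = (
    "Rewrite the user's request as a book description.\n"
    "Known books and authors:\n{entity_context}"
)


def prepare_description_prompt(entity_context: Optional[str] = None) -> str:
    """Pick the template and fill in the entity context when there is one."""
    if entity_context:
        # Only the template is parsed for placeholders; braces inside the
        # substituted value are inserted verbatim.
        return BOOK_DESCRIPTION_WITH_CONTEXT_PROMPT.format(
            entity_context=entity_context
        )
    return BOOK_DESCRIPTION_PROMPT
```

One thing to watch with `str.format` here: literal braces in the context-aware template itself would need to be doubled, whereas braces that arrive inside `entity_context` (for example from a book description) are safe.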

yamirghofran

Looks good. Great work.

yamirghofran · 2026-03-16T09:23:44Z

Tests pass

yamirghofran · 2026-03-16T09:23:52Z

You can merge.

leaabj added 6 commits March 13, 2026 12:10

feat(extraction): Add entity extraction module with LLM and fuzzy mat…

b92e84a

…ching

feat(extraction): Add entity context support to query rewriter

f27c2f2

feat(extraction): Integrate entity extraction into search pipeline

2fe6c9a

feat(extraction): Add entity extraction configuration settings

81c9880

test(extraction): Add comprehensive entity extraction tests

f47d0af

feat(md)

895dc5d

leaabj requested a review from yamirghofran March 13, 2026 11:14

leaabj marked this pull request as ready for review March 13, 2026 11:14

gemini-code-assist bot reviewed Mar 13, 2026

yamirghofran approved these changes Mar 16, 2026

Merge branch 'dev' into NER

cf5486d

yamirghofran merged commit e56fd09 into dev Mar 19, 2026
4 checks passed

leaabj deleted the NER branch March 23, 2026 09:45

yamirghofran merged 7 commits into dev from NER

2 participants
