refactor(quiz): convert generate_quiz to quiz_agent (refactor #2) by Jose-Gael-Cruz-Lopez · Pull Request #71 · SaplingLearn/Sapling

Jose-Gael-Cruz-Lopez · 2026-05-04T07:50:09Z

Summary

Refactor #2 from ADR 0005's migration plan. Replaces routes/quiz.py:82's manual prompt-string augmentation + call_gemini_json with a typed Pydantic AI agent. The agent uses tools (read_concepts_for_user, read_misconceptions_for_course per ADR 0004) to pull weak-area + class-misconception data on demand instead of pre-stuffing the prompt — Logfire traces now show the actual tool calls + arguments instead of opaque string concatenation.

Frontend wire format and grading semantics are unchanged. The agent's flat Quiz.questions[QuizQuestion] output is converted back to the legacy {options:[{label,text,correct}]} persisted shape via _agent_question_to_wire, so submit_quiz's grading loop, the quiz_attempts row shape, and frontend submitQuiz/scoreQuiz flows all keep working without changes.

What's in this PR

backend/agents/quiz.py — quiz_agent + Quiz / QuizQuestion outputs. Flat schema per ADR 0003 conv. 4. Prompt hash 17ab80b30316 baked into run metadata.
backend/agents/tools/graph_read.py — read_concepts_for_user (sorted ASC by mastery) and read_misconceptions_for_course (reads course_concept_stats — there's no separate misconceptions table; ADR 0004 assumed one). Pure-async + Pydantic AI tool wrappers.
backend/agents/_providers.py — adds quiz to AgentTask + _DEFAULTS (default gemini-2.5-flash-lite); env override SAPLING_MODEL_QUIZ.
backend/routes/quiz.py — now async. _quiz_via_agent + _legacy_generate_quiz wrap the agent-vs-fallback decision. HTTPException re-raised inside the bare-Exception catch so legitimate 404s during fallback don't recurse into legacy infinitely.
backend/tests/evals/quiz_generation.py — 8 cases × 6 evaluators per ADR 0005. Full 3-difficulty × 2-type grid coverage. Wired through tests.evals._replay.
docs/decisions/0013-refactor-2-quiz-shipped.md — captures what surprised us (no quiz cache exists; misconceptions table doesn't exist; HTTPException re-raise gotcha; short-answer grading still TODO) and what to carry into refactor Fix the learning loop for the context #3 (chat tutor).

Tests

tests/test_quiz_routes.py: 19/19 (was 14; +3 agent-success +2 agent-fallback)
tests/test_quiz_agent_imports.py: 2/2 (new)
tests/test_graph_read_tools.py: 5/5 (new)
tests/test_shared_course_context.py: existing tests adapted (force legacy fallback via quiz_agent.run patch)
tests/test_documents_routes.py: bonus fix — stale test_rejects_file_over_15mb updated for the 100 MB cap (was failing for everyone since commit 9912a25)
Full backend suite: 439 passed, 3 pre-existing live-Supabase failures unchanged. Net +1 vs main.

Compatibility

No env-var changes required. SAPLING_MODEL_QUIZ is an optional override.
No DB schema changes.
No frontend changes.
Legacy services/gemini_service.py::call_gemini_json path preserved per ADR 0001 — gets exercised when the agent trips UsageLimitExceeded / UnexpectedModelBehavior / any other exception.

Test plan

Backend unit tests pass (439/442 with the 3 baseline failures)
Eval set imports cleanly; 8 cases load
Manual smoke: generate a quiz on staging, verify the same wire shape arrives at the frontend (no regressions in submitQuiz/scoreQuiz)
Logfire trace inspection: confirm read_concepts_for_user_tool and read_misconceptions_for_course_tool calls appear with the right request_id correlation
Record cassettes (SAPLING_EVAL_MODE=record python tests/evals/quiz_generation.py) when next refreshing the eval suite

Out of scope

Real short-answer grading (currently single synthesized option for compatibility with submit_quiz's grading loop).
A real generated-quiz cache keyed by prompt version (ADR 0005 flagged this; turns out no quiz cache exists today).
Refactor Fix the learning loop for the context #3 (chat tutor) and Add landing page with liquid glass effects #4 (syllabus dedup) — see ADR 0001.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- AI-powered quiz generation that targets students' weakest concepts with multiple-choice questions including difficulty, explanations, and concept mapping; route falls back to legacy generation when the agent fails or produces invalid questions.
Tests
- Added extensive evals and unit tests covering quiz generation, graph-read helpers, route behaviors, and fingerprinting.

…ADR 0005) Replaces routes/quiz.py:82's manual prompt-string augmentation + call_gemini_json with a typed Pydantic AI agent. The agent has tools to pull weak-area + class-misconception data on demand instead of pre-stuffing them into the user message — those tool calls show up in Logfire traces, replacing the previously-opaque prompt building. Wire shape (frontend submit/grade flows) unchanged. Legacy path preserved per ADR 0001's migration contract. Per-task model routing via SAPLING_MODEL_QUIZ env var. What landed - backend/agents/quiz.py: quiz_agent + Quiz/QuizQuestion outputs (kept flat per ADR 0003 conv. 4). Difficulty + question type are Literal types so Gemini's enum constraint applies. Prompt hash 17ab80b30316 baked into the run metadata for span-side versioning. - backend/agents/tools/graph_read.py: read_concepts_for_user (sorted ASC by mastery so weakest concepts appear first) and read_misconceptions_for_course (reads course_concept_stats — there's no separate misconceptions table; ADR 0004 spec assumed one). Pure-async + thin Pydantic AI tool wrappers per the agents/tools/graph.py pattern. - backend/agents/_providers.py: 'quiz' added to AgentTask + _DEFAULTS; default gemini-2.5-flash-lite, override SAPLING_MODEL_QUIZ. - backend/routes/quiz.py: route is now async. _quiz_via_agent + _legacy_generate_quiz wrap the agent-vs-fallback decision. _agent_question_to_wire converts the flat agent output to the legacy {options:[{label,text,correct}]} persisted shape so submit_quiz's grading loop and the frontend are completely unaffected. HTTPException re-raised inside the bare-Exception catch so legitimate 404s during fallback don't recurse. - backend/tests/evals/quiz_generation.py: 8 cases × 6 evaluators per ADR 0005. Full 3-difficulty × 2-type grid coverage. Wired through the existing run_with_cassette replay layer; cassettes recorded separately via SAPLING_EVAL_MODE=record. Tests - tests/test_quiz_routes.py: 19/19 (was 14; +3 agent-success +2 agent-fallback). - tests/test_quiz_agent_imports.py: 2/2 (new). - tests/test_graph_read_tools.py: 5/5 (new). - tests/test_shared_course_context.py: existing tests adapted to drive the route via TestClient and force the legacy fallback by patching quiz_agent.run. - tests/test_documents_routes.py: stale 15 MB size-cap test renamed + fixed (cap was bumped to 100 MB in commit 9912a25 — the test never caught up, this restores it). - Full backend suite: 439 passed, 3 pre-existing live-Supabase failures unchanged. Was 438/4 before; net +1 pass. ADR 0013 captures what surprised us during the refactor (no quiz cache existed; misconceptions table doesn't exist; HTTPException re-raise gotcha; short-answer grading still TODO) and what to carry into refactor #3 (chat tutor). No frontend changes. No env-var changes other than the new optional SAPLING_MODEL_QUIZ override. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-04T07:50:17Z

Warning

Rate limit exceeded

@Jose-Gael-Cruz-Lopez has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 30 minutes and 10 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 91c870a3-e810-43a4-907d-7ea6f45f7f46

📥 Commits

Reviewing files that changed from the base of the PR and between 156ec1d and a2fd5cd.

📒 Files selected for processing (4)

backend/models/__init__.py
backend/routes/quiz.py
backend/tests/test_quiz_routes.py
docs/decisions/0013-refactor-2-quiz-shipped.md

📝 Walkthrough

Walkthrough

Added a typed quiz-generation agent with Pydantic output models, two async graph-read tools, an async /api/quiz/generate path that prefers the agent and falls back to the legacy generator, plus fingerprinting utilities and comprehensive tests and evals. (50 words)

Changes

Quiz Agent Feature

Layer / File(s)	Summary
Type & Schema `backend/agents/_providers.py`, `backend/agents/quiz.py`	`AgentTask` now includes `"quiz"`; `_DEFAULTS` adds `"quiz": "gemini-2.5-flash-lite"`. New exported types: `QuizDifficulty`, `QuizQuestionType`, `QuizQuestion`, and `Quiz` (MCQ-constrained Pydantic schemas).
Graph-read Data Readers `backend/agents/tools/graph_read.py`	Added `ConceptMastery` and `Misconception` models and async readers: `read_concepts_for_user` (queries `graph_nodes`, orders by `mastery_score.asc`, maps fields) and `read_misconceptions_for_course` (reads `course_concept_stats`, flattens `common_misconceptions`, dedupes case-insensitively). Both return `[]` on errors.
Tool Wrappers `backend/agents/tools/graph_read.py`	Added Pydantic-AI tool wrappers: `read_concepts_for_user_tool` and `read_misconceptions_for_course_tool` that pull params from `RunContext[SaplingDeps]` and delegate to the async readers.
Agent Wiring `backend/agents/quiz.py`, `backend/agents/_providers.py`	New `quiz_agent` configured with `model_for("quiz")`, `SaplingDeps`, `Quiz` output model, `_SYSTEM_PROMPT`, prompt hash metadata, and tools `read_concepts_for_user_tool` and `read_misconceptions_for_course_tool`.
Route Integration & Adapter `backend/routes/quiz.py`	`/generate` endpoint converted to async. New `_quiz_via_agent` calls `quiz_agent`, and `_agent_question_to_wire` converts agent `QuizQuestion` → legacy wire format, dropping questions whose `correct_answer` does not verbatim match an option (triggers legacy fallback if all drop). Agent-specific exceptions (`UsageLimitExceeded`, `UnexpectedModelBehavior`) and unexpected errors log-and-fallback to `_legacy_generate_quiz`; `HTTPException` from legacy path is re-raised.
Fingerprinting & Scrubbing `backend/services/fingerprint.py`, `backend/services/logfire_scrubber.py`	Added stable fingerprint utilities `fingerprint` and `fingerprint_text` with unit-separator flattening; log scrubber delegates fingerprinting to `fingerprint_text`.
Tests `backend/tests/test_quiz_routes.py`, `backend/tests/test_quiz_agent_imports.py`, `backend/tests/test_graph_read_tools.py`, `backend/tests/test_shared_course_context.py`, `backend/tests/test_documents_routes.py`, `backend/tests/test_fingerprint.py`	Extensive unit/integration tests: agent success and fallback paths, wire-format contract tests, schema rejection for non-MCQ types, tool-reader behavior (mocked table), endpoint wiring via TestClient, file-size test adjustment, and fingerprint tests.
Evaluation Suite `backend/tests/evals/quiz_generation.py`	Added replayable eval dataset `quiz_generation` with cases and evaluators for question count, difficulty/type mix, MCQ shape, and concept coverage using `run_with_cassette`.
Docs / ADR `docs/decisions/0013-refactor-2-quiz-shipped.md`	ADR updated to record the quiz agent, wiring, prompt hash, observed surprises, post-review addendum (MCQ-only restriction, wire-format drift handling), and consequences.

Sequence Diagram

sequenceDiagram
    participant Client as Client / TestClient
    participant Route as POST /api/quiz/generate
    participant Agent as quiz_agent
    participant ConceptTool as read_concepts_for_user_tool
    participant MisconceptionTool as read_misconceptions_for_course_tool
    participant DB as Supabase Tables
    participant Legacy as _legacy_generate_quiz
    participant Response as HTTP Response

    Client->>Route: POST /api/quiz/generate (user_id, concept_node_id, difficulty, num_questions)
    Route->>Route: validate/extract node, derive request_id
    Route->>Agent: run(prompt, SaplingDeps(request_id))
    Agent->>ConceptTool: read_concepts_for_user(user_id, course_id)
    ConceptTool->>DB: SELECT graph_nodes (filter, order mastery ASC)
    DB-->>ConceptTool: [ConceptMastery...]
    ConceptTool-->>Agent: [ConceptMastery...]
    Agent->>MisconceptionTool: read_misconceptions_for_course(course_id)
    MisconceptionTool->>DB: SELECT course_concept_stats (common_misconceptions)
    DB-->>MisconceptionTool: [Misconception...]
    MisconceptionTool-->>Agent: [Misconception...]
    Agent->>Agent: generate Quiz (QuizQuestion[])
    Agent-->>Route: Quiz
    Route->>Route: convert via _agent_question_to_wire (validate correct_answer in options)
    alt valid converted questions exist
        Route->>DB: INSERT quiz_attempts (questions_json, metadata)
        DB-->>Route: insert OK
        Route-->>Response: 200 OK (legacy wire format)
    else agent error or all questions dropped
        Route->>Legacy: _legacy_generate_quiz(...)
        Legacy-->>Response: 200 OK (legacy wire format)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

SaplingLearn/Sapling#65: Modifies quiz route and related quiz-generation paths; overlaps with route-level changes in this PR.
SaplingLearn/Sapling#67: Extends AgentTask/model-provider changes; directly related to adding the "quiz" task and default model.

Poem

🐰 I nibble weak concepts and sniff out misconception trail,

I call my trusty tools and weave a multiple-choice tale,
Options tidy, answers matched, explanations snug and neat,
I hop back to the route with questions for the seat—
A small rabbit quizmaster, serving knowledge sweet.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 34.62% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'refactor(quiz): convert generate_quiz to quiz_agent (refactor `#2`)' clearly and specifically describes the main change: converting the quiz generation route from a manual approach to a typed Pydantic AI agent as part of the second refactoring phase.
Description check	✅ Passed	The PR description comprehensively covers all required sections from the template: summary of changes, detailed breakdown of what's included, related issues context (ADR references), extensive testing information with test counts, and notes for reviewers including compatibility, test plan, and out-of-scope items. All template sections are well-addressed.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch refactor/2-quiz-agent

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

+from pydantic_evals import Case, Dataset
+from pydantic_evals.evaluators import Evaluator, EvaluatorContext
+
+from agents.quiz import quiz_agent, Quiz, QuizQuestion


cloudflare-workers-and-pages · 2026-05-04T07:51:40Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	frontend	`a2fd5cd`	Commit Preview URL Branch Preview URL	May 05 2026, 01:31 AM

1. Restrict QuizQuestionType to multiple_choice only - Schema-level rejection: Pydantic ValidationError on type="short_answer", pinned by test_short_answer_type_is_rejected_at_schema_layer. - Old behavior synthesized a single-option grading shim for short_answer that didn't actually work — submit_quiz grades by option-label and the frontend has no free-text UI. Real short-answer support is a future ADR when the frontend's there. - Prompt updated to reflect MCQ-only. 2. Drop silent first-option-correct fallback in _agent_question_to_wire. New behavior: if the agent's correct_answer doesn't match any option verbatim, log a warning and return None. Caller filters None and re-numbers survivors. If ALL questions drift, raise and degrade to legacy fallback rather than serve an empty or mismarked quiz. Pinned by TestQuizWireFormatContract (well-formed passthrough, drift drops, whitespace tolerance — 3 tests). 3. Eval set: rewrite the 3 short_answer cases as MCQ at the same difficulty levels (PHYS 101 definitions, ECON 201, theory of computation). ShortAnswerShapeEvaluator removed (no cases use it). Module-level assertion verifies all 8 cases are MCQ. Bonus: tests/test_documents_routes.py::test_rejects_file_over_max_size now monkeypatches MAX_FILE_SIZE to 1KB instead of allocating 100MB, which was flaky in the full-suite run under memory pressure. Same contract pinned (rejection message says "100 MB"). ADR 0013 addendum captures all three. Tests - tests/test_quiz_routes.py: 24/24 (was 19; +3 contract +1 short-answer rejection +1 net other = 5 net new). - tests/test_quiz_agent_imports.py: 2/2. - Full backend suite: 441 passed, 3-4 pre-existing live-Gemini and live-Supabase failures unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1. All-drift cascade test (TestQuizAgentFallback) New `test_falls_back_to_legacy_when_all_questions_drift` pins the path the 3 contract tests don't cover directly: agent returns a schema-valid Quiz where every question's correct_answer doesn't appear in its options → _quiz_via_agent's wire-format filter drops all of them → raises RuntimeError → bare-Exception catch in generate_quiz routes to _legacy_generate_quiz. Asserts the legacy gemini path actually runs and the legacy fallback question is what reaches the client. 2. Drift warning no longer leaks student content to local logs _agent_question_to_wire's drift warning was using %r to dump the raw correct_answer, options, and concept text. Logfire's egress scrubber (PR #67) handled remote ingestion, but Railway's local stdout still saw the unredacted strings. Now we log: n_options=4, canonical_len=18, fp=<sha256[:12]> The fingerprint is stable across recurrences of the same drift, so we still get correlation; the actual content stays out of stdout. Hashlib import hoisted to module scope. Pre-existing transient: tests/test_ocr_pipeline.py::test_gemini_parse that flickered red in the previous review run cleared on re-run (skipped in isolation, passing in full suite). Confirmed transient live-Gemini hiccup, not caused by this branch. Tests - tests/test_quiz_routes.py: 23/23 (the previous "24" was a miscount; net +1 from the new cascade test). - Full backend suite: 443 passed, 3 pre-existing live-Supabase failures unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/routes/quiz.py (1)

165-237: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

async /generate still performs blocking I/O on the event loop.

After making the handler async, synchronous DB/Gemini operations are still called directly (e.g., Line 173, Line 233, Line 243, Line 288). Under load, one slow fallback/DB call can stall unrelated requests.

Please offload blocking calls with asyncio.to_thread (or switch these dependencies to async clients), especially around node lookup, legacy generation, and quiz insert.

🔧 Example direction

+import asyncio
 ...
-    node_rows = table("graph_nodes").select(
-        "*", filters={"id": f"eq.{body.concept_node_id}"}
-    )
+    node_rows = await asyncio.to_thread(
+        lambda: table("graph_nodes").select(
+            "*", filters={"id": f"eq.{body.concept_node_id}"}
+        )
+    )
 ...
-        questions = await _legacy_generate_quiz(body, request)
+        questions = await asyncio.to_thread(_legacy_generate_quiz_sync, body, request)
 ...
-    table("quiz_attempts").insert({
-        "id": quiz_id,
-        "user_id": body.user_id,
-        "concept_node_id": body.concept_node_id,
-        "difficulty": body.difficulty,
-        "questions_json": questions,
-    })
+    await asyncio.to_thread(
+        lambda: table("quiz_attempts").insert({
+            "id": quiz_id,
+            "user_id": body.user_id,
+            "concept_node_id": body.concept_node_id,
+            "difficulty": body.difficulty,
+            "questions_json": questions,
+        })
+    )

Also applies to: 240-295

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/routes/quiz.py` around lines 165 - 237, The _legacy_generate_quiz
function is performing blocking I/O on the event loop (sync DB and Gemini
calls). Change the synchronous calls to run off the loop (use asyncio.to_thread)
or swap to async clients: run table("graph_nodes").select(...) and any node_rows
access via asyncio.to_thread, run get_graph(...) and get_quiz_context(...) via
asyncio.to_thread, call get_course_context(...) via asyncio.to_thread, and call
call_gemini_json(...) via asyncio.to_thread (or replace with an async Gemini
client). Ensure you await each asyncio.to_thread call and keep the rest of the
prompt-building logic unchanged so the function remains async and non-blocking.

🧹 Nitpick comments (1)

backend/tests/evals/quiz_generation.py (1)
15-17: 🏗️ Heavy lift

Keep this eval under the pytest test runner.

This module lives under backend/tests, but it is executed as a standalone script with sys.path surgery and a custom __main__ entrypoint. That bypasses the repo’s pytest collection/shared-fixture conventions and makes the eval harder to run consistently in CI.

As per coding guidelines, Backend tests must live in backend/tests/ and run via pytest. Shared fixtures (mock Supabase, mock Gemini) must be centralized in tests/conftest.py.

Also applies to: 275-276
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/evals/quiz_generation.py` around lines 15 - 17, The file
quiz_generation.py currently mutates sys.path (sys.path.insert calls) and
provides a standalone __main__ entrypoint so it runs as a script; convert it
into a proper pytest test module by removing the sys.path.insert lines and the
custom __main__ runner, adding a pytest test function (e.g., def
test_quiz_generation(...):) that uses the centralized fixtures from
tests/conftest.py (refer to fixture names like mock_supabase and mock_gemini or
shared_client fixtures) to set up mocks instead of manual path surgery, and
ensure the test name starts with test_ so pytest will discover it.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/agents/quiz.py`:
- Around line 69-97: The system prompt (_SYSTEM_PROMPT) tells the agent to call
read_concepts_for_user and read_misconceptions_for_course, but the actual
registered tool functions are named read_concepts_for_user_tool and
read_misconceptions_for_course_tool, so the agent will not find the tools; fix
by making the names consistent: either rename the tool functions
(read_concepts_for_user_tool -> read_concepts_for_user and
read_misconceptions_for_course_tool -> read_misconceptions_for_course) or update
_SYSTEM_PROMPT to reference the actual registered names
(read_concepts_for_user_tool and read_misconceptions_for_course_tool) so the
prompt matches the function symbols used at registration.

In `@backend/agents/tools/graph_read.py`:
- Around line 54-56: Remove the unsupported course_id filter when querying
graph_nodes: delete the line that sets filters["course_id"] = f"eq.{course_id}"
inside the graph_nodes read code so you don't send a non-existent column to the
DB; also stop silently swallowing the resulting exception in the surrounding
try/except (the block that currently catches errors and returns []), instead let
the exception propagate or log and re-raise so callers know the read failed and
course-scoped context isn’t silently dropped. Reference the filters dict used
for graph_nodes reads and the try/except that returns an empty list to locate
the changes.

In `@backend/tests/test_documents_routes.py`:
- Around line 234-248: The test only exercises the sync endpoint; add coverage
for the streaming /upload route or unify the route copy so both endpoints use
the same MAX_FILE_SIZE text. Specifically, update or add a test (e.g., alongside
test_rejects_file_over_max_size) that uses _make_upload to POST to the streaming
endpoint URL ("/api/documents/upload") while monkeypatching
routes.documents.MAX_FILE_SIZE to 1024 and
routes.documents.extract_text_from_file as done now, then assert status_code ==
400 and that the response detail contains "100 MB"; alternatively, refactor the
upload route handler code (the function handling "/upload") to reference the
shared MAX_FILE_SIZE/formatting logic used by the sync handler so the error
message is consistent.

In `@docs/decisions/0013-refactor-2-quiz-shipped.md`:
- Around line 69-73: Update the ADR text so it consistently reflects that
short-answer questions are rejected by the shipped contract rather than being
shimmed; replace the “short-answer shim keeps grading working” wording in the
“What surprised us” section (references: submit_quiz and short_answer) to state
that short_answer is now rejected and that the prior shim behavior was removed,
ensuring the earlier section matches the addendum that describes rejection of
short_answer.

---

Outside diff comments:
In `@backend/routes/quiz.py`:
- Around line 165-237: The _legacy_generate_quiz function is performing blocking
I/O on the event loop (sync DB and Gemini calls). Change the synchronous calls
to run off the loop (use asyncio.to_thread) or swap to async clients: run
table("graph_nodes").select(...) and any node_rows access via asyncio.to_thread,
run get_graph(...) and get_quiz_context(...) via asyncio.to_thread, call
get_course_context(...) via asyncio.to_thread, and call call_gemini_json(...)
via asyncio.to_thread (or replace with an async Gemini client). Ensure you await
each asyncio.to_thread call and keep the rest of the prompt-building logic
unchanged so the function remains async and non-blocking.

---

Nitpick comments:
In `@backend/tests/evals/quiz_generation.py`:
- Around line 15-17: The file quiz_generation.py currently mutates sys.path
(sys.path.insert calls) and provides a standalone __main__ entrypoint so it runs
as a script; convert it into a proper pytest test module by removing the
sys.path.insert lines and the custom __main__ runner, adding a pytest test
function (e.g., def test_quiz_generation(...):) that uses the centralized
fixtures from tests/conftest.py (refer to fixture names like mock_supabase and
mock_gemini or shared_client fixtures) to set up mocks instead of manual path
surgery, and ensure the test name starts with test_ so pytest will discover it.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 838314be-9fa3-4a60-8a22-874b8591d9de

📥 Commits

Reviewing files that changed from the base of the PR and between 818535d and 4bee59e.

📒 Files selected for processing (11)

backend/agents/_providers.py
backend/agents/quiz.py
backend/agents/tools/graph_read.py
backend/routes/quiz.py
backend/tests/evals/quiz_generation.py
backend/tests/test_documents_routes.py
backend/tests/test_graph_read_tools.py
backend/tests/test_quiz_agent_imports.py
backend/tests/test_quiz_routes.py
backend/tests/test_shared_course_context.py
docs/decisions/0013-refactor-2-quiz-shipped.md

coderabbitai · 2026-05-04T12:42:19Z

+_SYSTEM_PROMPT = (
+    "You generate adaptive multiple-choice quizzes for a student. Each "
+    "question must target a specific concept the student has weak "
+    "mastery on, OR address a class-level misconception you've seen.\n\n"
+    "Workflow:\n"
+    "1. Call `read_concepts_for_user` to see the student's mastery per "
+    "   concept for this course (returned sorted by mastery ASC — "
+    "   weakest first).\n"
+    "2. Call `read_misconceptions_for_course` to see anonymized class "
+    "   misconceptions. Use these to phrase distractors and to write "
+    "   a question that probes the misconception.\n"
+    "3. Compose `Quiz.questions` so the WEAKEST concepts get the most "
+    "   questions, AND each item's `concept` field exactly matches a "
+    "   concept_name returned by tool 1.\n\n"
+    "Per-question rules (multiple-choice only — the type field is "
+    "constrained to 'multiple_choice'):\n"
+    "- 4 options, exactly one correct. The text in `correct_answer` "
+    "   MUST appear verbatim in `options` — character-for-character. "
+    "   Questions that violate this are dropped at the route layer.\n"
+    "- Distractors should reflect plausible misconceptions, not random "
+    "   noise. Use the read_misconceptions_for_course return value.\n"
+    "- explanation: 1-3 sentences explaining WHY the correct answer "
+    "   is correct — used in the post-quiz review screen.\n"
+    "- difficulty: align with the student's mastery on the concept; "
+    "   weakest concepts get easy/medium, strongest get hard.\n\n"
+    "Honor the requested num_questions and difficulty distribution "
+    "in the user message. Don't invent concepts the student doesn't "
+    "have."
+)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Confirm the exact registered tool names and whether any aliasing is applied. rg -n --type=py 'def read_concepts_for_user_tool|def read_misconceptions_for_course_tool|read_concepts_for_user\b|read_misconceptions_for_course\b|tools=\[' backend/agents

Repository: SaplingLearn/Sapling

Length of output: 1385

🏁 Script executed:

# Check the tools registration in quiz.py sed -n '100,130p' backend/agents/quiz.py

Repository: SaplingLearn/Sapling

Length of output: 383

🏁 Script executed:

# Also check if pydantic-ai is imported and how it handles tool names rg -n "from pydantic_ai|import.*pydantic_ai|agent.*tools" backend/agents/quiz.py -A 2 -B 2

Repository: SaplingLearn/Sapling

Length of output: 568

🏁 Script executed:

# Check pydantic-ai documentation or examples in the codebase for tool naming rg -n "def.*_tool\(" backend/agents --type=py -B 5 | head -40

Repository: SaplingLearn/Sapling

Length of output: 1084

🏁 Script executed:

# Check if the _tool functions have any decorators or name parameters rg -n "@|name\s*=" backend/agents/tools/graph_read.py -B 2 -A 8 | grep -A 10 "read_concepts_for_user_tool\|read_misconceptions_for_course_tool"

Repository: SaplingLearn/Sapling

Length of output: 46

🏁 Script executed:

# Search for pydantic_ai documentation or configuration in the codebase rg -n "pydantic_ai.*tool|Tool.*name|tool.*alias" backend --type=py | head -20

Repository: SaplingLearn/Sapling

Length of output: 46

🌐 Web query:

pydantic-ai tool naming convention function name suffix

💡 Result:

Pydantic AI does not enforce a specific function name suffix or strict naming convention for tools beyond requiring unique tool names within a toolset or agent. The tool name defaults to the Python function name (snake_case convention in examples like 'roll_dice', 'get_player_name', 'customer_balance'), but can be explicitly overridden via the 'name' parameter in decorators (@agent.tool(name='custom_name'), Tool constructor, Tool.from_schema) or toolset methods. ToolDefinition names must be unique to avoid conflicts (raises UserError on duplicates), but no suffix like '_tool' is required or conventional. Use descriptive, unique names that clearly communicate purpose, following Python snake_case for functions.

Citations:

1: https://pydantic.dev/docs/ai/tools-toolsets/tools/

2: https://ai.pydantic.dev/tools-advanced

3: https://ai.pydantic.dev/toolsets

4: https://github.com/pydantic/pydantic-ai/blob/7f5214c6/pydantic_ai_slim/pydantic_ai/toolsets/function.py

5: https://ai.pydantic.dev/api/tools/

Fix tool name mismatch between prompt and registration.

The prompt instructs the agent to call read_concepts_for_user and read_misconceptions_for_course, but the registered tools are read_concepts_for_user_tool and read_misconceptions_for_course_tool. Pydantic-ai uses the function name directly as the tool name without auto-aliasing suffixes, so the agent will fail to locate these tools when following the prompt instructions.

Either rename the functions to remove the _tool suffix or update the prompt to reference the correct tool names.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/agents/quiz.py` around lines 69 - 97, The system prompt (_SYSTEM_PROMPT) tells the agent to call read_concepts_for_user and read_misconceptions_for_course, but the actual registered tool functions are named read_concepts_for_user_tool and read_misconceptions_for_course_tool, so the agent will not find the tools; fix by making the names consistent: either rename the tool functions (read_concepts_for_user_tool -> read_concepts_for_user and read_misconceptions_for_course_tool -> read_misconceptions_for_course) or update _SYSTEM_PROMPT to reference the actual registered names (read_concepts_for_user_tool and read_misconceptions_for_course_tool) so the prompt matches the function symbols used at registration.

coderabbitai · 2026-05-04T12:42:20Z

+        if course_id:
+            filters["course_id"] = f"eq.{course_id}"
+        try:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove the unsupported course_id filter from graph_nodes reads.

On Line 55, filters["course_id"] is added, but the provided schema for graph_nodes (in backend/db/archive/schema.sql:11-22) has no course_id column. That causes the select to fail, then Lines 65-71 swallow it and return [], so weak-concept context is silently lost for course-scoped requests.

🔧 Proposed fix

def _fetch() -> list[dict[str, Any]]: filters = {"user_id": f"eq.{user_id}"} - if course_id: - filters["course_id"] = f"eq.{course_id}" try: return ( table("graph_nodes").select( "concept_name,mastery_score,last_studied_at", filters=filters, order="mastery_score.asc", ) or [] )

Also applies to: 65-71

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/agents/tools/graph_read.py` around lines 54 - 56, Remove the unsupported course_id filter when querying graph_nodes: delete the line that sets filters["course_id"] = f"eq.{course_id}" inside the graph_nodes read code so you don't send a non-existent column to the DB; also stop silently swallowing the resulting exception in the surrounding try/except (the block that currently catches errors and returns []), instead let the exception propagate or log and re-raise so callers know the read failed and course-scoped context isn’t silently dropped. Reference the filters dict used for graph_nodes reads and the try/except that returns an empty list to locate the changes.

coderabbitai · 2026-05-04T12:42:20Z

+    def test_rejects_file_over_max_size(self):
+        # Cap was raised from 15 MB → 100 MB in commit 9912a25. Don't
+        # allocate 100MB here — it makes the test fragile under full-suite
+        # memory pressure. Instead, monkeypatch MAX_FILE_SIZE to 1 KB and
+        # pin the rejection-message contract.
+        with (
+            _mock_validate_user(),
+            patch("routes.documents.MAX_FILE_SIZE", 1024),
+            patch("routes.documents.extract_text_from_file", return_value=""),
+        ):
+            r = _make_upload(content=b"x" * 2048)
        assert r.status_code == 400
-        assert "15 MB" in r.json()["detail"]
+        # The error message names the actual cap (100 MB) — verifies the
+        # route's hardcoded copy in the detail string didn't drift.
+        assert "100 MB" in r.json()["detail"]


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This only covers /upload/sync, so the stale /upload copy still slips through.

The new test verifies the sync endpoint’s limit text, but _make_upload() posts to /api/documents/upload/sync; the streaming /api/documents/upload path still has the old "15 MB" message in the route snippet, and this change doesn’t protect it.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_documents_routes.py` around lines 234 - 248, The test only exercises the sync endpoint; add coverage for the streaming /upload route or unify the route copy so both endpoints use the same MAX_FILE_SIZE text. Specifically, update or add a test (e.g., alongside test_rejects_file_over_max_size) that uses _make_upload to POST to the streaming endpoint URL ("/api/documents/upload") while monkeypatching routes.documents.MAX_FILE_SIZE to 1024 and routes.documents.extract_text_from_file as done now, then assert status_code == 400 and that the response detail contains "100 MB"; alternatively, refactor the upload route handler code (the function handling "/upload") to reference the shared MAX_FILE_SIZE/formatting logic used by the sync handler so the error message is consistent.

coderabbitai · 2026-05-04T12:42:20Z

+5. **Short-answer grading doesn't exist yet.** `submit_quiz` grades by
+   `q["options"][i].correct` lookup. For short-answer questions, the wrapper
+   synthesizes a single-option `{label:"A", correct:True}` so existing
+   grading code keeps working. Real short-answer grading (e.g. fuzzy-match,
+   LLM-judged) is its own future scope.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

ADR contains conflicting statements about short-answer support.

The “What surprised us” text (Line 69-73) still says the short-answer shim keeps grading working, but the addendum (Line 132-140) says that behavior was removed and short_answer is now rejected. Please align the earlier section with the shipped contract.

Also applies to: 132-140

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/decisions/0013-refactor-2-quiz-shipped.md` around lines 69 - 73, Update the ADR text so it consistently reflects that short-answer questions are rejected by the shipped contract rather than being shimmed; replace the “short-answer shim keeps grading working” wording in the “What surprised us” section (references: submit_quiz and short_answer) to state that short_answer is now rejected and that the prior shim behavior was removed, ensuring the earlier section matches the addendum that describes rejection of short_answer.

Three small follow-ups from the previous review pass. 1. Extract shared fingerprint helper (services/fingerprint.py) The sha256[:N] hashing pattern was duplicated in services/logfire_scrubber.py::_fingerprint and routes/quiz.py's new drift warning. Both now route through fingerprint() / fingerprint_text() in a single module so the convention is consistent and a third callsite is one line. Adds 9 unit tests covering deterministic hashing, custom length, list/None/int handling, and the unit-separator collision-resistance contract. 2. Use ASCII unit-separator (\x1f) instead of `|` in fingerprints `'|'.join(options)` would let `["a|b", "c"]` and `["a", "b|c"]` collapse to the same hash body. Real text never contains \x1f so the join is unambiguous. Pinned by test_unit_separator_avoids_pipe_ambiguity. 3. Tighten the all-drift cascade test `test_falls_back_to_legacy_when_all_questions_drift` now asserts BOTH halves of the contract: agent_run_mock.assert_called_once() confirms the agent path was tried first, gemini_mock.assert_called_once() confirms the legacy path then fired. Previously only the second was pinned — a regression that skipped the agent entirely could have slipped through. Tests - tests/test_fingerprint.py: 9/9 (new). - tests/test_logfire_scrubber.py: 9/9 (still pass after switching the underlying _fingerprint to fingerprint_text). - tests/test_quiz_routes.py: 23/23 (cascade test tightened). - Full backend suite: 452 passed (was 443; +9 from the new fingerprint tests). 3 pre-existing live-Supabase failures unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/services/fingerprint.py (1)

24-55: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate fingerprint length bounds explicitly.

length is used directly for digest slicing. On Line 44 and Line 54, values like 0 or negatives silently produce unusable/ambiguous fingerprints, which weakens correlation reliability.

Proposed patch

+def _validate_length(length: int) -> int:
+    if not 1 <= length <= 64:
+        raise ValueError("length must be between 1 and 64")
+    return length
+
+
 def fingerprint(*parts: str | int | float | bool | None, length: int = 12) -> str:
@@
+    length = _validate_length(length)
     flat: list[str] = []
@@
 def fingerprint_text(value: str, length: int = 16) -> str:
@@
+    length = _validate_length(length)
     return hashlib.sha256(
         value.encode("utf-8", errors="replace"),
     ).hexdigest()[:length]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/fingerprint.py` around lines 24 - 55, The functions
fingerprint and fingerprint_text currently slice the SHA256 hex digest with the
provided length which can be 0 or negative; add explicit validation at the start
of fingerprint and fingerprint_text to ensure the length is an int between 1 and
64 (since SHA256.hexdigest() is 64 hex chars) and raise a ValueError with a
clear message if it isn't; update both functions (fingerprint(...) and
fingerprint_text(...)) to perform this check before computing the digest so
callers get an immediate, descriptive error for invalid lengths.

🧹 Nitpick comments (1)

backend/services/fingerprint.py (1)

24-39: ⚡ Quick win

Align parts type hints with actual accepted inputs.

Line 24 typing excludes list/tuple, but Line 38 handles them intentionally. This creates false positives in static checking for valid calls (including your tests).

Proposed patch

-def fingerprint(*parts: str | int | float | bool | None, length: int = 12) -> str:
+def fingerprint(
+    *parts: str | int | float | bool | None | list[object] | tuple[object, ...],
+    length: int = 12,
+) -> str:

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/fingerprint.py` around lines 24 - 39, The function signature
for fingerprint does not include list/tuple types even though the implementation
flattens them; update the type hint for the varargs parameter `parts` in the
`fingerprint` function to also accept list and tuple of the same element types
(e.g. add list[str|int|float|bool|None] and tuple[str|int|float|bool|None] to
the Union), so static checkers know nested sequences are valid; keep references
to SEPARATOR/flat logic unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@backend/services/fingerprint.py`:
- Around line 24-55: The functions fingerprint and fingerprint_text currently
slice the SHA256 hex digest with the provided length which can be 0 or negative;
add explicit validation at the start of fingerprint and fingerprint_text to
ensure the length is an int between 1 and 64 (since SHA256.hexdigest() is 64 hex
chars) and raise a ValueError with a clear message if it isn't; update both
functions (fingerprint(...) and fingerprint_text(...)) to perform this check
before computing the digest so callers get an immediate, descriptive error for
invalid lengths.

---

Nitpick comments:
In `@backend/services/fingerprint.py`:
- Around line 24-39: The function signature for fingerprint does not include
list/tuple types even though the implementation flattens them; update the type
hint for the varargs parameter `parts` in the `fingerprint` function to also
accept list and tuple of the same element types (e.g. add
list[str|int|float|bool|None] and tuple[str|int|float|bool|None] to the Union),
so static checkers know nested sequences are valid; keep references to
SEPARATOR/flat logic unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 0010ed98-af75-42ac-98f5-64444e5540bb

📥 Commits

Reviewing files that changed from the base of the PR and between 4bee59e and 156ec1d.

📒 Files selected for processing (5)

backend/routes/quiz.py
backend/services/fingerprint.py
backend/services/logfire_scrubber.py
backend/tests/test_fingerprint.py
backend/tests/test_quiz_routes.py

🚧 Files skipped from review as they are similar to previous changes (2)

backend/tests/test_quiz_routes.py
backend/routes/quiz.py

Closes the asymmetry I flagged in the post-merge review: chat tutor got `model_pref: Literal["fast", "smart"]` per-request via PR #73, quiz had only env-var-driven model selection. Now both routes accept the same body field and route to the same model strings. Wiring - backend/models/__init__.py: GenerateQuizBody.model_pref accepts "fast" | "smart" | None. Default None falls through to the agent's task-default (model_for("quiz") = gemini-2.5-flash-lite per ADR 0008). - backend/routes/quiz.py: * New _resolve_model_pref(pref) builds an optional GoogleModel override from the body's preference, returning None for None/empty/unknown so the agent's default wins on degraded input. * _quiz_via_agent threads `model_pref` through and passes `model=...` to quiz_agent.run only when an override is present (no model kwarg = agent default). * _legacy_generate_quiz now picks MODEL_SMART when pref="smart", falls back to MODEL_LITE otherwise. So the fast/smart toggle works on BOTH paths, including when the agent path trips and we degrade. - _PREF_MODEL_NAMES table at module scope so the {fast→flash, smart→pro} mapping is one place. Mirrors the chat tutor's mapping in services/gemini_service.py (MODEL_DEFAULT/MODEL_SMART). Tests (TestQuizModelPref, 5 new) - model_pref="smart" → quiz_agent.run gets model=GoogleModel('gemini-2.5-pro') - model_pref="fast" → quiz_agent.run gets model=GoogleModel('gemini-2.5-flash') - model_pref=None → quiz_agent.run gets NO model kwarg (agent default wins) - model_pref="auto" → _resolve_model_pref returns None (graceful unknown) - legacy fallback with smart → call_gemini_json gets model=MODEL_SMART Documentation: ADR 0013 addendum captures the design decision (env var for ops defaults + body field for per-request overrides; they compose). Tests - tests/test_quiz_routes.py: 28/28 (was 23, +5). - Full backend suite: 531 passed, 3 pre-existing live-Supabase failures unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the asymmetry I flagged in the most recent review of PR #71. Was: agent path "fast" → gemini-2.5-flash (upgrade), legacy path "fast" → gemini-2.5-flash-lite (no-op). Same body field, different models depending on which path ran. The kind of inconsistency that surfaces six months later as "why does Fast sometimes feel slower than Default?" Now: agent path AND legacy fallback both use the same mapping ("fast" → flash, "smart" → pro, else → lite). _PREF_MODEL_NAMES is the single source of truth on the agent side; the legacy elif-chain in _legacy_generate_quiz mirrors it exactly. Also: - backend/routes/quiz.py: re-imported MODEL_DEFAULT alongside MODEL_LITE/MODEL_SMART. Tightened the lazy-import comment in _resolve_model_pref to describe the actual benefit (deferring GoogleProvider construction to the first override request, not import-path isolation — agents.quiz is already imported at module top). - docs/decisions/0013-refactor-2-quiz-shipped.md: addendum reworded to "two independent layers, not multiplicative" — the env var sets the agent's startup baseline; the body field, when present, fully replaces it for the current call. Adds a paragraph documenting the agent/legacy symmetry and pointing at _PREF_MODEL_NAMES as the single source of truth. Tests (TestQuizModelPref grew to 7) - test_legacy_fallback_uses_default_when_pref_fast (NEW): pins the symmetry — legacy "fast" must call gemini with MODEL_DEFAULT, not MODEL_LITE. Asserts a clear failure message if anyone undoes the fix. - test_legacy_fallback_uses_lite_when_no_pref (NEW): pins the default — no model_pref still uses MODEL_LITE (the cheap baseline that's been there since the route shipped). - Existing 5 tests unchanged. Tests - tests/test_quiz_routes.py: 30/30 (was 28, +2). - Full backend suite: 533 passed, 3 pre-existing live-Supabase failures unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ssion_id Two correctness fixes from the self-review on PR #78. 1. Symmetric model defaults across agent + legacy paths. When body.model_pref is None (the default), the legacy fallback used to return MODEL_DEFAULT (gemini-2.5-flash) while the agent path returned the agent's task-default (gemini-2.5-pro). A user with no explicit pref silently downgraded from Pro -> Flash on fallback. This is the same bug PR #71's commit a2fd5cd fixed for quiz; chat now matches. - routes/learn.py: _resolve_tutor_model -> _resolve_legacy_model. Default fallback flipped from MODEL_DEFAULT to MODEL_SMART. Comment block + three call sites (start_session, _legacy_chat, action) updated. - tests/test_learn_routes.py: TestResolveTutorModel -> TestResolveLegacyModel. Four tests flipped their expected return from MODEL_DEFAULT to MODEL_SMART to pin the new symmetric contract. One new test (test_default_matches_agent_default_for_no_pref) pins agent/legacy parity explicitly. 2. session_id is a declared field on SaplingDeps, not an attribute attach. Before: routes/learn.py constructed SaplingDeps without session_id and then did `deps.session_id = session_id # type: ignore`. The tool wrapper read it via `getattr(..., None)`. Worked at runtime (unfrozen dataclass), but type checkers couldn't validate it, frozen=True would silently break it, and future SaplingDeps consumers had no way to discover the seam. - agents/deps.py: added `session_id: str | None = None` with a docstring entry covering the eval/batch case (legitimately None). - routes/learn.py: pass session_id via the constructor; removed the imperative attach + the obsolete comment + the # type: ignore. - agents/tools/chat_context.py: read_session_history_tool now does a direct `ctx.deps.session_id` read; getattr ceremony gone. - tests/test_chat_context_tools.py: missing-session test sets session_id=None explicitly to mirror the declared default. 55 tests pass across the three touched test files. No regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-code-quality Bot found potential problems May 4, 2026

View reviewed changes

Comment thread backend/tests/evals/quiz_generation.py

from pydantic_evals import Case, Dataset

from pydantic_evals.evaluators import Evaluator, EvaluatorContext

from agents.quiz import quiz_agent, Quiz, QuizQuestion

Jose-Gael-Cruz-Lopez and others added 2 commits May 4, 2026 04:16

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

Jose-Gael-Cruz-Lopez and others added 3 commits May 4, 2026 21:07

Merge remote-tracking branch 'origin/main' into refactor/2-quiz-agent

73e68ca

Jose-Gael-Cruz-Lopez merged commit 9818cde into main May 5, 2026
4 checks passed

Jose-Gael-Cruz-Lopez mentioned this pull request May 5, 2026

refactor(learn): convert chat tutor to chat_tutor_agent (refactor #3) #78

Merged

7 tasks

This was referenced May 5, 2026

refactor(calendar): unify syllabus extraction onto agent (refactor #4) #79

Merged

perf(tutor): cap Pro thinking budget + switch Fast to Flash-Lite #80

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(quiz): convert generate_quiz to quiz_agent (refactor #2)#71

refactor(quiz): convert generate_quiz to quiz_agent (refactor #2)#71
Jose-Gael-Cruz-Lopez merged 7 commits into
mainfrom
refactor/2-quiz-agent

Jose-Gael-Cruz-Lopez commented May 4, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 4, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

cloudflare-workers-and-pages Bot commented May 4, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 4, 2026

Uh oh!

coderabbitai Bot May 4, 2026

Uh oh!

coderabbitai Bot May 4, 2026

Uh oh!

coderabbitai Bot May 4, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Jose-Gael-Cruz-Lopez commented May 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's in this PR

Tests

Compatibility

Test plan

Out of scope

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

cloudflare-workers-and-pages Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying with Cloudflare Workers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Jose-Gael-Cruz-Lopez commented May 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 4, 2026 •

edited

Loading

cloudflare-workers-and-pages Bot commented May 4, 2026 •

edited

Loading