feat(quiz): streak competition + optional LLM judge for fuzzy answers by ejfox · Pull Request #84 · room302studio/coachartie2

ejfox · 2026-05-19T20:29:13Z

Summary

Picks up the #shenanigans pitch from Drake + Sylvan — "anki bot with behaviorism terms… we could compete getting streaks… we could use a very light LLM on the backend to check if it was right." The existing /quiz command already covers the typed-answer flashcard loop with curated decks, so this PR adds the two missing pieces.

Per-session streaks with 🔥 badges, broadcast in score lines and an end-of-quiz "Streak Champion" callout. A user's streak only breaks on a question they attempted and missed, so sitting out doesn't penalize you — keeps the "compete for streaks" framing.
ai_judge:true flag on /quiz start that calls a light LLM (Gemini Flash by default via OpenRouter) when the string check fails, so synonyms / abbreviations / minor wording differences count. Gated by a looksLikeAnswerAttempt heuristic so the judge isn't called on chatter / commands / emoji-only messages.
🤖 reaction instead of ✅ when the AI judge was the one who accepted the answer, so it's transparent that an LLM weighed in. Late LLM verdicts are dropped if another user already won the question.
13 new unit tests (packages/discord/tests/quiz-session-manager.test.ts) covering streak award/reset, the race where an LLM verdict arrives after the question closed, and the chatter-vs-attempt heuristic.
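The streak rule and the late-verdict guard can be modeled as a small per-session state machine. This is a minimal TypeScript sketch. It assumes questions are indexed in order and that every message passing the answer-attempt heuristic is recorded as an attempt. `QuizSessionSketch` and its methods are hypothetical names, not the actual quiz-session-manager API.

```typescript
// Hypothetical sketch: QuizSessionSketch and its methods are illustrative names only.
type UserId = string;

interface QuestionState {
  index: number;
  winner: UserId | null;  // set once someone is accepted for this question
  attempted: Set<UserId>; // users who sent an answer-like message for this question
}

class QuizSessionSketch {
  private streaks = new Map<UserId, number>();
  private bestStreaks = new Map<UserId, number>();
  private current: QuestionState = { index: 0, winner: null, attempted: new Set() };

  // Called for every message that looks like an answer attempt.
  recordAttempt(user: UserId): void {
    this.current.attempted.add(user);
  }

  // Called when the string match or an (async) AI verdict accepts an answer.
  // Returns false for stale verdicts: wrong question, or someone already won it.
  acceptAnswer(user: UserId, questionIndex: number): boolean {
    if (questionIndex !== this.current.index || this.current.winner !== null) {
      return false; // late LLM verdict: drop it
    }
    this.current.winner = user;
    this.current.attempted.add(user);
    return true;
  }

  // Closes the current question and applies the streak rule.
  closeQuestion(): void {
    const { winner, attempted } = this.current;
    for (const user of attempted) {
      if (user === winner) {
        const next = (this.streaks.get(user) ?? 0) + 1;
        this.streaks.set(user, next);
        this.bestStreaks.set(user, Math.max(next, this.bestStreaks.get(user) ?? 0));
      } else {
        this.streaks.set(user, 0); // attempted and missed: streak breaks
      }
    }
    // Users who sat this question out are not in `attempted`, so their streak is untouched.
    this.current = { index: this.current.index + 1, winner: null, attempted: new Set() };
  }

  streakOf(user: UserId): number {
    return this.streaks.get(user) ?? 0;
  }

  // "Streak Champion": highest best-streak this session, or null if nobody scored.
  streakChampion(): { user: UserId; streak: number } | null {
    let champ: { user: UserId; streak: number } | null = null;
    for (const [user, streak] of this.bestStreaks) {
      if (streak > 0 && (champ === null || streak > champ.streak)) champ = { user, streak };
    }
    return champ;
  }
}
```

Keying `acceptAnswer` on the question index (rather than only on a "closed" flag) also rejects a verdict that arrives after the quiz has already moved on to the next question.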

Test plan

npx vitest run packages/discord/tests/ — 25/25 passing (13 new + 12 existing)
tsc --noEmit on packages/discord — no new errors
Live smoke in a Discord channel: /quiz start ai_judge:true questions:3, intentionally answer with a synonym, confirm 🤖 reaction + streak shows in the next score line
Live smoke without ai_judge: verify default behavior is unchanged (string matching only, ✅ reaction)
Verify OPENROUTER_API_KEY env is present on the discord PM2 process before relying on AI judge in prod

Config

QUIZ_JUDGE_MODEL (default google/gemini-2.0-flash-001): overridable per environment
QUIZ_JUDGE_URL (default https://router.tools.ejfox.com/v1/chat/completions) — same router used by the proactive-judgment path
Reuses the existing OPENROUTER_API_KEY; if unset, AI judge silently no-ops (string match only)
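Here is a minimal sketch of how the config and the judge gating could fit together. It assumes the router accepts the OpenAI-compatible chat-completions request format. `resolveJudgeConfig`, `looksLikeAnswerAttemptSketch`, `buildJudgeRequest`, `parseVerdict`, and the YES/NO prompt are illustrative assumptions. The PR's actual heuristic and prompt may differ.

```typescript
// Sketch only: all function names and the heuristic rules below are assumptions.
interface JudgeConfig {
  model: string;
  url: string;
  apiKey: string;
}

const DEFAULT_MODEL = "google/gemini-2.0-flash-001";
const DEFAULT_URL = "https://router.tools.ejfox.com/v1/chat/completions";

// Returns null when OPENROUTER_API_KEY is unset, so callers fall back to string matching.
function resolveJudgeConfig(env: Record<string, string | undefined>): JudgeConfig | null {
  const apiKey = env.OPENROUTER_API_KEY;
  if (!apiKey) return null;
  return {
    model: env.QUIZ_JUDGE_MODEL || DEFAULT_MODEL,
    url: env.QUIZ_JUDGE_URL || DEFAULT_URL,
    apiKey,
  };
}

// Assumed heuristic: skip commands, links, emoji-only messages, and long chatter.
function looksLikeAnswerAttemptSketch(content: string): boolean {
  const text = content.trim();
  if (text.length === 0 || text.length > 120) return false;
  if (/^[\/!]/.test(text)) return false;         // slash or bang commands
  if (/https?:\/\//i.test(text)) return false;   // links
  if (!/[\p{L}\p{N}]/u.test(text)) return false; // no letters or digits (emoji-only, "???")
  return text.split(/\s+/).length <= 12;         // short enough to be an answer
}

// OpenAI-compatible chat-completions body; the prompt wording is an assumption.
function buildJudgeRequest(cfg: JudgeConfig, question: string, expected: string, guess: string) {
  return {
    model: cfg.model,
    temperature: 0,
    max_tokens: 5,
    messages: [
      { role: "system", content: "You grade flashcard answers. Reply with exactly YES or NO." },
      {
        role: "user",
        content:
          `Question: ${question}\nExpected answer: ${expected}\nStudent answer: ${guess}\n` +
          "Is the student answer equivalent (synonym, abbreviation, minor wording difference)?",
      },
    ],
  };
}

// Only a reply that starts with the word "yes" counts as acceptance.
function parseVerdict(reply: string): boolean {
  return /^\s*yes\b/i.test(reply);
}
```

Sending the body would then be a POST to `cfg.url` with an `Authorization: Bearer <key>` header, presumably the same way the proactive-judgment path calls the router. Keeping the builder and parser pure lets them be unit-tested without network access.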

Generated by Claude Code

Picks up the Discord shenanigans-channel pitch: Anki-style competition with streak bragging rights and a light LLM to accept fuzzy/equivalent answers. Layered onto the existing /quiz command.
- Per-session streak tracking with 🔥 badges and end-of-quiz champion
- Streaks reset only for users who attempted-and-missed (no penalty for sitting out a question)
- /quiz start ai_judge:true uses a light LLM (Gemini Flash by default) to grade answers that fail the string check; gated on looksLikeAnswerAttempt to avoid burning tokens on chatter
- 🤖 reaction signals AI-judged wins; late LLM verdicts are dropped if another user already won the question
- 13 new unit tests covering streak logic and AI-judge race handling

Refactors the quiz UX from chatty per-event messages into a single live embed that updates in place, à la the Discord Wordle app.
- Single host message per quiz; correct answers / skips / hints / timeouts edit it instead of spamming new lines
- Wordle-style progress bar: 🟩 solved · ⬛ missed · 🟨 current · ⬜ upcoming
- Action row buttons: 💡 Hint (with remaining count, disabled at 0), ⏭️ Skip, 🛑 End Quiz
- Hint button reveals the next hint inline in the embed
- End-of-quiz replaces the embed with a shareable summary card: emoji result row, winner, streak champion, final scoreboard, 🔁 Play Again button
- Quiz buttons routed to a dedicated handler ahead of the generic intent processor so "Hint" / "Skip" don't get fed to the LLM
- 10 new tests covering progress bar, button id round-trip, hint reveal, scoreboard rendering, and summary card edge cases (35/35 passing total)
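Both the progress bar and the button routing reduce to pure string functions. The sketch below assumes a `quiz:<action>:<sessionId>` customId scheme, which is hypothetical and not necessarily the PR's encoding. Returning null for any other customId lets the interaction router fall through to the generic intent processor.

```typescript
// Hypothetical sketch: outcome names and the customId format are assumptions.
type Outcome = "solved" | "missed";

// 🟩 solved · ⬛ missed · 🟨 current · ⬜ upcoming
function renderProgressBar(outcomes: Outcome[], total: number, finished: boolean): string {
  const cells: string[] = outcomes.map((o) => (o === "solved" ? "🟩" : "⬛"));
  if (!finished && cells.length < total) cells.push("🟨"); // the question being asked now
  while (cells.length < total) cells.push("⬜");
  return cells.join("");
}

type QuizAction = "hint" | "skip" | "end" | "again";
const ACTIONS: readonly QuizAction[] = ["hint", "skip", "end", "again"];

function encodeQuizButtonId(action: QuizAction, sessionId: string): string {
  return `quiz:${action}:${sessionId}`;
}

// Returns null for buttons that don't belong to the quiz.
function decodeQuizButtonId(customId: string): { action: QuizAction; sessionId: string } | null {
  const [prefix, action, ...rest] = customId.split(":");
  if (prefix !== "quiz" || !ACTIONS.includes(action as QuizAction) || rest.length === 0) {
    return null;
  }
  // Re-join so session ids that themselves contain ":" survive the round-trip.
  return { action: action as QuizAction, sessionId: rest.join(":") };
}
```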

Adds an async daily quiz alongside the existing real-time co-op mode. Same N cards for everyone each day, each user plays solo on their own time, then opts into a public emoji-grid share (pure Wordle pattern).
- /quiz daily [deck] starts (or resumes) the day's puzzle ephemerally
- Puzzle deterministically cached per (date, deck) so every player today sees the same cards; first player triggers the API fetch
- Solo flow: Guess button opens a text modal, submission grades the answer with the same string matcher as co-op, embed re-renders in place with a growing 🟩/🟥 grid
- End-of-game ephemeral result card with full answer key + Share button
- Share posts a public Wordle-style block in the channel (username, score, emoji grid)
- Two new SQLite tables: daily_quiz_puzzles (cache), daily_quiz_plays (per-user state, UNIQUE(user_id, date, deck))
- Leaderboard helper (score-then-completion-time) ready for future /quiz daily leaderboard surfacing
- 18 new unit tests (53/53 total): puzzle cache, dedup, play state machine, leaderboard ranking, customId round-trip, embed shape
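One way to make "first player triggers the fetch" safe under concurrent starts is to cache the in-flight promise per (date, deck). Two players who open the daily at the same moment then share one fetch. In this sketch an in-memory Map stands in for the daily_quiz_puzzles table. `DailyPuzzleCacheSketch`, `FetchCards`, and `formatShare` are hypothetical names.

```typescript
// Sketch: an in-memory Map stands in for the daily_quiz_puzzles SQLite table.
interface Card {
  question: string;
  answer: string;
}

type FetchCards = (deck: string, n: number) => Promise<Card[]>;

class DailyPuzzleCacheSketch {
  private readonly cache = new Map<string, Promise<Card[]>>();
  private readonly fetchCards: FetchCards;
  private readonly size: number;

  constructor(fetchCards: FetchCards, size = 5) {
    this.fetchCards = fetchCards;
    this.size = size;
  }

  // The first caller for (date, deck) triggers the fetch; concurrent and later callers
  // get the same promise, so everyone sees identical cards for the day.
  getPuzzle(date: string, deck: string): Promise<Card[]> {
    const key = `${date}|${deck}`;
    const cached = this.cache.get(key);
    if (cached) return cached;
    const pending = this.fetchCards(deck, this.size);
    this.cache.set(key, pending);
    // Don't keep a failed fetch around: the next player retries.
    pending.catch(() => this.cache.delete(key));
    return pending;
  }
}

// Wordle-style share block: username, score, and one 🟩/🟥 cell per card.
function formatShare(username: string, date: string, results: boolean[]): string {
  const score = results.filter(Boolean).length;
  const grid = results.map((ok) => (ok ? "🟩" : "🟥")).join("");
  return `${username} · Daily Quiz ${date} · ${score}/${results.length}\n${grid}`;
}
```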

Adds a public server-scoped leaderboard for the async daily quiz, the natural pairing for the Wordle share grid.
- /quiz leaderboard [scope] subcommand: public embed; scopes are Today's Daily / This Week / All-Time
- daily_quiz_plays gains a guild_id column (auto-migrates existing rows on first init) and an index on (guild_id, user_id)
- Aggregations per user/guild: total correct (primary rank), perfect days ⭐ (tie-break), plays, current + best day streaks 🔥
- Streak math: "current" allows yesterday as the leading edge so today's unplayed daily doesn't immediately reset the run, à la Wordle
- 11 new tests: streak edge cases (broken runs, yesterday-as-leading-edge, best vs current), guild scoping, scope filters, leaderboard embed empty-state + medal/badge rendering (64/64 total)
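The "yesterday as leading edge" rule is easiest to see on day numbers. This sketch takes the ISO dates (YYYY-MM-DD) on which a user completed the daily. It assumes days are counted in UTC. `computeDayStreaks` is a hypothetical name.

```typescript
// Sketch: UTC day boundaries and the function name are assumptions.
const DAY_MS = 86_400_000;

function toDayNumber(isoDate: string): number {
  return Math.floor(Date.parse(`${isoDate}T00:00:00Z`) / DAY_MS);
}

function computeDayStreaks(playedDates: string[], today: string): { current: number; best: number } {
  // Deduplicate and sort so consecutive days differ by exactly 1.
  const days = [...new Set(playedDates.map(toDayNumber))].sort((a, b) => a - b);
  if (days.length === 0) return { current: 0, best: 0 };

  // Best: longest run of consecutive days anywhere in history.
  let best = 1;
  let run = 1;
  for (let i = 1; i < days.length; i++) {
    run = days[i] === days[i - 1] + 1 ? run + 1 : 1;
    best = Math.max(best, run);
  }

  // Current: the latest run counts only if it ends today or yesterday,
  // so an unplayed daily today doesn't reset it yet.
  const todayNum = toDayNumber(today);
  const last = days[days.length - 1];
  if (last !== todayNum && last !== todayNum - 1) return { current: 0, best };
  let current = 1;
  for (let i = days.length - 1; i > 0 && days[i - 1] === days[i] - 1; i--) current++;
  return { current, best };
}
```

Converting to integer day numbers (rather than comparing Date objects) keeps month and year boundaries correct, e.g. April 30 followed by May 1 is a run of 2.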

… puzzle

Two new admin-facing capabilities:
1. Per-server deck config
   - /quiz config show / allow-decks / default-deck (Manage Server)
   - Allow-list restricts what /quiz daily and /quiz start accept; empty list = open by default so existing servers aren't broken
   - default-deck lets /quiz daily fall back to a server-picked deck when no deck argument is given
   - New daily_quiz_guild_config table (guild_id PK)
2. Admin scheduling for tomorrow's daily
   - /quiz schedule [deck] (Manage Server): fetches 5 candidate cards, posts an ephemeral preview with Shuffle / Use these / Cancel
   - Daily puzzle is now per-guild: daily_quiz_puzzles gains guild_id + scheduled_by columns and a COALESCE-based unique index so scheduled rows don't collide with the global cache row
   - In-memory draft store (cleared on save/cancel) carries the preview cards between Shuffle clicks
   - Schema migration drops the legacy puzzle table if it predates guild_id (it's only a fetch cache; safe to rebuild)

13 new tests: guild config CRUD + allow-list enforcement, schedule upsert + per-guild isolation, schedule customId round-trip, preview embed shape (78/78 total).
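The allow-list and default-deck rules can be expressed as two small pure functions. `GuildDeckConfig` loosely mirrors the described daily_quiz_guild_config row, but its field names are assumptions. In this sketch a null deck means "no deck chosen", leaving the existing no-deck behavior in charge.

```typescript
// Sketch: field and function names are hypothetical.
interface GuildDeckConfig {
  allowedDecks: string[]; // empty = every deck allowed
  defaultDeck: string | null;
}

function isDeckAllowed(cfg: GuildDeckConfig | undefined, deck: string): boolean {
  if (!cfg || cfg.allowedDecks.length === 0) return true; // open by default
  return cfg.allowedDecks.includes(deck);
}

type DeckResolution = { ok: true; deck: string | null } | { ok: false; error: string };

// Explicit argument first, then the server's default deck, else null (no deck chosen).
function resolveDailyDeck(cfg: GuildDeckConfig | undefined, requested?: string): DeckResolution {
  const deck = requested ?? cfg?.defaultDeck ?? null;
  if (deck !== null && !isDeckAllowed(cfg, deck)) {
    return { ok: false, error: `Deck "${deck}" is not enabled on this server.` };
  }
  return { ok: true, deck };
}
```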

Adds an opt-in poll flow so members can vote on which deck the daily puzzle pulls from tomorrow. Admin opens the poll, anyone in the channel votes by clicking a deck button, admin closes it → the winning deck's cards are auto-fetched and scheduled.
- /quiz vote (Manage Server) posts a public live-tally embed with one button per ballot deck + a Close button
- Ballot is the guild's allow-list (or all named decks if unrestricted), excluding the '' "all decks" sentinel
- Members can cast and re-cast their vote; tallies refresh in place
- Close (admin only) picks the highest-vote deck (tie-break: ballot order), fetches 5 cards, hands them to scheduleDailyPuzzle. Embed loses its buttons and gains a 🏆 winner block
- "No votes" close-case is handled: daily falls back to a random pull
- One open poll per (guild, target_date); message id is persisted so button handlers can refresh the public embed without in-memory state
- 13 new tests covering poll CRUD, vote upsert + close-blocking, tie-break determinism, customId round-trip, open/closed/empty embed shape (89/89 total)
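The close-poll tally with a deterministic ballot-order tie-break fits in one function. In this sketch `ballot` is the ordered deck list shown as buttons. `votes` maps user id to deck, so a re-cast vote simply overwrites the earlier entry (mirroring the vote upsert). `pickWinningDeck` is a hypothetical name.

```typescript
// Sketch: names are hypothetical; votes for decks not on the ballot are ignored.
function pickWinningDeck(ballot: string[], votes: Map<string, string>): string | null {
  const tally = new Map<string, number>();
  for (const deck of votes.values()) {
    if (ballot.includes(deck)) tally.set(deck, (tally.get(deck) ?? 0) + 1);
  }
  let winner: string | null = null;
  let top = 0;
  // Walking the ballot in order with a strict ">" means the earliest deck wins a tie.
  for (const deck of ballot) {
    const count = tally.get(deck) ?? 0;
    if (count > top) {
      winner = deck;
      top = count;
    }
  }
  return winner; // null when nobody voted, so the daily falls back to a random pull
}
```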

@user

…locks

Three compounding growth/network-effect vectors driven from existing data (no new tables):
1. /quiz me [@user]: public profile card designed to be screenshotted off-platform. Surfaces total points, current/best streak, perfect days, win rate, last-7-days emoji grid, and earned badges. The command is the answer to inbound "what bot is that?" questions.
2. /quiz challenge @user [note]: public callout that tags the target and flexes the caller's most recent score grid. Includes a "Beat my score" CTA pointing at /quiz daily. Direct player acquisition loop.
3. Auto-achievement unlocks: after a player finishes a daily, the bot compares their pre/post stat snapshots and posts a celebratory message in the channel tagging them and listing newly earned badges. Social proof + return-to-play hook (others see badges, want their own).

Achievement set (8): First Blood, Perfect (5/5), On Fire (3-day streak), Week Warrior (7-day), Habitual (30-day), Stellar (10 perfect days), Quarter Century (25 lifetime), Centurion (100 lifetime). Predicates are pure functions of stats, so unlocks are deterministic and the "newly earned" diff is computed from stats before vs. after.

Also fixes a latent bug: getMostRecentCompletedPlay was ordering by completed_at, which ties when two plays complete within the same second; it now orders by puzzle date, with completed_at as a secondary tie-break.

12 new tests covering stats aggregation, guild scoping, achievement predicates and diffing, recent-play lookup, and all three new embeds (101/101 total).
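Because the predicates are pure functions of stats, the "newly earned" diff is simply the achievements false before and true after. This sketch makes several assumptions not stated above. "Lifetime" is read as lifetime correct answers. The streak badges check the best streak, so a lost and regained streak isn't re-announced. The `PlayerStats` fields and function names are hypothetical. It also shows the fixed recent-play ordering as a comparator.

```typescript
// Sketch: field names, predicate thresholds' interpretation, and function names are assumptions.
interface PlayerStats {
  totalCorrect: number; // lifetime correct answers (assumed meaning of "lifetime")
  plays: number;
  perfectDays: number;
  bestStreak: number;   // best day streak, so a lost streak isn't re-announced later
}

interface Achievement {
  name: string;
  earned: (s: PlayerStats) => boolean;
}

const ACHIEVEMENTS: Achievement[] = [
  { name: "First Blood", earned: (s) => s.plays >= 1 },
  { name: "Perfect", earned: (s) => s.perfectDays >= 1 },
  { name: "On Fire", earned: (s) => s.bestStreak >= 3 },
  { name: "Week Warrior", earned: (s) => s.bestStreak >= 7 },
  { name: "Habitual", earned: (s) => s.bestStreak >= 30 },
  { name: "Stellar", earned: (s) => s.perfectDays >= 10 },
  { name: "Quarter Century", earned: (s) => s.totalCorrect >= 25 },
  { name: "Centurion", earned: (s) => s.totalCorrect >= 100 },
];

// Newly earned = not earned on the "before" snapshot but earned on the "after" snapshot.
function newlyEarned(before: PlayerStats, after: PlayerStats): string[] {
  return ACHIEVEMENTS.filter((a) => !a.earned(before) && a.earned(after)).map((a) => a.name);
}

interface CompletedPlay {
  date: string;        // puzzle date, YYYY-MM-DD
  completedAt: number; // unix seconds; two plays can share a value
}

// Newest puzzle date first; completedAt only breaks ties within the same date.
function byMostRecent(a: CompletedPlay, b: CompletedPlay): number {
  if (a.date !== b.date) return a.date < b.date ? 1 : -1; // ISO dates compare lexically
  return b.completedAt - a.completedAt;
}
```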

claude added 7 commits May 19, 2026 20:28


ejfox wants to merge 7 commits into main from claude/add-artie-feature-3AQ7W
