feat(quiz): streak competition + optional LLM judge for fuzzy answers #84
Draft
ejfox wants to merge 7 commits into
Picks up the Discord shenanigans-channel pitch: anki-style competition with streak bragging rights and a light LLM to accept fuzzy/equivalent answers. Layered onto the existing /quiz command.
- Per-session streak tracking with 🔥 badges and end-of-quiz champion
- Streaks reset only for users who attempted-and-missed (no penalty for sitting out a question)
- /quiz start ai_judge:true uses a light LLM (Gemini Flash by default) to grade answers that fail the string check; gated on looksLikeAnswerAttempt to avoid burning tokens on chatter
- 🤖 reaction signals AI-judged wins; late LLM verdicts are dropped if another user already won the question
- 13 new unit tests covering streak logic and AI-judge race handling
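A minimal sketch of the streak rules above, assuming each question gets an increasing id that async judge calls carry with them. StreakTracker, nextQuestion, record, and champion are hypothetical names, not the PR's session-manager API. A late verdict is dropped either because its question id is stale or because the question already has a winner, and only users who attempted and missed lose their streak.

```typescript
// Hypothetical sketch of per-session streak tracking; not the PR's actual API.
type UserId = string;

class StreakTracker {
  private current = new Map<UserId, number>();
  private best = new Map<UserId, number>();
  private questionId = 0;
  private winner: UserId | null = null;
  // Users who attempted the open question and got it wrong.
  private missed = new Set<UserId>();

  /** Settle the previous question and open the next one; async judge calls carry the returned id. */
  nextQuestion(): number {
    this.closeQuestion();
    this.questionId += 1;
    this.winner = null;
    this.missed.clear();
    return this.questionId;
  }

  /** Reset streaks only for users who attempted and missed; bystanders keep their run. */
  closeQuestion(): void {
    for (const user of this.missed) {
      if (user !== this.winner) this.current.set(user, 0);
    }
  }

  /** Record a graded attempt (string match or LLM verdict). Returns true only if it wins the question. */
  record(user: UserId, questionId: number, correct: boolean): boolean {
    // Late verdict: the question already moved on, or someone else already won it.
    if (questionId !== this.questionId || this.winner !== null) return false;
    if (!correct) {
      this.missed.add(user);
      return false;
    }
    this.winner = user;
    const streak = (this.current.get(user) ?? 0) + 1;
    this.current.set(user, streak);
    this.best.set(user, Math.max(this.best.get(user) ?? 0, streak));
    return true;
  }

  streakOf(user: UserId): number {
    return this.current.get(user) ?? 0;
  }

  /** Highest best streak of the session; ties go to the user who first scored in the session. */
  champion(): { user: UserId; streak: number } | null {
    let top: { user: UserId; streak: number } | null = null;
    for (const [user, streak] of this.best) {
      if (top === null || streak > top.streak) top = { user, streak };
    }
    return top;
  }
}
```

Keying the race check on both the question id and the winner slot means the string matcher and the LLM judge can share one code path: whichever verdict lands first wins, and everything after is a no-op.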
Refactors the quiz UX from chatty per-event messages into a single live embed that updates in place, à la the Discord Wordle app.
- Single host message per quiz; correct answers / skips / hints / timeouts edit it instead of spamming new lines
- Wordle-style progress bar: 🟩 solved · ⬛ missed · 🟨 current · ⬜ upcoming
- Action row buttons: 💡 Hint (with remaining count, disabled at 0), ⏭️ Skip, 🛑 End Quiz
- Hint button reveals the next hint inline in the embed
- End-of-quiz replaces the embed with a shareable summary card: emoji result row, winner, streak champion, final scoreboard, 🔁 Play Again button
- Quiz buttons routed to a dedicated handler ahead of the generic intent processor so "Hint" / "Skip" don't get fed to the LLM
- 10 new tests covering progress bar, button id round-trip, hint reveal, scoreboard rendering, and summary card edge cases (35/35 passing total)
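One way the progress bar and the button-id round-trip could look, as a hypothetical sketch: the quiz:<action>:<sessionId> customId format and the function names are assumptions. A decoder that returns null for anything that isn't a quiz id is what lets the interaction router try the quiz handler first and fall through to the generic intent processor otherwise.

```typescript
// Hypothetical sketch; the id scheme and names are illustrative, not the PR's.
type QuestionOutcome = "solved" | "missed";

/** 🟩 solved · ⬛ missed · 🟨 current · ⬜ upcoming */
function renderProgressBar(outcomes: QuestionOutcome[], total: number): string {
  const cells: string[] = [];
  for (let i = 0; i < total; i++) {
    if (i < outcomes.length) cells.push(outcomes[i] === "solved" ? "🟩" : "⬛");
    else if (i === outcomes.length) cells.push("🟨");
    else cells.push("⬜");
  }
  return cells.join("");
}

const QUIZ_ACTIONS = ["hint", "skip", "end", "again"] as const;
type QuizAction = (typeof QUIZ_ACTIONS)[number];

/** Assumes sessionId contains no ":"; Discord caps a customId at 100 characters. */
function encodeQuizButtonId(action: QuizAction, sessionId: string): string {
  return `quiz:${action}:${sessionId}`;
}

/** null means "not a quiz button", so the interaction can fall through to the generic handler. */
function decodeQuizButtonId(customId: string): { action: QuizAction; sessionId: string } | null {
  const parts = customId.split(":");
  if (parts.length !== 3 || parts[0] !== "quiz" || parts[2] === "") return null;
  const action = QUIZ_ACTIONS.find((a) => a === parts[1]);
  return action ? { action, sessionId: parts[2] } : null;
}
```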
Adds an async daily quiz alongside the existing real-time co-op mode. Same N cards for everyone each day; each user plays solo on their own time, then opts into a public emoji-grid share (the pure Wordle pattern).
- /quiz daily [deck] starts (or resumes) the day's puzzle ephemerally
- Puzzle deterministically cached per (date, deck) so every player today sees the same cards; the first player triggers the API fetch
- Solo flow: the Guess button opens a text modal, the submission is graded with the same string matcher as co-op, and the embed re-renders in place with a growing 🟩/🟥 grid
- End-of-game ephemeral result card with full answer key + Share button
- Share posts a public Wordle-style block in the channel (username, score, emoji grid)
- Two new SQLite tables: daily_quiz_puzzles (cache), daily_quiz_plays (per-user state, UNIQUE(user_id, date, deck))
- Leaderboard helper (score, then completion time) ready for a future /quiz daily leaderboard surface
- 18 new unit tests (53/53 total): puzzle cache, dedup, play state machine, leaderboard ranking, customId round-trip, embed shape
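A sketch of the "first player triggers the fetch" cache, assuming an injected fetchCards function and an in-memory Map in place of the daily_quiz_puzzles table (so, unlike the PR, it does not survive restarts). Caching the promise rather than the resolved cards means two players who open the daily at the same moment still share one fetch. DailyPuzzleCache, renderShareBlock, and the share-block layout are hypothetical.

```typescript
// Hypothetical sketch; a Map stands in for the daily_quiz_puzzles SQLite table.
interface Card {
  question: string;
  answer: string;
}
type FetchCards = (deck: string, count: number) => Promise<Card[]>;

class DailyPuzzleCache {
  private puzzles = new Map<string, Promise<Card[]>>();
  private fetchCards: FetchCards;
  private count: number;

  constructor(fetchCards: FetchCards, count = 5) {
    this.fetchCards = fetchCards;
    this.count = count;
  }

  /** Every caller for the same (date, deck) gets the same cards; only the first triggers a fetch. */
  get(date: string, deck: string): Promise<Card[]> {
    const key = `${date}|${deck}`;
    let pending = this.puzzles.get(key);
    if (!pending) {
      pending = this.fetchCards(deck, this.count);
      this.puzzles.set(key, pending);
      // A failed fetch is evicted so the next player retries instead of caching the error.
      pending.catch(() => this.puzzles.delete(key));
    }
    return pending;
  }
}

/** Public Wordle-style share block: username, score, and one 🟩/🟥 cell per card. */
function renderShareBlock(username: string, date: string, results: boolean[]): string {
  const score = results.filter(Boolean).length;
  const grid = results.map((ok) => (ok ? "🟩" : "🟥")).join("");
  return `${username} · Daily Quiz ${date} · ${score}/${results.length}\n${grid}`;
}
```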
Adds a public server-scoped leaderboard for the async daily quiz, the natural pairing for the Wordle share grid.
- /quiz leaderboard [scope] subcommand: public embed; scopes are Today's Daily / This Week / All-Time
- daily_quiz_plays gains a guild_id column (auto-migrates existing rows on first init) and an index on (guild_id, user_id)
- Aggregations per user/guild: total correct (primary rank), perfect days ⭐ (tie-break), plays, current + best day streaks 🔥
- Streak math: "current" allows yesterday as the leading edge so today's unplayed daily doesn't immediately reset the run, à la Wordle
- 11 new tests: streak edge cases (broken runs, yesterday-as-leading-edge, best vs current), guild scoping, scope filters, leaderboard embed empty-state + medal/badge rendering (64/64 total)
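The yesterday-as-leading-edge rule can be sketched as follows, assuming plays are stored as UTC YYYY-MM-DD dates; addDays and dayStreaks are hypothetical helpers, not the PR's code.

```typescript
// Hypothetical sketch over UTC "YYYY-MM-DD" date strings.
function addDays(isoDate: string, delta: number): string {
  const d = new Date(`${isoDate}T00:00:00Z`);
  d.setUTCDate(d.getUTCDate() + delta);
  return d.toISOString().slice(0, 10);
}

function dayStreaks(playedDates: string[], today: string): { current: number; best: number } {
  const days = new Set(playedDates);

  // Best: the longest run of consecutive days, measured from each run's first day.
  let best = 0;
  for (const day of days) {
    if (days.has(addDays(day, -1))) continue; // not the start of a run
    let length = 1;
    while (days.has(addDays(day, length))) length++;
    best = Math.max(best, length);
  }

  // Current: the run ending today, or ending yesterday if today's daily is still unplayed.
  let edge = days.has(today) ? today : addDays(today, -1);
  let current = 0;
  while (days.has(edge)) {
    current++;
    edge = addDays(edge, -1);
  }
  return { current, best };
}
```

A player who last played yesterday keeps their streak until today ends; only a full missed day drops "current" to zero.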
… puzzle
Two new admin-facing capabilities:
1. Per-server deck config
- /quiz config show / allow-decks / default-deck (Manage Server)
- Allow-list restricts what /quiz daily and /quiz start accept;
empty list = open by default so existing servers aren't broken
- default-deck lets /quiz daily fall back to a server-picked deck
when no deck argument is given
- New daily_quiz_guild_config table (guild_id PK)
2. Admin scheduling for tomorrow's daily
- /quiz schedule [deck] (Manage Server) — fetches 5 candidate cards,
posts an ephemeral preview with Shuffle / Use these / Cancel
- Daily puzzle is now per-guild: daily_quiz_puzzles gains guild_id
+ scheduled_by columns and a COALESCE-based unique index so
scheduled rows don't collide with the global cache row
- In-memory draft store (cleared on save/cancel) carries the
preview cards between Shuffle clicks
- Schema migration drops the legacy puzzle table if it predates
guild_id (it's only a fetch cache; safe to rebuild)
13 new tests: guild config CRUD + allow-list enforcement, schedule
upsert + per-guild isolation, schedule customId round-trip, preview
embed shape (78/78 total).
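A sketch of how the allow-list and default-deck fallback might combine when /quiz daily is invoked. GuildDeckConfig, isDeckAllowed, and resolveDailyDeck are hypothetical, and treating the '' "all decks" sentinel as allowed only on unrestricted servers is an assumption, not something the PR states.

```typescript
// Hypothetical sketch; names and the sentinel rule are assumptions.
interface GuildDeckConfig {
  allowedDecks: string[]; // empty = open: every deck allowed
  defaultDeck: string | null; // set via /quiz config default-deck
}

type DeckResolution = { ok: true; deck: string } | { ok: false; reason: string };

function isDeckAllowed(config: GuildDeckConfig, deck: string): boolean {
  return config.allowedDecks.length === 0 || config.allowedDecks.includes(deck);
}

/** Explicit argument wins, then the server's default deck, then the "" all-decks sentinel. */
function resolveDailyDeck(config: GuildDeckConfig, requested?: string): DeckResolution {
  const deck = requested ?? config.defaultDeck ?? "";
  if (!isDeckAllowed(config, deck)) {
    const reason = deck === "" ? "Pick one of this server's decks." : `Deck "${deck}" is not enabled on this server.`;
    return { ok: false, reason };
  }
  return { ok: true, deck };
}
```

Because an empty allow-list short-circuits to "allowed", servers that never run /quiz config keep the pre-PR behavior.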
Adds an opt-in poll flow so members can vote on which deck tomorrow's daily puzzle pulls from. An admin opens the poll, anyone in the channel votes by clicking a deck button, and the admin closes it; the winning deck's cards are then auto-fetched and scheduled.
- /quiz vote (Manage Server) posts a public live-tally embed with one button per ballot deck + a Close button
- Ballot is the guild's allow-list (or all named decks if unrestricted), excluding the '' "all decks" sentinel
- Members can cast and re-cast their vote; tallies refresh in place
- Close (admin only) picks the highest-vote deck (tie-break: ballot order), fetches 5 cards, and hands them to scheduleDailyPuzzle. The embed loses its buttons and gains a 🏆 winner block
- The "no votes" close case is handled: the daily falls back to a random pull
- One open poll per (guild, target_date); the message id is persisted so button handlers can refresh the public embed without in-memory state
- 13 new tests covering poll CRUD, vote upsert + close-blocking, tie-break determinism, customId round-trip, open/closed/empty embed shape (89/89 total)
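A hypothetical in-memory model of the poll rules (re-castable votes, ballot-order tie-break, null when nobody voted). The PR persists polls and votes in SQLite; here a Map stands in, and buildBallot / DeckPoll are illustrative names.

```typescript
// Hypothetical sketch; a Map stands in for the persisted vote rows.
function buildBallot(allowList: string[], allNamedDecks: string[]): string[] {
  const source = allowList.length > 0 ? allowList : allNamedDecks;
  return source.filter((deck) => deck !== ""); // drop the '' "all decks" sentinel
}

class DeckPoll {
  readonly ballot: string[];
  private votes = new Map<string, string>(); // userId -> deck; re-casting overwrites
  private closed = false;

  constructor(ballot: string[]) {
    this.ballot = ballot;
  }

  /** Returns false for closed polls or decks that are not on the ballot. */
  vote(userId: string, deck: string): boolean {
    if (this.closed || !this.ballot.includes(deck)) return false;
    this.votes.set(userId, deck);
    return true;
  }

  /** Counts in ballot order (Map preserves insertion order). */
  tally(): Map<string, number> {
    const counts = new Map<string, number>(this.ballot.map((deck): [string, number] => [deck, 0]));
    for (const deck of this.votes.values()) counts.set(deck, (counts.get(deck) ?? 0) + 1);
    return counts;
  }

  /** Close the poll and return the winner, or null when nobody voted (random-pull fallback). */
  close(): string | null {
    this.closed = true;
    let winner: string | null = null;
    let top = 0;
    for (const [deck, count] of this.tally()) {
      // Strict ">" keeps the earlier ballot entry on ties.
      if (count > top) {
        winner = deck;
        top = count;
      }
    }
    return winner;
  }
}
```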
…locks
Three compounding growth/network-effect vectors driven from existing data (no new tables):
1. /quiz me [@user]: public profile card designed to be screenshotted off-platform. Surfaces total points, current/best streak, perfect days, win rate, last-7-days emoji grid, and earned badges. The command is the answer to inbound "what bot is that?" questions.
2. /quiz challenge @user [note]: public callout that tags the target and flexes the caller's most recent score grid. Includes a "Beat my score" CTA pointing at /quiz daily. Direct player acquisition loop.
3. Auto-achievement unlocks: after a player finishes a daily, the bot compares their pre/post stat snapshots and posts a celebratory message in the channel tagging them and listing newly earned badges. Social proof + return-to-play hook (others see badges and want their own).
Achievement set (8): First Blood, Perfect (5/5), On Fire (3-day streak), Week Warrior (7-day), Habitual (30-day), Stellar (10 perfect days), Quarter Century (25 lifetime), Centurion (100 lifetime). Predicates are pure functions of stats, so unlocks are deterministic and the "newly earned" diff is computed from stats before vs after.
Also fixes a latent bug: getMostRecentCompletedPlay ordered by completed_at, which ties when two plays complete within the same second; it now orders by puzzle date (with completed_at as a secondary tie-break).
12 new tests covering stats aggregation, guild scoping, achievement predicates and diffing, recent-play lookup, and all three new embeds (101/101 total).
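Because the predicates are pure functions of stats, the "newly earned" diff reduces to a filter over before/after snapshots. A sketch with hypothetical field names and ids; reading "First Blood" as the first completed daily and the two lifetime badges as total correct answers are assumptions.

```typescript
// Hypothetical sketch; field names, ids, and the lifetime metric are assumptions.
interface PlayerStats {
  plays: number;
  totalCorrect: number;
  perfectDays: number;
  bestStreak: number; // longest run of consecutive days played
}

interface Achievement {
  id: string;
  label: string;
  earned: (stats: PlayerStats) => boolean; // pure predicate over stats
}

const ACHIEVEMENTS: Achievement[] = [
  { id: "first-blood", label: "First Blood", earned: (s) => s.plays >= 1 },
  { id: "perfect", label: "Perfect (5/5)", earned: (s) => s.perfectDays >= 1 },
  { id: "on-fire", label: "On Fire", earned: (s) => s.bestStreak >= 3 },
  { id: "week-warrior", label: "Week Warrior", earned: (s) => s.bestStreak >= 7 },
  { id: "habitual", label: "Habitual", earned: (s) => s.bestStreak >= 30 },
  { id: "stellar", label: "Stellar", earned: (s) => s.perfectDays >= 10 },
  { id: "quarter-century", label: "Quarter Century", earned: (s) => s.totalCorrect >= 25 },
  { id: "centurion", label: "Centurion", earned: (s) => s.totalCorrect >= 100 },
];

/** Badges unlocked by this play: earned after it, not earned before it. */
function newlyEarned(before: PlayerStats, after: PlayerStats): Achievement[] {
  return ACHIEVEMENTS.filter((a) => a.earned(after) && !a.earned(before));
}
```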
Summary
Picks up the #shenanigans pitch from Drake + Sylvan: "anki bot with behaviorism terms… we could compete getting streaks… we could use a very light LLM on the backend to check if it was right." The existing /quiz command already covers the typed-answer flashcard loop with curated decks, so this PR adds the two missing pieces.
- ai_judge:true flag on /quiz start that calls a light LLM (Gemini Flash by default via OpenRouter) when the string check fails, so synonyms / abbreviations / minor wording differences count. Gated by a looksLikeAnswerAttempt heuristic so the judge isn't called on chatter / commands / emoji-only messages.
- 13 new unit tests (packages/discord/tests/quiz-session-manager.test.ts) covering streak award/reset, the race where an LLM verdict arrives after the question closed, and the chatter-vs-attempt heuristic.

Test plan
- npx vitest run packages/discord/tests/: 25/25 passing (13 new + 12 existing)
- tsc --noEmit on packages/discord: no new errors
- /quiz start ai_judge:true questions:3, intentionally answer with a synonym, confirm 🤖 reaction + streak shows in the next score line
- Without ai_judge: verify default behavior is unchanged (string matching only, ✅ reaction)
- OPENROUTER_API_KEY env is present on the discord PM2 process before relying on AI judge in prod

Config
- QUIZ_JUDGE_MODEL (default google/gemini-2.0-flash-001): overridable per env
- QUIZ_JUDGE_URL (default https://router.tools.ejfox.com/v1/chat/completions): same router used by the proactive-judgment path
- OPENROUTER_API_KEY: if unset, the AI judge silently no-ops (string match only)

Generated by Claude Code
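A sketch of how the Config values above and the looksLikeAnswerAttempt gate could decide whether to call the judge at all. The env var names and defaults come from the Config list; the heuristic rules, prompt wording, and all function names are hypothetical, and the request body assumes the router accepts the OpenAI-compatible chat completions shape.

```typescript
// Hypothetical sketch; only the env var names and defaults come from the PR.
interface JudgeConfig {
  model: string;
  url: string;
  apiKey: string;
}

function loadJudgeConfig(env: Record<string, string | undefined>): JudgeConfig | null {
  const apiKey = env.OPENROUTER_API_KEY;
  if (!apiKey) return null; // no key: the AI judge no-ops and only string matching runs
  return {
    model: env.QUIZ_JUDGE_MODEL ?? "google/gemini-2.0-flash-001",
    url: env.QUIZ_JUDGE_URL ?? "https://router.tools.ejfox.com/v1/chat/completions",
    apiKey,
  };
}

/** Illustrative heuristic: skip empty, very long, command-like, or letter/digit-free messages. */
function looksLikeAnswerAttempt(text: string): boolean {
  const t = text.trim();
  if (t.length === 0 || t.length > 200) return false;
  if (t.startsWith("/") || t.startsWith("!")) return false; // commands
  return /\p{L}|\p{N}/u.test(t); // emoji-only or punctuation-only messages fail this
}

function shouldCallJudge(config: JudgeConfig | null, aiJudge: boolean, stringMatched: boolean, text: string): boolean {
  return config !== null && aiJudge && !stringMatched && looksLikeAnswerAttempt(text);
}

interface JudgeRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

/** Build (but do not send) an OpenAI-compatible chat completions request; pass it to fetch(). */
function buildJudgeRequest(config: JudgeConfig, question: string, expected: string, given: string): JudgeRequest {
  return {
    url: config.url,
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${config.apiKey}` },
    body: JSON.stringify({
      model: config.model,
      temperature: 0,
      messages: [
        { role: "system", content: "Reply YES if the answer is equivalent to the expected answer, otherwise NO." },
        { role: "user", content: `Question: ${question}\nExpected: ${expected}\nAnswer: ${given}` },
      ],
    }),
  };
}
```

Checking the cheap conditions first means chatter never reaches the network, which is the token-saving point of the gate.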