
feat(quiz): streak competition + optional LLM judge for fuzzy answers#84

Draft
ejfox wants to merge 7 commits into main from claude/add-artie-feature-3AQ7W

@ejfox commented May 19, 2026

Summary

Picks up the #shenanigans pitch from Drake + Sylvan — "anki bot with behaviorism terms… we could compete getting streaks… we could use a very light LLM on the backend to check if it was right." The existing /quiz command already covers the typed-answer flashcard loop with curated decks, so this PR adds the two missing pieces.

  • Per-session streaks with 🔥 badges, broadcast in score lines and an end-of-quiz "Streak Champion" callout. A user's streak only breaks on a question they attempted and missed, so sitting a question out doesn't penalize you, which keeps the "compete for streaks" framing.
  • ai_judge:true flag on /quiz start that calls a light LLM (Gemini Flash by default via OpenRouter) when the string check fails, so synonyms / abbreviations / minor wording differences count. Gated by a looksLikeAnswerAttempt heuristic so the judge isn't called on chatter / commands / emoji-only messages.
  • 🤖 reaction instead of ✅ when the AI judge was the one who accepted the answer, so it's transparent that an LLM weighed in. Late LLM verdicts are dropped if another user already won the question.
  • 13 new unit tests (packages/discord/tests/quiz-session-manager.test.ts) covering streak award/reset, the race where an LLM verdict arrives after the question closed, and the chatter-vs-attempt heuristic.
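A minimal sketch of the streak rule and the late-verdict guard described above. `StreakTracker`, `acceptJudgeVerdict`, `QuestionState`, and the 🔥 threshold of 2 are hypothetical names and values for illustration, not the code in this PR; the sketch assumes the session knows which users attempted each question.

```typescript
type UserId = string;

// Hypothetical per-session tracker: only an attempted-and-missed question
// breaks a streak; users who sat a question out keep their count.
class StreakTracker {
  private current = new Map<UserId, number>();
  private best = new Map<UserId, number>();

  closeQuestion(winner: UserId | null, attempted: ReadonlySet<UserId>): void {
    for (const user of attempted) {
      if (user !== winner) this.current.set(user, 0); // attempted and missed
    }
    if (winner !== null) {
      const next = (this.current.get(winner) ?? 0) + 1;
      this.current.set(winner, next);
      this.best.set(winner, Math.max(next, this.best.get(winner) ?? 0));
    }
  }

  streakOf(user: UserId): number {
    return this.current.get(user) ?? 0;
  }

  // Highest best-streak this session; on ties the first user recorded wins.
  champion(): { user: UserId; streak: number } | null {
    let top: { user: UserId; streak: number } | null = null;
    for (const [user, streak] of this.best) {
      if (top === null || streak > top.streak) top = { user, streak };
    }
    return top;
  }
}

// Assumed badge rule: show 🔥N once a streak reaches 2.
const fireBadge = (streak: number): string => (streak >= 2 ? `🔥${streak}` : "");

interface QuestionState {
  index: number;
  winner: UserId | null;
}

// The LLM judge is async, so its verdict may land after someone else won
// or after the quiz moved on. Accept it only if the question is still open.
function acceptJudgeVerdict(
  state: QuestionState,
  questionIndex: number,
  user: UserId,
  correct: boolean,
): boolean {
  if (!correct) return false;
  if (state.index !== questionIndex || state.winner !== null) return false; // late: drop
  state.winner = user;
  return true;
}
```

Keeping the "who attempted" set per question is what lets a non-participant's streak survive untouched.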

Test plan

  • npx vitest run packages/discord/tests/ — 25/25 passing (13 new + 12 existing)
  • tsc --noEmit on packages/discord — no new errors
  • Live smoke in a Discord channel: /quiz start ai_judge:true questions:3, intentionally answer with a synonym, confirm 🤖 reaction + streak shows in the next score line
  • Live smoke without ai_judge: verify default behavior is unchanged (string matching only, ✅ reaction)
  • Verify OPENROUTER_API_KEY env is present on the discord PM2 process before relying on AI judge in prod

Config

  • QUIZ_JUDGE_MODEL (default google/gemini-2.0-flash-001), overridable per environment
  • QUIZ_JUDGE_URL (default https://router.tools.ejfox.com/v1/chat/completions) — same router used by the proactive-judgment path
  • Reuses the existing OPENROUTER_API_KEY; if unset, AI judge silently no-ops (string match only)
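A sketch of how the three settings above could be resolved, assuming the router exposes an OpenAI-compatible chat-completions endpoint. `resolveJudgeConfig`, `buildJudgeRequest`, and the YES/NO prompt are hypothetical; only the env var names and defaults come from this PR. Taking the environment as a plain record keeps the function testable.

```typescript
interface JudgeConfig {
  model: string;
  url: string;
  apiKey: string;
}

const DEFAULT_MODEL = "google/gemini-2.0-flash-001";
const DEFAULT_URL = "https://router.tools.ejfox.com/v1/chat/completions";

// null means "AI judge disabled": the caller keeps string matching only.
function resolveJudgeConfig(env: Record<string, string | undefined>): JudgeConfig | null {
  const apiKey = env.OPENROUTER_API_KEY?.trim();
  if (!apiKey) return null;
  return {
    model: env.QUIZ_JUDGE_MODEL || DEFAULT_MODEL, // empty string also falls back
    url: env.QUIZ_JUDGE_URL || DEFAULT_URL,
    apiKey,
  };
}

// Hypothetical request shape for an OpenAI-compatible chat-completions API.
function buildJudgeRequest(
  cfg: JudgeConfig,
  question: string,
  expected: string,
  given: string,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: cfg.url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${cfg.apiKey}` },
      body: JSON.stringify({
        model: cfg.model,
        temperature: 0, // deterministic grading
        max_tokens: 3, // only a YES/NO is expected back
        messages: [
          { role: "system", content: "You grade flashcard answers. Reply with only YES or NO." },
          {
            role: "user",
            content: `Question: ${question}\nExpected: ${expected}\nGiven: ${given}\nSame meaning?`,
          },
        ],
      }),
    },
  };
}
```

In the bot this would be called as `resolveJudgeConfig(process.env)`, and the request sent with `fetch(req.url, req.init)`; a `null` config short-circuits before any network call.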

Generated by Claude Code

claude added 7 commits May 19, 2026 20:28
Picks up the Discord shenanigans-channel pitch: anki-style competition
with streak bragging rights and a light LLM to accept fuzzy/equivalent
answers. Layered onto the existing /quiz command.

- Per-session streak tracking with 🔥 badges and end-of-quiz champion
- Streaks reset only for users who attempted-and-missed (no penalty for
  sitting out a question)
- /quiz start ai_judge:true uses a light LLM (Gemini Flash by default)
  to grade answers that fail the string check; gated on
  looksLikeAnswerAttempt to avoid burning tokens on chatter
- 🤖 reaction signals AI-judged wins; late LLM verdicts are dropped
  if another user already won the question
- 13 new unit tests covering streak logic and AI-judge race handling
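
One way a `looksLikeAnswerAttempt` gate could work, as a sketch: the command prefixes, emoji stripping, 8-word cap, and filler list below are assumptions, not the heuristic in this PR. Anything it rejects never reaches the LLM, which is where the token savings come from.

```typescript
// Hypothetical thresholds and word lists, chosen for illustration.
const MAX_ANSWER_WORDS = 8;
const FILLER = /^(lol|lmao|haha+|idk|brb|gg)$/i;

function looksLikeAnswerAttempt(content: string): boolean {
  const text = content.trim();
  if (text.length === 0) return false;
  if (/^[/!]/.test(text)) return false; // command-style messages
  const stripped = text
    .replace(/<a?:\w+:\d+>/g, "") // custom Discord emoji such as <:pog:123>
    .replace(/[\p{Extended_Pictographic}\p{Emoji_Modifier}\p{Regional_Indicator}\u200d\ufe0f]/gu, "")
    .trim();
  if (!/[\p{L}\p{N}]/u.test(stripped)) return false; // emoji- or punctuation-only
  if (stripped.split(/\s+/).length > MAX_ANSWER_WORDS) return false; // long chatter
  return !FILLER.test(stripped);
}
```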
Refactors the quiz UX from chatty per-event messages into a single live
embed that updates in place, à la the Discord Wordle app.

- Single host message per quiz; correct answers / skips / hints / timeouts
  edit it instead of spamming new lines
- Wordle-style progress bar: 🟩 solved · ⬛ missed · 🟨 current · ⬜ upcoming
- Action row buttons: 💡 Hint (with remaining count, disabled at 0),
  ⏭️ Skip, 🛑 End Quiz
- Hint button reveals the next hint inline in the embed
- End-of-quiz replaces the embed with a shareable summary card —
  emoji result row, winner, streak champion, final scoreboard,
  🔁 Play Again button
- Quiz buttons routed to a dedicated handler ahead of the generic intent
  processor so "Hint" / "Skip" don't get fed to the LLM
- 10 new tests covering progress bar, button id round-trip, hint reveal,
  scoreboard rendering, and summary card edge cases (35/35 passing total)
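
A sketch of the progress bar plus a button `customId` scheme that round-trips. The `quiz:<action>:<sessionId>` format and the function names are hypothetical. Discord caps a button's `custom_id` at 100 characters, so a short prefix plus a session id fits easily; a `null` decode is the cue to fall through to the generic intent processor.

```typescript
type Outcome = "solved" | "missed";

const TILE = { solved: "🟩", missed: "⬛", current: "🟨", upcoming: "⬜" } as const;

// outcomes holds results of finished questions in order; the next question
// is "current" and everything after it is "upcoming".
function renderProgressBar(outcomes: readonly Outcome[], total: number): string {
  return Array.from({ length: total }, (_, i) => {
    if (i < outcomes.length) return TILE[outcomes[i]];
    return i === outcomes.length ? TILE.current : TILE.upcoming;
  }).join("");
}

type QuizAction = "hint" | "skip" | "end" | "again";
const QUIZ_ACTIONS: readonly string[] = ["hint", "skip", "end", "again"];

function encodeQuizButtonId(action: QuizAction, sessionId: string): string {
  return `quiz:${action}:${sessionId}`;
}

// Returns null for any customId that is not a quiz button.
function decodeQuizButtonId(customId: string): { action: QuizAction; sessionId: string } | null {
  const [prefix, action, ...rest] = customId.split(":");
  if (prefix !== "quiz" || !QUIZ_ACTIONS.includes(action) || rest.length === 0) return null;
  return { action: action as QuizAction, sessionId: rest.join(":") };
}
```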
Adds an async daily quiz alongside the existing real-time co-op mode.
Same N cards for everyone each day, each user plays solo on their own
time, then opts into a public emoji-grid share — pure Wordle pattern.

- /quiz daily [deck] starts (or resumes) the day's puzzle ephemerally
- Puzzle deterministically cached per (date, deck) so every player today
  sees the same cards; first player triggers the API fetch
- Solo flow: Guess button opens a text modal, submission grades the
  answer with the same string matcher as co-op, embed re-renders in
  place with a growing 🟩/🟥 grid
- End-of-game ephemeral result card with full answer key + Share button
- Share posts a public Wordle-style block in the channel (username,
  score, emoji grid)
- Two new SQLite tables: daily_quiz_puzzles (cache), daily_quiz_plays
  (per-user state, UNIQUE(user_id, date, deck))
- Leaderboard helper (score-then-completion-time) ready for future
  /quiz daily leaderboard surfacing
- 18 new unit tests (53/53 total): puzzle cache, dedup, play state
  machine, leaderboard ranking, customId round-trip, embed shape
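
The same-cards-for-everyone guarantee comes down to a cache keyed by (date, deck) where only the first request fetches. Below is an in-memory sketch standing in for the `daily_quiz_puzzles` table; `DailyPuzzleCache`, the `FetchCards` callback, the 5-card default, and the UTC day boundary are assumptions. Caching the promise rather than the resolved cards also covers two players starting at the same moment.

```typescript
interface Card {
  front: string;
  back: string;
}

type FetchCards = (deck: string, count: number) => Promise<Card[]>;

// Hypothetical in-memory stand-in for the daily_quiz_puzzles cache.
class DailyPuzzleCache {
  private readonly puzzles = new Map<string, Promise<Card[]>>();
  private readonly fetchCards: FetchCards;
  private readonly cardsPerDay: number;

  constructor(fetchCards: FetchCards, cardsPerDay = 5) {
    this.fetchCards = fetchCards;
    this.cardsPerDay = cardsPerDay;
  }

  // First caller for (date, deck) triggers the fetch; everyone else,
  // including concurrent callers, gets the same promise and the same cards.
  get(date: string, deck: string): Promise<Card[]> {
    const key = `${date}|${deck}`;
    let pending = this.puzzles.get(key);
    if (pending === undefined) {
      pending = this.fetchCards(deck, this.cardsPerDay);
      this.puzzles.set(key, pending);
      // Don't cache a failed fetch; the next player retries it.
      pending.catch(() => this.puzzles.delete(key));
    }
    return pending;
  }
}

// Assumed day boundary: UTC midnight, so every server shares one "today".
function dailyKey(now: Date): string {
  return now.toISOString().slice(0, 10);
}
```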
Adds a public server-scoped leaderboard for the async daily quiz, the
natural pairing for the Wordle share grid.

- /quiz leaderboard [scope] subcommand — public embed, scopes are
  Today's Daily / This Week / All-Time
- daily_quiz_plays gains a guild_id column (auto-migrates existing rows
  on first init) and an index on (guild_id, user_id)
- Aggregations per user/guild: total correct (primary rank), perfect
  days ⭐ (tie-break), plays, current + best day streaks 🔥
- Streak math: "current" allows yesterday as the leading edge, so today's
  unplayed daily doesn't immediately reset the run, à la Wordle
- 11 new tests: streak edge cases (broken runs, yesterday-as-leading-
  edge, best vs current), guild scoping, scope filters, leaderboard
  embed empty-state + medal/badge rendering (64/64 total)
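
The current-vs-best streak rule can be stated compactly. This is a sketch, assuming play dates are stored as `YYYY-MM-DD` strings and days are UTC calendar days; `dayStreaks` and `toDay` are hypothetical names.

```typescript
const DAY_MS = 86_400_000;

// "YYYY-MM-DD" -> integer UTC day number, so consecutive days differ by 1.
const toDay = (iso: string): number => Math.floor(Date.parse(`${iso}T00:00:00Z`) / DAY_MS);

function dayStreaks(playedDates: readonly string[], today: string): { current: number; best: number } {
  const days = [...new Set(playedDates.map(toDay))].sort((a, b) => a - b);
  let best = 0;
  let run = 0; // length of the consecutive run ending at days[i]
  for (let i = 0; i < days.length; i++) {
    run = i > 0 && days[i] === days[i - 1] + 1 ? run + 1 : 1;
    best = Math.max(best, run);
  }
  // The final run still counts as "current" if it ends today or yesterday:
  // not having played today's daily yet must not reset it.
  const last: number | undefined = days[days.length - 1];
  const current = last !== undefined && toDay(today) - last <= 1 ? run : 0;
  return { current, best };
}
```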
… puzzle

Two new admin-facing capabilities:

1. Per-server deck config
   - /quiz config show / allow-decks / default-deck (Manage Server)
   - Allow-list restricts what /quiz daily and /quiz start accept;
     empty list = open by default so existing servers aren't broken
   - default-deck lets /quiz daily fall back to a server-picked deck
     when no deck argument is given
   - New daily_quiz_guild_config table (guild_id PK)

2. Admin scheduling for tomorrow's daily
   - /quiz schedule [deck] (Manage Server) — fetches 5 candidate cards,
     posts an ephemeral preview with Shuffle / Use these / Cancel
   - Daily puzzle is now per-guild: daily_quiz_puzzles gains guild_id
     + scheduled_by columns and a COALESCE-based unique index so
     scheduled rows don't collide with the global cache row
   - In-memory draft store (cleared on save/cancel) carries the
     preview cards between Shuffle clicks
   - Schema migration drops the legacy puzzle table if it predates
     guild_id (it's only a fetch cache; safe to rebuild)

13 new tests: guild config CRUD + allow-list enforcement, schedule
upsert + per-guild isolation, schedule customId round-trip, preview
embed shape (78/78 total).
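
A sketch of the allow-list check and the default-deck fallback, assuming the guild config row is loaded as a plain object. `GuildDeckConfig` and `resolveDeck` are hypothetical names, and `""` stands for the "all decks" sentinel used by the vote ballot. With a non-empty allow-list the sentinel is rejected as well, so the restriction can't be bypassed by omitting the deck argument.

```typescript
// Hypothetical shape of a daily_quiz_guild_config row.
interface GuildDeckConfig {
  allowedDecks: string[]; // empty = every deck allowed
  defaultDeck: string | null;
}

type DeckResolution = { ok: true; deck: string } | { ok: false; reason: string };

const ALL_DECKS = ""; // the "all decks" sentinel

function resolveDeck(config: GuildDeckConfig | null, requested: string | null): DeckResolution {
  const allowed = config?.allowedDecks ?? [];
  // Explicit argument first, then the server's default, then "all decks".
  const deck = requested ?? config?.defaultDeck ?? ALL_DECKS;
  if (allowed.length === 0 || allowed.includes(deck)) return { ok: true, deck };
  return {
    ok: false,
    reason: `"${deck || "all decks"}" isn't enabled here. Allowed: ${allowed.join(", ")}`,
  };
}
```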
Adds an opt-in poll flow so members can vote on which deck the daily
puzzle pulls from tomorrow. Admin opens the poll, anyone in the channel
votes by clicking a deck button, admin closes it → the winning deck's
cards are auto-fetched and scheduled.

- /quiz vote (Manage Server) posts a public live-tally embed with one
  button per ballot deck + a Close button
- Ballot is the guild's allow-list (or all named decks if unrestricted),
  excluding the '' "all decks" sentinel
- Members can cast and re-cast their vote; tallies refresh in place
- Close (admin only) picks the highest-vote deck (tie-break: ballot
  order), fetches 5 cards, hands them to scheduleDailyPuzzle. Embed
  loses its buttons and gains a 🏆 winner block
- "No votes" close-case is handled — daily falls back to a random pull
- One open poll per (guild, target_date); message id is persisted so
  button handlers can refresh the public embed without in-memory state
- 13 new tests covering poll CRUD, vote upsert + close-blocking,
  tie-break determinism, customId round-trip, open/closed/empty embed
  shape (89/89 total)
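
The close-poll rule (most votes wins, ties go to ballot order, `null` when nobody voted) as a sketch. Votes are modeled as a hypothetical `userId -> deck` map, so re-casting a vote simply overwrites the earlier entry; `pickWinningDeck` is an illustrative name.

```typescript
// votes maps userId -> deck. Re-casting overwrites the entry, so each
// member counts once, for their latest choice.
function pickWinningDeck(ballot: readonly string[], votes: ReadonlyMap<string, string>): string | null {
  const tally = new Map<string, number>();
  for (const deck of votes.values()) {
    if (ballot.includes(deck)) tally.set(deck, (tally.get(deck) ?? 0) + 1);
  }
  let winner: string | null = null;
  let top = 0;
  for (const deck of ballot) {
    const count = tally.get(deck) ?? 0;
    // Strict ">" keeps the earlier ballot deck on a tie.
    if (count > top) {
      winner = deck;
      top = count;
    }
  }
  return winner; // null = no votes; the caller falls back to a random pull
}
```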
…locks

Three compounding growth/network-effect vectors driven from existing data
(no new tables):

1. /quiz me [@user] — public profile card designed to be screenshotted
   off-platform. Surfaces total points, current/best streak, perfect
   days, win rate, last-7-days emoji grid, and earned badges. The
   command is the answer to "what bot is that?" inbound questions.

2. /quiz challenge @user [note] — public callout that tags the target
   and flexes the caller's most recent score grid. Includes a "Beat my
   score" CTA pointing at /quiz daily. Direct player acquisition loop.

3. Auto-achievement unlocks — after a player finishes a daily, the bot
   compares their pre/post stat snapshots and posts a celebratory
   message in channel tagging them and listing newly-earned badges.
   Social proof + return-to-play hook (others see badges, want their
   own).

Achievement set (8): First Blood, Perfect (5/5), On Fire (3-day streak),
Week Warrior (7-day), Habitual (30-day), Stellar (10 perfect days),
Quarter Century (25 lifetime), Centurion (100 lifetime). Predicates
are pure functions of stats so unlocks are deterministic and the
"newly earned" diff is computed from stats-before-vs-after.
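
A sketch of the predicate table and the before/after diff. The `PlayerStats` fields and badge ids are assumptions, as is reading "lifetime" as lifetime correct answers and checking the best (not current) streak so a badge can't disappear when a run breaks.

```typescript
// Hypothetical stats snapshot; field names are assumptions.
interface PlayerStats {
  plays: number;
  perfectDays: number;
  bestStreak: number; // best consecutive-day streak
  totalCorrect: number; // lifetime correct answers
}

interface Achievement {
  id: string;
  name: string;
  earned: (s: PlayerStats) => boolean; // pure predicate over stats
}

const ACHIEVEMENTS: readonly Achievement[] = [
  { id: "first_blood", name: "First Blood", earned: (s) => s.plays >= 1 },
  { id: "perfect", name: "Perfect (5/5)", earned: (s) => s.perfectDays >= 1 },
  { id: "on_fire", name: "On Fire", earned: (s) => s.bestStreak >= 3 },
  { id: "week_warrior", name: "Week Warrior", earned: (s) => s.bestStreak >= 7 },
  { id: "habitual", name: "Habitual", earned: (s) => s.bestStreak >= 30 },
  { id: "stellar", name: "Stellar", earned: (s) => s.perfectDays >= 10 },
  { id: "quarter_century", name: "Quarter Century", earned: (s) => s.totalCorrect >= 25 },
  { id: "centurion", name: "Centurion", earned: (s) => s.totalCorrect >= 100 },
];

// Badges that were not earned before this play but are earned after it.
function newlyEarned(before: PlayerStats, after: PlayerStats): Achievement[] {
  return ACHIEVEMENTS.filter((a) => !a.earned(before) && a.earned(after));
}
```

Because the predicates are pure, the same table can also render the earned-badge list on the profile card.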

Also fixes a latent bug: getMostRecentCompletedPlay was ordering by
completed_at, which ties when two plays finish in the same second; it now
orders by puzzle date (with completed_at as a secondary tie-break).
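
The corrected ordering as a comparator, as a sketch; in SQL it corresponds to sorting by the puzzle date descending, then `completed_at` descending. The `CompletedPlay` shape and function names are hypothetical. ISO `YYYY-MM-DD` strings compare correctly as plain strings, so no date parsing is needed.

```typescript
interface CompletedPlay {
  date: string; // puzzle date, "YYYY-MM-DD" (assumed format)
  completedAt: number; // epoch seconds, so same-second finishes tie
}

// Newest puzzle date first; completedAt only breaks ties within one date.
function byMostRecent(a: CompletedPlay, b: CompletedPlay): number {
  if (a.date !== b.date) return a.date < b.date ? 1 : -1;
  return b.completedAt - a.completedAt;
}

function mostRecentCompletedPlay(plays: readonly CompletedPlay[]): CompletedPlay | undefined {
  return [...plays].sort(byMostRecent)[0];
}
```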

12 new tests covering stats aggregation, guild scoping, achievement
predicates and diffing, recent-play lookup, and all three new embeds
(101/101 total).