Commit 6493988
fix(quiz): tighten AdaptiveDifficultyEvaluator + clarify eval semantics
Closes the two issues from PR #77's second review.
- AdaptiveDifficultyEvaluator now scores per question (the fraction of
  compliant questions) instead of checking the average rank. The prompt
  rule is per-question ("Never override the user-requested difficulty by
  more than one step"), but the previous average-based check let a 2-step
  outlier slip through when the rest of the mix balanced it out. For
  example, requested hard with [easy, hard, hard, hard] used to score 1.0
  (avg=1.5, within 1 of target=2) and now correctly scores 0.75. The rank
  lookup is now a plain subscript, _DIFF_RANK[q.difficulty]; the field is
  Literal-constrained, so the previous fallback was unreachable.
- The two new ADR-0014 cases now carry NOTE comments explaining that
  recency/staleness state is baked into the user message for replay
  determinism, whereas production sources it via
  read_recent_quiz_attempts. The cases pin how the prompt applies the
  rule; live-mode evals are the right place to catch tool-wiring
  regressions.
Smoke-tested: hard + [easy, hard, hard, hard] -> 0.75; hard + all
medium -> 1.0 (one-step shift, allowed); hard + all easy -> 0.0
(two-step shift, overshoot).
46 quiz unit tests still pass.
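The scoring change described above can be illustrated with a minimal sketch. It is not the repository's code: the `Question` class, both function names, and the empty-quiz handling are hypothetical, and the rank mapping (easy=0, medium=1, hard=2) is an assumption inferred from the numbers in the message (target=2 for hard, avg=1.5 for the mixed quiz).

```python
from dataclasses import dataclass
from typing import Literal

Difficulty = Literal["easy", "medium", "hard"]

# Assumed mapping, inferred from the commit message: hard is target=2 and
# [easy, hard, hard, hard] averages 1.5, which implies easy=0.
_DIFF_RANK: dict[str, int] = {"easy": 0, "medium": 1, "hard": 2}


@dataclass
class Question:
    """Hypothetical stand-in for the real quiz question model."""
    difficulty: Difficulty


def score_average(requested: Difficulty, questions: list[Question]) -> float:
    """Old check as described: pass when the average rank is within one step."""
    if not questions:
        return 0.0  # assumption: an empty quiz scores zero
    avg = sum(_DIFF_RANK[q.difficulty] for q in questions) / len(questions)
    return 1.0 if abs(avg - _DIFF_RANK[requested]) <= 1 else 0.0


def score_per_question(requested: Difficulty, questions: list[Question]) -> float:
    """New check as described: fraction of questions within one step of the request."""
    if not questions:
        return 0.0  # assumption: an empty quiz scores zero
    target = _DIFF_RANK[requested]
    compliant = sum(
        1 for q in questions if abs(_DIFF_RANK[q.difficulty] - target) <= 1
    )
    return compliant / len(questions)


def quiz(*levels: Difficulty) -> list[Question]:
    """Small helper for building test quizzes."""
    return [Question(level) for level in levels]


mixed = quiz("easy", "hard", "hard", "hard")
print(score_average("hard", mixed))       # 1.0: the easy outlier is averaged away
print(score_per_question("hard", mixed))  # 0.75: three of four questions comply
```

Under these assumptions the sketch reproduces the smoke-test figures: an all-medium quiz scores 1.0 and an all-easy quiz scores 0.0 against a hard request.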
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 801980d commit 6493988
1 file changed
Lines changed: 24 additions & 15 deletions