Release Notes

AI Commentary: Session-Aware Voice + Smarter Fight Dynamics

What Changed

This release bundles several rounds of improvements to how SparkyBot's AI
commentary handles session-level context, fight dynamics, and voice freshness.
Everything shipped in the same wave, so group the items however helps you
remember them.

1. Tiered hype on advantage wins

SparkyBot now understands the difference between a legendary outnumbered win
and a casual 2-to-1 curbstomp, and writes accordingly. Previously, the AI
treated any Decisive Win the same way regardless of the numbers matchup, so
a 35v50 miracle and a 50v18 havoc-party stomp could both produce "LEGENDARY",
"SURGICAL EXECUTION", "speedrun deletion" style commentary.

Now lopsided stomps get actively dismissive commentary. Comfortable wins
(1.3x to 2x squad advantage) get a lighter version of the same treatment.
Outnumbered wins still get full euphoric hype.

2. Player name rotation

SparkyBot tracks which squad players get named as the lead story in recent
responses and discourages the AI from naming the same player repeatedly
across a 2-hour window. Solves the "Formuele in 5 fights in a row" session
fatigue. Live data confirms Formuele's spotlight rate dropped from a 47%
baseline to roughly 17 to 24% across recent sessions, with the spotlight
spread across 10 to 12 different non-commander players per session.

The commander is exempt from this rotation; commander mentions are handled
separately (see item 10). The outlier exception clause allows brief credit to
a flagged player if they are also the current fight's data outlier.
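The rotation rule can be sketched roughly as below. This is a minimal illustration, not SparkyBot's actual code: the function name, the (timestamp, name) record shape, and the choice to flag at exactly 3 mentions are assumptions.

```python
WINDOW_SECONDS = 2 * 60 * 60   # the 2-hour window described above
MENTION_THRESHOLD = 3          # assumption: flag a player at 3 mentions


def players_to_avoid(mentions, now, commander, outlier=None):
    """Return (names to avoid, name allowed brief credit or None).

    mentions: iterable of (unix_timestamp, player_name) lead-story records.
    """
    counts = {}
    for ts, name in mentions:
        if now - ts <= WINDOW_SECONDS:          # ignore mentions outside the window
            counts[name] = counts.get(name, 0) + 1
    flagged = {n for n, c in counts.items() if c >= MENTION_THRESHOLD}
    flagged.discard(commander)                  # commander is not part of this rotation
    # Outlier exception: a flagged player may still get brief credit.
    brief_credit = outlier if outlier in flagged else None
    return sorted(flagged), brief_credit
```

The flagged player stays on the avoid list even when the outlier exception applies; the second return value only signals that a short mention is acceptable.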

3. Topic / category rotation

Same idea as player rotation, applied to recurring criticism topics. The
prompt previously hit the same complaints (PUG behavior, stomp discipline,
boon denial, etc.) every fight that triggered them. Now SparkyBot tracks
which categories have been pushed into recent prompts and rotates over-used
ones out automatically.

Soft suppression: when a category gets rotated out, the underlying data
stays visible in the FIGHT ANALYSIS block, so the AI can still notice
organically if that category is the defining fact of the fight. The "must
mention this" directive is what rotates, not the data itself.

Threshold: once a category has fired in 3 of the last 5 fights, it is
suppressed on the next fight. Recovery happens naturally as new fights roll in.
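A minimal sketch of the 3-of-5 rule, assuming the tracker records one set of pushed categories per fight. The class and method names are hypothetical.

```python
from collections import deque


class TopicRotation:
    """Tracks which callout categories were pushed into recent prompts."""

    def __init__(self, window=5, limit=3):
        self.limit = limit
        self.history = deque(maxlen=window)  # one set of pushed categories per fight

    def filter_callouts(self, triggered):
        """Return the triggered categories still allowed as mandatory callouts."""
        allowed = [
            cat for cat in triggered
            if sum(cat in pushed for pushed in self.history) < self.limit
        ]
        # Only categories actually pushed are recorded, so a suppressed
        # category ages out of the window and recovers on its own.
        self.history.append(set(allowed))
        return allowed
```

With a category that triggers on every fight, this sketch pushes it three times, suppresses it for the next three fights, then lets it back in.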

4. Session streak awareness with tone escalation

SparkyBot now tracks the last several fight outcomes and adjusts its mood
based on the current win or loss streak. The voice progressively neutralizes
on long streaks:

2 to 3 win streak: ease off the euphoria slightly, this is not the first win of the night
4 to 5 win streak: rolling hard, do not celebrate the result, mock the enemy for walking into yet another loss
6+ win streak: getting boring, do not celebrate at all, mock the enemy's persistence in showing up to feed
2 to 3 loss streak: this is becoming a pattern, the anger should feel like a repeated warning
4+ loss streak: full tilt, something is structurally wrong tonight

When a streak breaks (loss after a win streak, or win after a loss streak),
SparkyBot now has a special tone for the transition. Win-streak-broken-by-loss
emits a "disappointed not full-tilt rageful, the squad was rolling and this
loss is a shift in energy" mood. Loss-streak-broken-by-win emits a "relief
more than euphoria, do not oversell the recovery" mood.

There is also a shape flavor in the session context line. If every fight in
a streak shares a quality, the streak gets a parenthetical tag. All-blowout
streaks read "(all blowouts, unimpressive)". All-comparable-or-outnumbered
streaks read "(all comparable or outnumbered, real quality)". Mixed streaks
get no flavor.
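The streak tiers and shape flavor above could be derived along these lines. This is a sketch under stated assumptions: outcomes are reduced to "win" or "loss", the returned tier keys are made-up labels rather than the real mood strings, and a streak break is assumed to require a previous streak of at least 2.

```python
def current_streak(outcomes):
    """outcomes: list of 'win'/'loss', oldest first. Returns (kind, length)."""
    if not outcomes:
        return None, 0
    kind = outcomes[-1]
    length = 0
    for outcome in reversed(outcomes):
        if outcome != kind:
            break
        length += 1
    return kind, length


def streak_mood(outcomes):
    """Map the current streak to a tier key (hypothetical labels)."""
    kind, length = current_streak(outcomes)
    if length == 1:
        # A single result may have just broken an earlier streak.
        _, broken = current_streak(outcomes[:-1])
        if broken < 2:
            return None
        return "loss_streak_broken" if kind == "win" else "win_streak_broken"
    if kind == "win":
        if length >= 6:
            return "win_6_plus"
        if length >= 4:
            return "win_4_5"
        if length >= 2:
            return "win_2_3"
    if kind == "loss":
        if length >= 4:
            return "loss_4_plus"
        if length >= 2:
            return "loss_2_3"
    return None


def shape_flavor(shapes):
    """shapes: fight_shape labels for every fight in the current streak."""
    if shapes and all(s == "blowout" for s in shapes):
        return "(all blowouts, unimpressive)"
    if shapes and all(s in ("comparable", "legendary_outnumbered") for s in shapes):
        return "(all comparable or outnumbered, real quality)"
    return ""  # mixed streaks get no flavor
```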

5. Fight shape labels

A new fight_shape field is now derived from the enemy/friendly ratio and
exposed alongside the existing outcome field. Categories: legendary_outnumbered,
comparable, comfortable, blowout. The outcome field is unchanged so
nothing downstream breaks. fight_shape is the new numbers-aware dimension.
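As a rough illustration, the label could be derived from headcounts like this. The 1.3x and 2x squad-advantage cutoffs come from item 1; the cutoff for legendary_outnumbered is not stated in these notes, so the mirrored 1.3x value below is purely an assumption, as are the function name and return shape.

```python
OUTNUMBERED_RATIO = 1.3      # assumption: enemy/friendly ratio that counts as outnumbered
COMFORTABLE_ADVANTAGE = 1.3  # from item 1: comfortable starts at 1.3x squad advantage
BLOWOUT_ADVANTAGE = 2.0      # from item 1: beyond 2x squad advantage is a stomp


def fight_shape(friendly_count, enemy_count):
    """Classify a fight by its numbers matchup."""
    if friendly_count <= 0 or enemy_count <= 0:
        raise ValueError("both sides need at least one player")
    ratio = enemy_count / friendly_count       # enemy/friendly, as in the notes
    advantage = friendly_count / enemy_count   # squad advantage, the inverse view
    if ratio >= OUTNUMBERED_RATIO:
        shape = "legendary_outnumbered"
    elif advantage > BLOWOUT_ADVANTAGE:
        shape = "blowout"
    elif advantage >= COMFORTABLE_ADVANTAGE:
        shape = "comfortable"
    else:
        shape = "comparable"
    return {"fight_shape": shape, "fight_shape_ratio": round(ratio, 2)}
```

Under these assumed cutoffs, the 35v50 example from item 1 lands in legendary_outnumbered and the 50v18 example lands in blowout.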

6. Fight dynamics observations

Four new observation lines now appear in the FIGHT ANALYSIS block when the
data shape suggests something specific happened beyond the basic outcome:

Enemy respawn traffic: when squad kill events exceed unique enemy count, meaning enemies died, respawned, and returned to die again. Implies a short run back from their waypoint or a fight long enough for respawns to matter.
Enemy rez/rally chain: when squad downs exceed unique enemy count but kills don't, meaning enemies kept going down and getting picked up rather than dying.
Squad runbacks: mirror of respawn traffic, applied to our side. When our death events exceed our unique squad count, meaning our players died and ran back.
Squad resilience: when our downs received exceed our deaths, meaning the support line was rezzing well or the enemy could not convert downs to kills. The threshold requires at least 10 downs received and 5 saved before it fires, so trivial fights don't trigger noise.

These are informational observations, not mandatory callouts. They live in
the neutral analysis block and the AI uses them when they are the most
dramatic story available.
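The four checks above reduce to a handful of comparisons. A minimal sketch follows; apart from squad_downs_received, the stat key names are hypothetical, and "saved" is assumed to mean downs received minus deaths.

```python
def fight_dynamics(stats):
    """Return the observation keys that apply to one fight's summary stats."""
    observations = []
    enemies = stats["unique_enemies"]
    squad = stats["unique_squad"]

    if stats["squad_kills"] > enemies:
        # More kill events than enemies: some of them came back to die again.
        observations.append("enemy_respawn_traffic")
    elif stats["squad_downs"] > enemies:
        # Downs exceed headcount but kills don't: enemies were picked back up.
        observations.append("enemy_rez_rally_chain")

    if stats["squad_deaths"] > squad:
        observations.append("squad_runbacks")

    downs_received = stats["squad_downs_received"]
    saved = downs_received - stats["squad_deaths"]  # assumption: saved = downs not converted
    if downs_received >= 10 and saved >= 5:
        observations.append("squad_resilience")

    return observations
```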

7. Enemy comp commentary gated to losses

The single most impactful change in this version. Previously, the AI received
full enemy composition data (breakdown by profession, top enemy skills, and a
narrative strategy fingerprint) on every fight. The result: the model led with
enemy comp descriptions on 91% of wins, producing formulaic "that static
meteor-and-lava comp got dismantled" commentary fight after fight.

Now all enemy comp data is stripped from the prompt on wins. The model cannot
see enemy_breakdown or top_enemy_skills in the FIGHT DATA JSON, the strategy
fingerprint line does not appear in FIGHT ANALYSIS, and the enemy_comp_failure
callout does not fire. On losses, all data is preserved so the model can
explain what the enemy did right.

Siege detection is the exception: siege weapons are mockery-worthy regardless
of outcome, so the SIEGE DETECTED observation fires on wins via a separate
code path.

Live testing across Grok 4.20, GPT-5.4, and Gemini 2.5 Flash: enemy comp
references on wins dropped from 91% to 0%.
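The JSON side of the gating amounts to dropping two keys before the prompt is built. A sketch, assuming the caller already knows whether the outcome counts as a win; the function name is hypothetical, the two key names are the ones listed above.

```python
ENEMY_COMP_KEYS = ("enemy_breakdown", "top_enemy_skills")


def gate_enemy_comp(fight_data, is_win):
    """Return a copy of the fight data with enemy comp stripped on wins."""
    if not is_win:
        return dict(fight_data)  # losses keep everything
    return {k: v for k, v in fight_data.items() if k not in ENEMY_COMP_KEYS}
```

Stripping the data instead of asking the model to ignore it means the model has nothing to latch onto in the first place.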

8. Siege detection overhaul

The previous siege detection used a hardcoded set of 5 skill names including
"Mortar Shot", which is the Engineer Mortar Kit auto-attack, not a siege
weapon. This caused false-positive "SIEGE DETECTED" callouts on fights with
no actual siege. The detection set was also missing Catapults and Flame Rams
entirely.

Replaced with a proper categorized frozenset of ~25 skills covering all 6
siege weapon types, plus substring fallbacks for naming variants. A dedicated
_is_siege_skill() helper replaces all inline set checks.

9. Vocabulary freshness: three detection layers

The n-gram phrase tracker from the previous version now has two additional
detection layers to catch repetition patterns it missed:

Fixation verbs: words like "shredded", "vaporized", "dismantled" that
models latch onto regardless of what comes after them. The n-gram tracker
missed these because "shredded every boon" and "shredded their stability"
are different 3-grams. Now the verb itself is tracked and banned after 3+
uses.

Word frequency tracker: counts every non-stopword, non-domain word across
recent responses. Any word appearing in 6 or more of the last 8 responses
gets banned dynamically. This catches model-specific fixations without manual
curation. Player name components are excluded via accent-normalized matching
so "Bálls" doesn't get "balls" banned.

Live result: "shredded" usage dropped from 89% to 25% across a 12-fight
session.
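Both new layers can be approximated in a few lines. This is a sketch, not the shipped tracker: function names and arguments are assumptions, and the verb layer here counts over whatever list of responses it is given because the notes do not state its window.

```python
import re
import unicodedata


def normalize(text):
    """Strip accents and case so 'Bálls' and 'balls' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def banned_words(responses, player_names, excluded_words, min_hits=6, window=8):
    """Words appearing in at least min_hits of the last `window` responses."""
    protected = {normalize(part) for name in player_names for part in name.split()}
    protected |= {normalize(w) for w in excluded_words}  # stopwords and domain words
    counts = {}
    for text in responses[-window:]:
        words = set(re.findall(r"[^\W\d_]+", normalize(text)))  # count once per response
        for word in words - protected:
            counts[word] = counts.get(word, 0) + 1
    return sorted(w for w, c in counts.items() if c >= min_hits)


def banned_verbs(responses, fixation_verbs, min_uses=3):
    """Fixation verbs used min_uses or more times, whatever follows them."""
    text = " ".join(normalize(r) for r in responses)
    return sorted(
        verb for verb in fixation_verbs
        if len(re.findall(rf"\b{re.escape(verb)}\b", text)) >= min_uses
    )
```

Counting the verb on its own is what catches "shredded every boon" and "shredded their stability" as the same fixation.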

10. Commander mention variety

The commander was previously blanket-exempt from player name suppression,
resulting in commander mentions in 89% of responses. Now after 3+ mentions
in the 2-hour window, the commander faces a 50% dice roll per fight. Live
result: commander mentions dropped to approximately 50%.
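The dice roll itself is tiny. A minimal sketch with a hypothetical function name; the injectable rng argument is only there to make the behaviour testable.

```python
import random


def allow_commander_mention(mentions_in_window, rng=random):
    """Decide whether the commander may be named in this fight's response."""
    if mentions_in_window < 3:
        return True              # under the threshold: no restriction
    return rng.random() < 0.5    # 3+ mentions in the window: 50% roll per fight
```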

11. PUG commentary controls

Two changes to prevent PUG-blame from becoming default filler:

PUG saturation tracking: when PUGs have been mentioned in 3 or more
recent responses, the model is directed to skip PUG commentary entirely
regardless of the data threshold.

Outnumbered loss mood rewrite: loss mood directives no longer blame PUGs
when the squad was outnumbered. The model is told to acknowledge the matchup
was unfavorable and credit anything the squad did well despite it.

12. max_tokens raised to 8000

Default max_tokens raised from 1000 to 8000 to support reasoning models
(GLM 5.1, GPT-5.3 Codex) where internal thinking consumes most of the token
budget before emitting visible output. Zero cost impact on non-reasoning
models (they stop at finish_reason=stop well before any limit).

What You'll Notice (1.6.7 additions)

Enemy comp is gone on wins. The model now talks about what the squad
did, not what the enemy ran. On losses, the enemy comp analysis is still
there to explain what went wrong.

"Shredded" fatigue is gone. The verb frequency tracker catches model
fixations dynamically. Different models fixate on different words; the
system adapts without manual word-ban curation.

Commander isn't in every post. The dice roll gives the commander name
a natural frequency of roughly every other fight instead of every fight.

Siege callouts are accurate. Engineer Mortar Kit no longer triggers
false siege detection. Real siege weapons (including catapults, which were
previously missing) are properly detected.

Reasoning models work. The max_tokens increase lets GLM 5.1 and GPT-5.3
Codex produce full responses instead of truncating mid-sentence.

What You'll Notice (1.6.6 items)

Across testing on a 17-fight test set, the same logs run multiple times
produce visibly different commentary across the session arc:

The first loss after a streak feels like a gut punch instead of routine anger
Long win streaks shift into bored-and-mocking voice instead of repeating euphoria
Long fights get observations about respawners and runbacks that the AI
never had access to before
Player names rotate naturally across the session instead of locking onto
one carry

Voice variation by model. The voice quality of the output is now
strongly model-dependent. Models with strong inherent voice (Grok 4.20 in
particular) produce the intended unhinged commentary register. Models that
default to safer or more measured output (gpt-5.4-mini observed in live
testing) produce technically compliant but voice-flat responses with
metaphor-stacking and no emotional range. This is a model selection issue,
not a prompt issue, and is expected.

Lopsided stomps still feel small. Item #1's tiered mood plus palette
filtering means a 50v18 curbstomp will not produce the same hype as a 35v50
miracle. TTS will be quieter on these fights, which is correct.

Your supports get credit. The new resilience observation gives the
support line a story it could not previously be given: "the squad was
downed N times but only died M times, the support line was working".

What Stays The Same

All shock exclamations remain available in every fight type. Sarcastic
openers on bad enemy comps still work.
The "a free win" palette term remains available on lopsided stomps because
it literally describes the situation.
ALL CAPS freestyle closers are untouched. The AI can still invent its own
hype language. TTS playback remains driven by ALL CAPS emphasis.
The outcome field on every fight is unchanged. Discord embeds, Twitch
posts, history logs all see the same labels they always did.

For AI Model Testers

The user message now includes structured sections in this order: FIGHT ANALYSIS (neutral data), TONE (mood directive), MANDATORY CALLOUTS (when present), AVAILABLE TERMS, style/opener/player/streak guidance, FIGHT DATA JSON.
A Session context: ... line may appear inside FIGHT ANALYSIS when there is a streak of length 2+. Format: Session context: on a N-fight win streak (optional shape flavor) or Session context: this fight ends a N-fight win streak.
A Session note: ... sentence may be appended to the TONE line when streak-based mood escalation applies.
A RECENT PLAYER MENTIONS block may appear listing squad players to avoid naming. Shows after the 3-mention threshold is crossed.
The MANDATORY CALLOUTS section is omitted entirely when no callouts survive topic suppression. TONE is still emitted in that case.
New summary fields available in FIGHT DATA: fight_shape, fight_shape_ratio, squad_downs_received.
Set SPARKY_DEBUG_AI_PROMPT=1 to dump prompts as before. Each prompt dump is a full JSON capture of system message, user message, and response.
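For testers writing their own harness, the section ordering and the callout-omission rule can be mimicked as below. This is a sketch only: the function name, the GUIDANCE title, and the exact header and separator formatting are assumptions, not the real prompt layout.

```python
import json


def build_user_message(analysis, tone, callouts, terms, guidance, fight_data):
    """Assemble the user message sections in the documented order."""
    sections = [
        ("FIGHT ANALYSIS", analysis),
        ("TONE", tone),
        ("MANDATORY CALLOUTS", "\n".join(callouts)),
        ("AVAILABLE TERMS", ", ".join(terms)),
        ("GUIDANCE", guidance),  # stands in for style/opener/player/streak guidance
        ("FIGHT DATA JSON", json.dumps(fight_data, ensure_ascii=False)),
    ]
    parts = []
    for title, body in sections:
        if title == "MANDATORY CALLOUTS" and not callouts:
            continue  # omitted entirely when nothing survives suppression
        parts.append(f"{title}\n{body}")
    return "\n\n".join(parts)
```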

Persistence Files

SparkyBot now keeps two state files, the existing sparkybot_vocab_usage.json plus a second file for session history:

sparkybot_vocab_usage.json: vocabulary terms, opener strategies, player mentions, topic categories. Owned by VocabularyTracker.
sparkybot_session_history.json: ordered fight outcomes for streak detection. Owned by SessionHistoryTracker. Cap 50 entries.

Both files are JSON and safe to delete to reset state. Both are
backward-compatible: older formats missing newer keys load cleanly.
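A tolerant loader along these lines gives both properties: deleting the file resets state, and older files missing newer keys still load. It is a sketch; the "fights" key and the function names are hypothetical, and only the 50-entry cap comes from the notes above.

```python
import json
from pathlib import Path

MAX_ENTRIES = 50  # cap on stored fight outcomes


def upgrade_state(raw):
    """Fill in missing keys and enforce the entry cap on loaded state."""
    state = raw if isinstance(raw, dict) else {}
    fights = state.get("fights", [])  # "fights" is a hypothetical key name
    if not isinstance(fights, list):
        fights = []
    return {**state, "fights": fights[-MAX_ENTRIES:]}


def load_history(path):
    """Load session history; a missing or corrupt file yields empty state."""
    try:
        raw = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):  # covers missing files and invalid JSON
        raw = {}
    return upgrade_state(raw)
```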

Coming Soon

Honestly, the project is in a good place and the obvious additive work is
shipped. The remaining items in the backlog are either dropped (see below)
or speculative. If you want to add something, the most likely candidate is
a new fight dynamics observation in the same shape as item #6's four
existing ones. Open the changelog to see how those four are wired in.

What Got Dropped

Several items that looked promising in early analysis turned out to be
fixing sentence-level issues that don't matter at session scale, or
chasing data that does not exist. These were cut from the roadmap rather
than shipped:

Banned construction promotion: the "turned X into Y" pattern fires a lot in Grok responses, but some of those lines are the best in the dataset ("turned enemy reapers into sad ghosts"). Killing the construction would kill the peak moments.
Tier 2 mood string tightening: marginal improvement.
Shock term directionality rule: 2 instances of COME ON GUYS landing on a winning fight across 50+ dumps is noise.
Post-generation retry loop: would catch individual rule violations at the cost of latency, money, and complexity for diminishing returns.
Comparative US vs THEM stats: would have surfaced enemy strips, cleanses, healing as comparative ratios. Investigation showed ArcDPS logs cannot observe enemy self-support: the only enemy-side actions visible are the ones that touch our squad (damage outbound, conditions and strips inbound). Inbound-only signals were judged not worth the implementation lift.

Philosophy: voice-driven commentary cannot be perfected at the sentence
level without killing the voice. Every rule added, every banned word list,
every retry layer slightly dampens the personality the model is supposed
to bring. The fixes that survive are the ones a human can hear across a
full session, not the ones a regex can find in a single response.

Full Changelog: v1.6.6...v1.6.7
