
fix(flue): switch kimi automations to k2.7-code and handle 429 capacity gracefully#1490

Merged
ascorbic merged 1 commit into main from fix/flue-kimi-429-handling on Jun 15, 2026



@ascorbic

Collaborator

What does this PR do?

Fixes a regression in which the automated PR reviewer (emdash-flue-review) silently stopped posting reviews around #1484, even though nothing had been deployed and no errors appeared in the logs.

Root cause: Workers AI returns HTTP 429 when a model is over capacity. We confirmed sustained 429s on @cf/moonshotai/kimi-k2.6 (and, to a lesser extent, k2.7). Under load, the AI binding can hold a request open indefinitely, so the review workflow's session.skill call never returned: no result, no posted review, just the Sandbox container's keep-alive alarm firing for minutes. Nothing in the path enforced a timeout or handled capacity errors, so a transient capacity spike turned into a permanent, silent hang.
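Because the stalled call never settled, nothing downstream could run or report an error. The basic remedy is to race each attempt against a timer. The sketch below is a minimal illustration of that idea, assuming the model call can be expressed as a promise-returning function; withTimeout and TimeoutError are hypothetical names, not the helper shipped in this PR.

```typescript
// Hypothetical error type so callers can tell a timeout apart from other failures.
class TimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} timed out after ${ms}ms`);
    this.name = "TimeoutError";
  }
}

// Race the call against a timer. If the call never settles (e.g. a request the
// binding holds open indefinitely), the timer rejects and the caller fails loudly.
async function withTimeout<T>(
  call: () => Promise<T>,
  ms: number,
  label = "model call",
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(label, ms)), ms);
  });
  try {
    return await Promise.race([call(), timeout]);
  } finally {
    // Always clear the timer so a fast call does not leave a pending timeout behind.
    clearTimeout(timer);
  }
}
```

Promise.race only stops waiting: the underlying request is not cancelled, so the stalled work may continue in the background. The point is that the workflow gets a loud, bounded failure instead of an indefinite hang.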

Changes:

  • Move every kimi usage to kimi-k2.7-code (currently less loaded): the review agent, the fix agent, both reply classifiers, and the investigate classifier. (The /bonk and /review kimi alias was already moved in #1485, "chore: bump flue/bonk coding agents to kimi-k2.7-code".)
  • Add withCapacityRetry (one copy per flue deploy unit, since .flue and infra/flue-review are independent workspaces). It bounds each model call with a hard per-attempt timeout, so a stalled call fails loudly instead of hanging, and it retries genuine 429 capacity errors with exponential backoff and full jitter. Per-attempt timeouts are intentionally not retried, because a stall can't be distinguished from slow but working progress; they fail loudly within a bounded time, and the workflow's at-least-once restart handles re-running.
  • Apply it to the flue-review review skill and to every model-bearing stage of the investigate/classify workflows.
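The retry policy described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the option names are hypothetical, and isCapacityError assumes a capacity failure carries status 429 or mentions 429 in its message, which may not match the actual Workers AI error shape. Full jitter means each wait is drawn uniformly from [0, min(cap, base × 2^(attempt − 1))), which spreads retries from many callers instead of synchronizing them.

```typescript
// Hypothetical error for a single attempt that exceeded its time budget.
class AttemptTimeoutError extends Error {
  constructor(ms: number) {
    super(`model call exceeded ${ms}ms`);
    this.name = "AttemptTimeoutError";
  }
}

// Assumption: a capacity error has `status === 429` or mentions 429 in its message.
// The real Workers AI error shape may differ; adjust this predicate accordingly.
function isCapacityError(err: unknown): boolean {
  if (err instanceof AttemptTimeoutError) return false;
  if ((err as { status?: unknown } | null)?.status === 429) return true;
  return err instanceof Error && /\b429\b/.test(err.message);
}

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  attemptTimeoutMs: number; // hard bound on each individual attempt
  baseDelayMs: number; // backoff base
  maxDelayMs: number; // backoff cap
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  random?: () => number; // injectable for tests; defaults to Math.random
}

// Run one attempt, rejecting with AttemptTimeoutError if it does not settle in time.
async function attemptWithTimeout<T>(call: () => Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    return await Promise.race([
      call(),
      new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new AttemptTimeoutError(ms)), ms);
      }),
    ]);
  } finally {
    clearTimeout(timer);
  }
}

async function withCapacityRetry<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const random = opts.random ?? Math.random;
  for (let attempt = 1; ; attempt++) {
    try {
      return await attemptWithTimeout(call, opts.attemptTimeoutMs);
    } catch (err) {
      // Timeouts and non-capacity errors propagate at once; only 429s are retried.
      if (!isCapacityError(err) || attempt >= opts.maxAttempts) throw err;
      // Full jitter: wait a uniform random time in [0, min(cap, base * 2^(attempt - 1))).
      const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.floor(random() * ceiling));
    }
  }
}
```

Injecting sleep and random keeps the backoff deterministic in tests. A timeout is rethrown on its first occurrence, matching the choice above to leave re-running to the workflow's at-least-once restart.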

Note: ModelConfig in @flue/runtime is just a model-id string, so there's no provider-level retry/timeout knob; timeout and retry behavior is therefore an application-level contract.
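Since a model config is only an id string, the timeout and retry policy has to wrap the call itself. A minimal sketch of that call-site contract, assuming a binding that exposes run(model, inputs) as the Workers AI binding does; AiBinding, CallPolicy, and makeModelCaller are hypothetical names for illustration, not part of @flue/runtime.

```typescript
// A model config is just a model-id string; modelled here as a plain alias.
type ModelConfig = string;

// Minimal structural stand-in for a binding with a run(model, inputs) method,
// such as the Workers AI binding. Assumed shape, simplified for illustration.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Any wrapper with this shape (e.g. a timeout-plus-429-retry helper) can be plugged in.
type CallPolicy = <T>(call: () => Promise<T>) => Promise<T>;

// Hypothetical helper: the model id passes through untouched, and all timeout and
// retry behaviour comes from the policy supplied by the caller.
function makeModelCaller(ai: AiBinding, policy: CallPolicy) {
  return (model: ModelConfig, inputs: Record<string, unknown>): Promise<unknown> =>
    policy(() => ai.run(model, inputs));
}
```

With this split, switching models (as this PR does) only changes the id string, while every call site keeps the same bounded, retrying behavior.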

Verified with a live wrangler tail of emdash-flue-review: the pipeline is healthy through git checkout and git diff, then goes silent at the kimi inference call with zero exceptions, which is consistent with a capacity-limited (429) request being held open.

Closes #

Type of change

  • Bug fix
  • Feature (requires maintainer-approved Discussion)
  • Refactor (no behavior change)
  • Translation
  • Documentation
  • Performance improvement
  • Tests
  • Chore (dependencies, CI, tooling)

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes
  • pnpm lint passes
  • pnpm test passes (or targeted tests for my change)
  • pnpm format has been run
  • I have added/updated tests for my changes (if applicable)
  • User-visible strings in the admin UI are wrapped for translation (if applicable)
  • I have added a changeset (if this PR changes a published package)
  • New features link to an approved Discussion

AI-generated code disclosure

  • This PR includes AI-generated code — model/tool: Claude Opus 4.8 (opencode)

Screenshots / test output

tsc --noEmit clean in infra/flue-review (after wrangler types) and .flue; oxfmt clean on all changed files.

…city gracefully

Workers AI returns 429 when a model is over capacity, and under sustained
load the binding can hold a request open indefinitely. That left the
deployed review workflow hung forever on a stalled inference call: no
result, no posted review, just the container's keep-alive alarm firing
for minutes (the cause of reviews silently not posting).

- Switch every kimi usage (review, fix, both classifiers, investigate
  classifier) to kimi-k2.7-code, which is currently less loaded.
- Add withCapacityRetry: bounds each model call with a hard per-attempt
  timeout (fails loudly instead of hanging) and retries genuine 429
  capacity errors with exponential backoff + full jitter.
- Apply it to the flue-review skill call and to all model-bearing stages
  of the investigate/classify workflows.
@changeset-bot

changeset-bot Bot commented Jun 15, 2026


⚠️ No Changeset found

Latest commit: d5fe0ff

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions Bot added the review/needs-review No maintainer or bot review yet label Jun 15, 2026
@github-actions

Contributor

Scope check

This PR changes 553 lines across 8 files. Large PRs are harder to review and more likely to be closed without review.

If this scope is intentional, no action needed. A maintainer will review it. If not, please consider splitting this into smaller PRs.

See CONTRIBUTING.md for contribution guidelines.

@pkg-pr-new

pkg-pr-new Bot commented Jun 15, 2026



@emdash-cms/admin

npm i https://pkg.pr.new/@emdash-cms/admin@1490

@emdash-cms/auth

npm i https://pkg.pr.new/@emdash-cms/auth@1490

@emdash-cms/auth-atproto

npm i https://pkg.pr.new/@emdash-cms/auth-atproto@1490

@emdash-cms/blocks

npm i https://pkg.pr.new/@emdash-cms/blocks@1490

@emdash-cms/cloudflare

npm i https://pkg.pr.new/@emdash-cms/cloudflare@1490

@emdash-cms/contentful-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/contentful-to-portable-text@1490

emdash

npm i https://pkg.pr.new/emdash@1490

create-emdash

npm i https://pkg.pr.new/create-emdash@1490

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/gutenberg-to-portable-text@1490

@emdash-cms/plugin-cli

npm i https://pkg.pr.new/@emdash-cms/plugin-cli@1490

@emdash-cms/plugin-types

npm i https://pkg.pr.new/@emdash-cms/plugin-types@1490

@emdash-cms/registry-client

npm i https://pkg.pr.new/@emdash-cms/registry-client@1490

@emdash-cms/registry-lexicons

npm i https://pkg.pr.new/@emdash-cms/registry-lexicons@1490

@emdash-cms/sandbox-workerd

npm i https://pkg.pr.new/@emdash-cms/sandbox-workerd@1490

@emdash-cms/x402

npm i https://pkg.pr.new/@emdash-cms/x402@1490

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/@emdash-cms/plugin-ai-moderation@1490

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/@emdash-cms/plugin-atproto@1490

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/@emdash-cms/plugin-audit-log@1490

@emdash-cms/plugin-color

npm i https://pkg.pr.new/@emdash-cms/plugin-color@1490

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/@emdash-cms/plugin-embeds@1490

@emdash-cms/plugin-field-kit

npm i https://pkg.pr.new/@emdash-cms/plugin-field-kit@1490

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/@emdash-cms/plugin-forms@1490

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/@emdash-cms/plugin-webhook-notifier@1490

commit: d5fe0ff

@ascorbic ascorbic merged commit eb5f7e4 into main Jun 15, 2026
45 checks passed
@ascorbic ascorbic deleted the fix/flue-kimi-429-handling branch June 15, 2026 16:15

Labels

review/needs-review No maintainer or bot review yet size/XL


1 participant