fix: throttle agent detection loop against detect_reset notify storms #4

Draft
wkentaro wants to merge 1 commit into master from fix/agent-detection-cpu-spin

@wkentaro

This was generated by AI.

Summary

Throttle the per-pane agent-detection loop so a burst of detect_reset notifications can't drive it into a high-CPU spin.

Why

Each pane runs an agent-detection task shaped like:

tokio::select! {
    _ = sleep(tick) => {}
    _ = detect_reset.notified() => {}
}

detect_reset fires whenever the pane's foreground process group changes. An agent that rapidly toggles its foreground pgid produces a storm of notifications, each of which wakes the loop immediately and bypasses the intended sleep tick. The detection work then runs back-to-back with no spacing, spinning the task at high CPU (and adding load that can make the UI lag).

Fix

Add a minimum interval between detection iterations: if the previous iteration finished less than DETECTION_LOOP_MIN_INTERVAL ago, sleep the remainder before running again. A detect_reset storm now collapses into at most one iteration per interval instead of an unbounded spin. Covered by unit tests for the throttle helper plus an integration test that fires a detect_reset storm and asserts the loop stays throttled.
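
The "sleep the remainder" rule can be sketched as a small pure function. This is a minimal illustration, not the helper from this PR: the function name `remaining_throttle` is hypothetical, and the 50ms value for `DETECTION_LOOP_MIN_INTERVAL` is taken from the commit message.

```rust
use std::time::Duration;

/// Assumed value: the commit message names a 50ms minimum interval.
const DETECTION_LOOP_MIN_INTERVAL: Duration = Duration::from_millis(50);

/// Hypothetical helper: given the time elapsed since the previous detection
/// iteration finished, return how long to sleep before running again.
/// `None` means the minimum interval has already passed, so run immediately.
fn remaining_throttle(elapsed: Duration, min_interval: Duration) -> Option<Duration> {
    // `checked_sub` returns `None` when `elapsed` exceeds `min_interval`.
    match min_interval.checked_sub(elapsed) {
        Some(rest) if !rest.is_zero() => Some(rest),
        _ => None,
    }
}
```

A caller would measure the elapsed time with something like `Instant::elapsed()` after each iteration and sleep for the returned duration, if any, before starting the next detection pass.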

Scope

Independent fix, unrelated to the ui.pane_borders = "minimal" feature (#1). Related in spirit to the 0.6.10 detection-reset-loop fix (ogulcancelik#560, ogulcancelik#565), but it addresses a distinct path: the notify storm bypassing the loop's own throttle.

The per-pane agent detection loop wakes on its normal 300-500ms tick or
when detect_reset is notified. tokio::Notify coalesces to a single stored
permit, so a sustained notify source (full-lifecycle authority re-asserted
on every app event, repeated report_agent_session, or agent-state churn
across many panes) let the loop re-probe /proc and re-emit with no sleep,
pegging a CPU core per affected pane and starving the single-threaded
server loop so client handshakes timed out and the session froze.

Bound both detection loops to a 50ms minimum interval between work
iterations so a detect_reset storm cannot drive them faster than the
fastest legitimate tick, while keeping reset responsiveness well under the
normal tick.
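
To illustrate how the bound collapses a storm, here is a rough virtual-time model. It is a sketch under stated assumptions, not code from this PR: the function `throttled_run_times` is hypothetical, times are plain milliseconds, detection work is treated as instantaneous, and notifications that arrive while the loop is busy or sleeping are coalesced into one pending wake, mirroring the single stored permit described above.

```rust
/// Hypothetical model of the throttled loop over virtual time (milliseconds).
/// `wakes` is a sorted list of times at which the loop is woken (tick or
/// notify). Returns the times at which detection work would start.
fn throttled_run_times(wakes: &[u64], min_interval_ms: u64) -> Vec<u64> {
    let mut runs = Vec::new();
    let mut i = 0; // index of the next wake not yet consumed
    let mut now = 0u64; // virtual clock
    let mut last_run: Option<u64> = None;

    while i < wakes.len() {
        // Either a wake is already pending (its time is <= now) or the loop
        // waits until the next wake arrives.
        let woke_at = wakes[i].max(now);
        // Every wake up to this moment collapses into the one just consumed.
        while i < wakes.len() && wakes[i] <= woke_at {
            i += 1;
        }
        // Enforce the minimum spacing since the previous iteration.
        let start = match last_run {
            Some(last) => woke_at.max(last + min_interval_ms),
            None => woke_at,
        };
        runs.push(start);
        last_run = Some(start);
        now = start;
    }
    runs
}
```

In this model, 200 notifications spaced 1ms apart with a 50ms minimum interval produce only six iterations, each at least 50ms after the previous one, while wakes that are already 300ms apart pass through unchanged.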

wkentaro self-assigned this on Jun 19, 2026