fix(chat): WebSocket reliability, scroll race fix, and UX improvements #86
dvlexp wants to merge 4 commits into
websocket-client>=1.9 required by local skills/tools. Lock regenerated after merging origin/main; pulls in flask-limiter (rate-limit on public share endpoint, evolution-foundation#52), limits, wrapt, ordered-set, deprecated. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
terminal_proxy.py closed the WS bridge whenever client_ws.receive(timeout=30) returned None, but None means timeout, not disconnection (a real disconnect raises ConnectionClosed). In background tabs the browser throttles AgentChat's 25s ping to <1×/min, which hit the timeout and tore down the WS while the peer was still alive. Because the frontend had no auto-reconnect, wsRef.current kept pointing at a closed socket and sendMessage() returned silently. This is the cause of the "I click send and nothing happens" symptom reported by users.
Backend: continue instead of break on receive timeout. Real disconnects still fall through except → finally as before.
Frontend: useEffect refactored into a reusable connect() function, with 1s→30s backoff triggered from onclose. The ping interval is now per-WS (localPing) so that the onclose of an old WS does not clear the new one's ping during chained reconnects. wsRef is now cleared in onclose to avoid sends on a dead socket.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
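The reconnect backoff described in this commit can be sketched as a pure delay schedule. This is a minimal illustration, not the code from AgentChat.tsx: the function and constant names are hypothetical, and only the 1s starting delay, the doubling, and the 30s cap come from the commit message and the reviewer's diagram.

```typescript
// Hypothetical names; only the 1s -> 30s doubling schedule is taken from the PR.
const INITIAL_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

/** Delay to use after a failed attempt that waited `currentMs`: double it, capped. */
function nextReconnectDelay(currentMs: number): number {
  return Math.min(currentMs * 2, MAX_DELAY_MS);
}

/** Delays used across `attempts` consecutive failed reconnects, starting at 1s. */
function backoffSchedule(attempts: number): number[] {
  const delays: number[] = [];
  let delay = INITIAL_DELAY_MS;
  for (let i = 0; i < attempts; i++) {
    delays.push(delay);
    delay = nextReconnectDelay(delay);
  }
  return delays;
}
```

In a component, the current delay would presumably be reset to the initial value once a socket opens successfully, so a single hiccup does not leave later reconnects waiting the full 30s.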
Before: scrollToBottom() forced scrollTop = scrollHeight on every render/delta, so the user could not scroll up to read history; the next message (or each stream chunk) threw them back to the bottom.
Now: isAtBottomRef tracks whether the user is near the bottom (<50px). If they have scrolled up, scrollToBottom() becomes a no-op and a floating "Ir para o final" ("Go to the end") button appears that re-enables follow when clicked. Sending a new message or re-sending (edit/rewind) forces the flag back to true, since that is a clear signal of "I want to see the result". History restore (session_joined, chat_history) also forces true: opening a session always lands you at the bottom.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
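The near-bottom check behind isAtBottomRef could look roughly like the sketch below. The ScrollMetrics shape and both function names are assumptions made for illustration; only the 50px threshold and the "no-op when scrolled up" behavior come from the commit message.

```typescript
// Assumed shape: the three DOM scroll properties of the chat container.
interface ScrollMetrics {
  scrollTop: number;
  scrollHeight: number;
  clientHeight: number;
}

const FOLLOW_THRESHOLD_PX = 50; // threshold stated in the commit message

/** True when the viewport's bottom edge is within `thresholdPx` of the content's end. */
function isNearBottom(m: ScrollMetrics, thresholdPx: number = FOLLOW_THRESHOLD_PX): boolean {
  const distanceFromBottom = m.scrollHeight - m.scrollTop - m.clientHeight;
  return distanceFromBottom < thresholdPx;
}

/** Scroll position to apply: jump to the end only while following, else stay put. */
function nextScrollTop(m: ScrollMetrics, following: boolean): number {
  return following ? m.scrollHeight : m.scrollTop;
}
```

Keeping the check as a pure function of the three scroll values makes it easy to call from both the onScroll handler and any deferred scroll callback.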
…nder
3 unrelated reliability fixes accumulated in AgentChat.tsx:
1. Scroll race: under heavy streaming (60+ deltas/sec) the scrollToBottom
rAF could fire before the user's onScroll propagated, leaving
isAtBottomRef stale and teleporting them back to bottom. Now we
recompute distance-from-bottom inside the rAF callback and update the
flag proactively if the user moved away.
2. Heartbeat timeout: track lastPongAt on every server pong. The ws lib
can leave half-open sockets (TCP dead, no close frame), so onclose
never fires and chat_event stops silently. The interval now also
checks for stale pongs and forces a reconnect.
3. Defensive blocks render: msg.blocks could be undefined for assistant
messages mid-stream or from legacy formats, crashing
.map/.some/.filter and triggering the ErrorBoundary ("Unable to load
dashboard section"). Fixed in 3 sites: getMessageText (line 771),
blocks rendering (1278), streaming hasVisibleContent check (1292).
Bug 3 was breaking the entire /agents/<name> page for every agent with
chat history — reported today on clawdia-assistant but not specific.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
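The stale-pong check from item 2 can be sketched as a small monitor that is consulted on every ping tick. The class and method names are hypothetical; lastPongAt is the name used in the commit message, and the 60000 ms threshold is the figure shown in the reviewer's sequence diagram, assumed here as the default.

```typescript
// Assumed default: 60000 ms, as shown in the reviewer's sequence diagram.
const PONG_TIMEOUT_MS = 60000;

class HeartbeatMonitor {
  private lastPongAt: number;
  private readonly timeoutMs: number;

  constructor(now: number, timeoutMs: number = PONG_TIMEOUT_MS) {
    this.lastPongAt = now; // treat the connection as fresh when it opens
    this.timeoutMs = timeoutMs;
  }

  /** Call whenever a pong arrives from the server. */
  recordPong(now: number): void {
    this.lastPongAt = now;
  }

  /** True when no pong has been seen for longer than the timeout. */
  isStale(now: number): boolean {
    return now - this.lastPongAt > this.timeoutMs;
  }
}
```

On each interval tick the caller would check isStale(Date.now()) and, if it returns true, close the socket so the normal onclose path schedules a reconnect, which covers half-open sockets that never emit a close event.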
Reviewer's Guide
Implements WebSocket keepalive and auto-reconnect for AgentChat, refines scroll behavior to respect manual scrolling with a jump-to-bottom control, and hardens block rendering and backend WS proxy behavior to avoid crashes and silent disconnects, plus adds the websocket-client dependency.
Sequence diagram for AgentChat WebSocket keepalive and auto-reconnect
sequenceDiagram
actor User
participant AgentChat
participant BrowserWS as WebSocket
participant TerminalProxy
participant Upstream as UpstreamServer
User->>AgentChat: open AgentChat(sessionId)
activate AgentChat
AgentChat->>AgentChat: connect()
AgentChat->>TerminalProxy: requests.get(TS_HTTP + /health)
TerminalProxy-->>AgentChat: 200 OK
AgentChat->>BrowserWS: new WebSocket(TS_WS + /ws)
BrowserWS-->>AgentChat: onopen
AgentChat->>BrowserWS: ws.send({ type: join_session, sessionId })
AgentChat->>AgentChat: reconnectDelayRef = 1000
AgentChat->>AgentChat: setInterval(localPing, 25000)
loop keepalive
AgentChat->>BrowserWS: ws.send({ type: ping })
BrowserWS->>TerminalProxy: ping
TerminalProxy->>Upstream: forward ping
Upstream-->>TerminalProxy: pong
TerminalProxy-->>BrowserWS: pong
BrowserWS-->>AgentChat: message type=pong
AgentChat->>AgentChat: lastPongAt = Date.now()
end
note over AgentChat,BrowserWS: Heartbeat timeout
AgentChat->>AgentChat: [Date.now() - lastPongAt > 60000]
AgentChat->>BrowserWS: ws.close()
BrowserWS-->>AgentChat: onclose
AgentChat->>AgentChat: clearInterval(localPing)
AgentChat->>AgentChat: scheduleReconnect()
AgentChat->>AgentChat: setTimeout(connect, reconnectDelayRef)
AgentChat->>AgentChat: reconnectDelayRef = min(reconnectDelayRef * 2, 30000)
AgentChat->>AgentChat: connect() // new WebSocket
deactivate AgentChat
Hey - I've found 1 issue, and left some high level feedback:
- The floating "Ir para o final" button is rendered as a sibling of the scroll container but uses absolute positioning; consider nesting it inside the relatively positioned scroll container (or adding a dedicated positioned wrapper) so its placement is reliably anchored to the chat area rather than the page.
- New user-facing strings and aria-labels for the jump-to-bottom button are in Portuguese while the rest of the component appears English; consider aligning language for consistency and accessibility.
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="dashboard/frontend/src/components/AgentChat.tsx" line_range="1010-1018" />
<code_context>
const isConnecting = externalLoading || status === 'connecting'
const effectiveError = externalError || (status === 'error' ? errorMsg : null)
- const inputDisabled = isConnecting || !!effectiveError
+ // Don't disable the textarea during transient reconnects — disabling blurs
+ // the cursor (native behavior of <textarea disabled>), which forces the user
+ // to click back in after every WS hiccup. canSend already gates the Send
+ // button on readyState === OPEN, so typing while the socket is down is safe:
+ // the text stays in React state and gets sent the moment the WS reopens.
+ // Only hard errors (server unreachable) still disable input.
+ const inputDisabled = !!effectiveError
const canSend = (input.trim().length > 0 || attachedFiles.length > 0) && !inputDisabled && status !== 'running'
</code_context>
<issue_to_address>
**nitpick:** `isConnecting` is now unused and the new `inputDisabled` behavior might surprise future readers.
Since `isConnecting` is no longer used (and not referenced elsewhere), please remove it to avoid confusion. Also consider a brief inline comment or a more specific name for `inputDisabled` to clarify that it now reflects only error state, not connecting state, so future changes don’t accidentally reintroduce the old behavior.
</issue_to_address>
Problem
Three independent chat stability issues affecting production use:
- blocks: AgentChat rendered blocks unconditionally, crashing when blocks was absent from a message
Solution
- websocket-client>=1.9 added to deps (pyproject.toml + uv.lock), required by the keepalive implementation
- AgentChat: sends a ping every 20s, resets on any incoming message; detects half-open connections before they cause user-visible failures
- isAtBottom state: auto-scroll only triggers when the user is already at the bottom; adds "scroll to bottom" button when pinned elsewhere
- blocks render: guards all block rendering with null checks; no more crashes on messages with missing/undefined structure
Test Plan
- blocks field: no crash, graceful render
Commits
- 3dd0acf
- 6d51260
- 9c6a9a4
- 81dcebe
Summary by Sourcery
Improve AgentChat WebSocket reliability and chat scrolling UX while hardening message rendering against missing blocks.
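A minimal sketch of the kind of guard that hardens rendering against missing blocks follows. The Block and ChatMessage shapes are assumptions made for illustration and may not match the real types; getMessageText and hasVisibleContent are names mentioned in the commit message, but these bodies are not the PR's code.

```typescript
// Assumed, simplified shapes; the real message types are not shown in the PR text.
interface Block {
  type: string;
  text?: string;
}

interface ChatMessage {
  role: "user" | "assistant";
  blocks?: Block[]; // may be undefined mid-stream or in legacy formats
}

/** Concatenates text blocks, treating a missing `blocks` array as empty. */
function getMessageText(msg: ChatMessage): string {
  return (msg.blocks ?? [])
    .filter((b) => b.type === "text")
    .map((b) => b.text ?? "")
    .join("");
}

/** True when at least one block carries non-whitespace text. */
function hasVisibleContent(msg: ChatMessage): boolean {
  return (msg.blocks ?? []).some((b) => (b.text ?? "").trim().length > 0);
}
```

Defaulting to an empty array at each call site keeps .map/.some/.filter safe without changing how well-formed messages render.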