feat: add chat UI, status HUD, and audio transcription #137
- Enable InputAudioTranscription and OutputAudioTranscription in Live API configs for both onboarding and reunion sessions
- Forward user/model speech transcripts from Go backend to browser
- Add ChatPanel component with streaming message bubbles (model left, user right)
- Add StatusHUD component (top-left) showing connection, session state, mic/speaking
- Add ActionsHUD component (top-right) showing available actions per session state
- Replace old connection indicator and transcript overlay with new HUD components
- Handle partial/streaming transcription with pending message accumulation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of Changes

Hello @ComBba, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the user experience for real-time voice conversations by integrating advanced UI components and enabling comprehensive audio transcription. It transitions from a basic transcript overlay to a dynamic chat interface and introduces dedicated heads-up displays for connection status, session state, and available actions, making the interaction more intuitive and informative for users.
Walkthrough

Backend support for handling audio transcription was added, and the frontend now processes transcripts through a streaming chat message system. New UI components (ChatPanel, StatusHUD, ActionsHUD) were introduced to improve the user interface.
Sequence Diagram(s)

sequenceDiagram
    participant Backend as Backend<br/>(proxy.go)
    participant WebSocket as WebSocket
    participant Page as Page<br/>(page.tsx)
    participant ChatPanel as ChatPanel<br/>Component
    Backend->>WebSocket: Transcription message<br/>(InputTranscription/OutputTranscription)
    WebSocket->>Page: ServerMessage<br/>type: 'transcript'<br/>finished: boolean
    Page->>Page: Update chatMessages state<br/>accumulate partial text
    Page->>Page: When finished=true<br/>create the completed message
    Page->>ChatPanel: Pass chatMessages
    ChatPanel->>ChatPanel: Render messages<br/>by role
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Code Review
This pull request introduces significant UI enhancements by adding a real-time chat panel, a status HUD, and an actions HUD. The backend is updated to enable and forward audio transcriptions from the Gemini Live API, which powers the new chat interface. The frontend logic for handling streaming and finalized transcript messages is well-implemented, though there is a small bug in the state update logic that could leave stale messages on the screen. Additionally, on the backend, there appears to be a duplication in how model transcripts are sent to the client, which could result in duplicate messages. My review includes suggestions to address these two high-severity issues. Overall, this is a great feature addition that dramatically improves the user experience.
    if content.OutputTranscription != nil && content.OutputTranscription.Text != "" {
        p.sendJSON(map[string]any{
            "type": "transcript",
            "role": "model",
            "text": content.OutputTranscription.Text,
            "finished": content.OutputTranscription.Finished,
        })
    }
This new block for forwarding the model's output transcription appears to duplicate existing logic. The loop over content.ModelTurn.Parts on lines 271-286 already sends a transcript message with the model's text. This will likely result in duplicate chat messages being displayed in the UI. Since this new OutputTranscription path correctly provides the finished flag, which the new UI logic relies on, the sendJSON call within the ModelTurn loop should probably be removed to resolve the duplication.
    if (finished) {
      // Finalize: flush pending partial text into a completed message.
      const pending = pendingMsgRef.current[role];
      const finalText = pending ? pending + text : text;
      if (finalText) {
        const id = String(msgIdRef.current++);
        setChatMessages((prev) => {
          // Remove the in-progress placeholder for this role if present.
          const cleaned = prev.filter(
            (m) => !(m.role === role && !m.finished),
          );
          return [...cleaned, { id, role, text: finalText, finished: true }];
        });
      }
      pendingMsgRef.current[role] = null;
    } else {
There's a potential bug in how finished transcript messages are handled. If a finished: true message arrives and the resulting finalText is empty, the if (finalText) condition prevents setChatMessages from being called. This means an in-progress message for that role could get stuck on the screen, as it's never cleared. The logic should be restructured to ensure the pending message is always removed when a finished message is processed, regardless of whether the final text is empty.
    if (finished) {
      // Finalize: flush pending partial text into a completed message.
      const pending = pendingMsgRef.current[role];
      const finalText = pending ? pending + text : text;
      if (finalText) {
        const id = String(msgIdRef.current++);
        setChatMessages((prev) => {
          // Remove the in-progress placeholder for this role if present.
          const cleaned = prev.filter(
            (m) => !(m.role === role && !m.finished),
          );
          return [...cleaned, { id, role, text: finalText, finished: true }];
        });
      } else {
        // If final text is empty, just remove the pending message from the UI.
        setChatMessages((prev) =>
          prev.filter((m) => !(m.role === role && !m.finished)),
        );
      }
      pendingMsgRef.current[role] = null;
    }
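The finalize rule discussed in this comment can also be written as a pure state transition, which makes the empty-final-text case easy to unit test. The following is a minimal sketch, not the code in web/app/page.tsx: `ChatState`, `emptyChatState`, `setPending`, and `applyTranscript` are hypothetical names, and it assumes each transcript chunk carries a delta that is appended to the pending text.

```typescript
type Role = 'user' | 'model';

interface ChatMessage {
  id: string;
  role: Role;
  text: string;
  finished: boolean;
}

// Hypothetical container for what the page keeps in state and refs.
interface ChatState {
  messages: ChatMessage[];
  pending: { user: string | null; model: string | null };
  nextId: number;
}

const emptyChatState: ChatState = {
  messages: [],
  pending: { user: null, model: null },
  nextId: 0,
};

// Returns a copy of `pending` with one role's accumulated text replaced.
function setPending(
  pending: ChatState['pending'],
  role: Role,
  value: string | null,
): ChatState['pending'] {
  const next = { ...pending };
  next[role] = value;
  return next;
}

function applyTranscript(
  state: ChatState,
  role: Role,
  text: string,
  finished: boolean,
): ChatState {
  // Assumption: `text` is a delta, so chunks are concatenated.
  const combined = (state.pending[role] ?? '') + text;
  const isPlaceholder = (m: ChatMessage) => m.role === role && !m.finished;

  if (finished) {
    // Always drop the placeholder, even when there is nothing to commit.
    const settled = state.messages.filter((m) => !isPlaceholder(m));
    const pending = setPending(state.pending, role, null);
    if (combined === '') {
      return { ...state, messages: settled, pending };
    }
    const done: ChatMessage = {
      id: String(state.nextId),
      role,
      text: combined,
      finished: true,
    };
    return { messages: [...settled, done], pending, nextId: state.nextId + 1 };
  }

  if (combined === '') return state; // nothing to show yet

  // Partial chunk: update the placeholder in place, or append a new one.
  const draft: ChatMessage = {
    id: `pending-${role}`,
    role,
    text: combined,
    finished: false,
  };
  const messages = state.messages.some(isPlaceholder)
    ? state.messages.map((m) => (isPlaceholder(m) ? draft : m))
    : [...state.messages, draft];
  return { ...state, messages, pending: setPending(state.pending, role, combined) };
}
```

In a React component this function could back a `useReducer` call, so the placeholder cleanup and the pending text can never drift apart the way a ref and a state array can.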
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 70f963ad2d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if content.OutputTranscription != nil && content.OutputTranscription.Text != "" {
        p.sendJSON(map[string]any{
            "type": "transcript",
            "role": "model",
            "text": content.OutputTranscription.Text,
            "finished": content.OutputTranscription.Finished,
Stop sending duplicate model transcripts to the client
This new branch emits a second transcript stream for role: "model" even though handleServerContent already emits model text from ModelTurn.Parts above. When OutputAudioTranscription is enabled, Live messages can include both sources for the same utterance, and web/app/page.tsx currently merges chunks by role into a single pending message, which leads to duplicated/garbled chat text in the HUD. Use one canonical model transcript source (or a distinct message type) so the frontend does not interleave two model streams.
    if content.InputTranscription != nil && content.InputTranscription.Text != "" {
        p.toolHandler.AddTranscript("user", content.InputTranscription.Text)
        p.sendJSON(map[string]any{
Persist user transcription only when chunk is finished
AddTranscript("user", ...) runs for every InputTranscription update, including partial chunks (Finished == false). Because analyze_user uses only a bounded recent transcript buffer, long in-progress utterances can fill the buffer with fragments and evict real prior turns, degrading analysis quality. Coalesce partial input transcription updates and persist only completed user turns to the transcript store.
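The eviction problem described here can be illustrated with a small model of a bounded transcript store. The backend in this PR is Go; the TypeScript below is only a sketch of the idea, and `createBoundedTranscript`, `createTurnCoalescer`, and the capacity value are hypothetical. It also assumes that partial chunks are deltas that can be concatenated into one turn.

```typescript
interface TranscriptEntry {
  role: string;
  text: string;
}

// Fixed-capacity store that evicts the oldest entry first
// (a hypothetical stand-in for the bounded recent-transcript buffer).
function createBoundedTranscript(capacity: number) {
  const entries: TranscriptEntry[] = [];
  return {
    add(role: string, text: string): void {
      entries.push({ role, text });
      while (entries.length > capacity) entries.shift();
    },
    snapshot(): TranscriptEntry[] {
      return entries.map((e) => ({ ...e }));
    },
  };
}

// Buffers partial chunks and persists one entry per completed turn.
function createTurnCoalescer(
  role: string,
  persist: (role: string, text: string) => void,
) {
  let partial = '';
  return {
    onChunk(text: string, finished: boolean): void {
      partial += text; // assumption: chunks are deltas
      if (!finished) return;
      const turn = partial.trim();
      partial = '';
      if (turn !== '') persist(role, turn);
    },
  };
}
```

With a capacity of two, a five-chunk utterance persisted chunk by chunk would push every earlier turn out of the store, while the coalesced version occupies a single slot.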
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
web/hooks/useWebSocket.ts (1)
29-37: ⚠️ Potential issue | 🟠 Major

There is no automatic reconnection after the connection closes, which weakens session resilience.

onclose only sets the state to disconnected and then stops, so after a transient network drop the user has to restart manually.

🔁 Suggested patch

 export function useWebSocket(url: string, onMessage: (msg: ServerMessage) => void) {
   const wsRef = useRef<WebSocket | null>(null);
+  const reconnectTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
+  const shouldReconnectRef = useRef(true);
   const [state, setState] = useState<ConnectionState>('disconnected');

   const connect = useCallback(() => {
     setState('connecting');
     const ws = new WebSocket(url);
     ws.binaryType = 'arraybuffer';

     ws.onopen = () => setState('connected');
-    ws.onclose = () => setState('disconnected');
+    ws.onclose = () => {
+      setState('disconnected');
+      if (shouldReconnectRef.current) {
+        reconnectTimerRef.current = setTimeout(() => connect(), 1000);
+      }
+    };
     ws.onerror = () => setState('error');
@@
   const disconnect = useCallback(() => {
+    shouldReconnectRef.current = false;
+    if (reconnectTimerRef.current) {
+      clearTimeout(reconnectTimerRef.current);
+      reconnectTimerRef.current = null;
+    }
     wsRef.current?.close();
     wsRef.current = null;
   }, []);

As per coding guidelines, "Manage WebSocket connection lifecycle in useWebSocket hook with reconnection logic".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/hooks/useWebSocket.ts` around lines 29 - 37, The WebSocket currently sets ws.onclose and ws.onerror to only update state, so add automatic reconnection logic inside the connect function: when ws.onclose or ws.onerror fires, set state appropriately (e.g., 'disconnected' or 'error') and schedule a reconnect attempt using exponential backoff (or fixed retry interval) while preserving the current url and hook lifecycle; ensure you clear pending timers when unmounting and avoid multiple concurrent connections by cancelling previous reconnect attempts before creating a new WebSocket. Reference the connect function, ws.onclose, ws.onerror, setState, and url to locate where to implement the reconnection/backoff and cleanup logic.
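The exponential backoff and timer cleanup that this prompt asks for could look roughly like the sketch below. It is kept independent of React and of the real `useWebSocket` hook so it can be tested in isolation; `reconnectDelayMs`, `createReconnector`, and the 1 s base / 30 s cap defaults are assumptions for illustration, not values from the repository.

```typescript
// Delay before the given reconnect attempt (0-based), doubling each time
// and capped at maxMs. The default values are illustrative assumptions.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, exponent));
}

interface Reconnector {
  onOpen(): void; // call from ws.onopen: resets the backoff
  onClose(): void; // call from ws.onclose / ws.onerror: schedules a retry
  stop(): void; // call on unmount or manual disconnect
}

function createReconnector(connect: () => void): Reconnector {
  let attempt = 0;
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | null = null;

  const clearPending = (): void => {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
  };

  return {
    onOpen(): void {
      attempt = 0;
    },
    onClose(): void {
      if (stopped) return;
      clearPending(); // never keep two reconnect attempts pending
      const delay = reconnectDelayMs(attempt);
      attempt += 1;
      timer = setTimeout(() => {
        timer = null;
        connect();
      }, delay);
    },
    stop(): void {
      stopped = true;
      clearPending();
    },
  };
}
```

Inside a hook, `stop()` would typically be called from the effect cleanup and from `disconnect`, so that an intentional close does not trigger a retry.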
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- internal/live/proxy.go
- internal/session/manager.go
- internal/session/manager_test.go
- web/app/page.tsx
- web/components/ActionsHUD.tsx
- web/components/ChatPanel.tsx
- web/components/StatusHUD.tsx
- web/hooks/useWebSocket.ts
    // Forward input transcription (what the user said).
    if content.InputTranscription != nil && content.InputTranscription.Text != "" {
        p.toolHandler.AddTranscript("user", content.InputTranscription.Text)
        p.sendJSON(map[string]any{
            "type": "transcript",
            "role": "user",
            "text": content.InputTranscription.Text,
            "finished": content.InputTranscription.Finished,
        })
    }

    // Forward output transcription (what the model said, as text).
    if content.OutputTranscription != nil && content.OutputTranscription.Text != "" {
        p.sendJSON(map[string]any{
            "type": "transcript",
            "role": "model",
            "text": content.OutputTranscription.Text,
            "finished": content.OutputTranscription.Finished,
        })
    }
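Transcription events are sent twice and as accumulated partials, which can cause duplicate chat messages and context pollution.

If the Line 276 path (part.Text) and the Line 301 path (OutputTranscription.Text) both emit the model transcript, the same utterance is rendered twice. In addition, Line 291 passes partial chunks to AddTranscript before finished is set, which can needlessly bloat the tool context.

🧩 Suggested patch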
- // Forward input transcription (what the user said).
+ // Forward input transcription (what the user said).
if content.InputTranscription != nil && content.InputTranscription.Text != "" {
- p.toolHandler.AddTranscript("user", content.InputTranscription.Text)
+ if content.InputTranscription.Finished {
+ p.toolHandler.AddTranscript("user", content.InputTranscription.Text)
+ }
p.sendJSON(map[string]any{
"type": "transcript",
"role": "user",
"text": content.InputTranscription.Text,
"finished": content.InputTranscription.Finished,
})
}
- // Forward output transcription (what the model said, as text).
- if content.OutputTranscription != nil && content.OutputTranscription.Text != "" {
+ // Forward output transcription (what the model said, as text).
+ // Prefer a single model transcript source to avoid duplicates with ModelTurn.Part.Text.
+ if content.OutputTranscription != nil && content.OutputTranscription.Text != "" {
+ p.toolHandler.AddTranscript("model", content.OutputTranscription.Text)
p.sendJSON(map[string]any{
"type": "transcript",
"role": "model",
"text": content.OutputTranscription.Text,
"finished": content.OutputTranscription.Finished,
})
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/live/proxy.go` around lines 289 - 308, Avoid duplicating and
accumulating partial transcripts by only forwarding and adding to tool context
once the transcript is final: change the logic around AddTranscript and sendJSON
for content.InputTranscription/part.Text and content.OutputTranscription so that
partial (unfinished) segments are not passed to p.toolHandler.AddTranscript, and
ensure you do not emit the same model utterance twice when both part.Text and
content.OutputTranscription.Text are present (prefer the final
OutputTranscription when Finished==true or dedupe by skipping part.Text if
OutputTranscription exists). In short, gate AddTranscript and sendJSON on the
Finished flag and add a check so model transcripts from part.Text are suppressed
when content.OutputTranscription is present to prevent duplicate render/context
pollution.
…Transcript
- Remove model text forwarding from part.Text (use OutputTranscription only)
to prevent duplicate chat messages in browser
- Only call AddTranscript("user") when InputTranscription.Finished is true
to avoid bloating tool context with partial speech chunks
- Handle empty finalized transcript by cleaning up placeholder messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 840b50d255
    }

    // Forward input transcription (what the user said).
    if content.InputTranscription != nil && content.InputTranscription.Text != "" {
Forward empty finished transcription chunks to clients
handleServerContent drops transcription updates whenever Text is empty, so a Finished=true terminal chunk with no text is never forwarded. In that case the frontend never receives the finalize signal it needs to clear/commit the pending bubble (it already has explicit empty-finalize handling), and the backend also skips AddTranscript("user", ...) because that is only executed on finished events inside this same non-empty guard. This can leave stale/concatenated chat turns and lose user utterances from tool context for analyze_user.
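The forwarding rule this comment argues for is "drop a chunk only when it is both empty and unfinished". The backend handler is Go, so the TypeScript below is just a sketch of that decision using the wire format shown in this PR; `Transcription` and `toTranscriptMessage` are hypothetical names.

```typescript
type Role = 'user' | 'model';

// Hypothetical mirror of the fields the handler reads from a transcription.
interface Transcription {
  text: string;
  finished: boolean;
}

// Shape of the JSON message the browser receives, as shown in this PR.
interface TranscriptMessage {
  type: 'transcript';
  role: Role;
  text: string;
  finished: boolean;
}

// Returns the message to forward, or null when the chunk carries no signal.
function toTranscriptMessage(
  role: Role,
  t: Transcription | null | undefined,
): TranscriptMessage | null {
  if (!t) return null;
  // An empty partial is noise, but an empty finished chunk is still the
  // finalize signal the client needs to commit or clear its pending bubble.
  if (t.text === '' && !t.finished) return null;
  return { type: 'transcript', role, text: t.text, finished: t.finished };
}
```

Under this rule the existing `Text != ""` guard would only apply to unfinished chunks, and the finished branch would be reached even when the terminal chunk has no text.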
🤖 Generated with Claude Code