feat: add microphone input, strip markdown, fix system prompt #135
Conversation
…m prompt
- Add useMicrophone hook: captures mic audio via getUserMedia + ScriptProcessor, converts Float32 → Int16 PCM at 16kHz, sends via WebSocket binary
- Add stripMarkdown utility: removes markdown formatting (bold, italic, headers, code, links) from model transcript for clean voice-app display
- Improve onboarding system prompt: prevent model from exposing internal instructions, remove markdown usage, enforce natural conversational style
- Change greeting trigger from meta-instruction to simple "Hello!" to avoid the model narrating its own behavior
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of Changes (Gemini Code Assist)
This pull request significantly enhances the user experience by introducing voice interaction capabilities and refining the AI's conversational output. It integrates microphone input for real-time audio communication, ensures AI responses are presented cleanly by stripping markdown, and improves the AI's initial greeting and adherence to conversational guidelines.
Walkthrough
Adds microphone audio capture, introduces a markdown-stripping utility, and updates the backend greeting and onboarding system instruction text.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Code Review
This pull request introduces microphone input for voice interaction, a utility to strip markdown from AI responses, and an improved system prompt for a more robust and natural AI. While these features enhance the AI's functionality, a high-severity security issue was identified regarding insecure prompt construction. The application directly embeds potentially untrusted data into system instructions and conversation summaries, creating a significant risk of prompt injection that could allow an attacker to manipulate the AI's behavior or gain unauthorized access. Additionally, the microphone implementation uses a deprecated API, the markdown stripping has a bug with underscores, and WebSocket connection handling could be more reliable. Addressing these security vulnerabilities and functional improvements is critical for stability and safety.
// Start microphone after a short delay to ensure WebSocket is connected.
setTimeout(() => {
  mic.start((pcm) => {
    send({ type: 'audio', data: pcm });
  });
}, 500);
Using setTimeout with a fixed delay to wait for the WebSocket connection is fragile and can lead to race conditions on slower networks. The microphone might start sending data before the connection is established, or the delay might be unnecessarily long.
A more robust approach is to use a useEffect hook to react to the WebSocket connection state. This ensures the microphone is started exactly when the connection becomes available.
Please remove this setTimeout and add the following useEffect to the component:
useEffect(() => {
  if (started && state === 'connected') {
    mic.start((pcm) => {
      send({ type: 'audio', data: pcm });
    });
  }
}, [started, state, mic, send]);

ctxRef.current = ctx;

const source = ctx.createMediaStreamSource(stream);
const processor = ctx.createScriptProcessor(BUFFER_SIZE, 1, 1);
createScriptProcessor is deprecated and can cause audio glitches because it runs on the main thread. For robust audio processing, you should migrate to AudioWorklet, which runs in a separate thread, preventing UI freezes or audio dropouts.
You would need to create a worklet file (e.g., audio-processor.js) and then use audioContext.audioWorklet.addModule() and new AudioWorkletNode().
Here's an example of what the worklet and the updated hook would look like:
public/audio-processor.js (this file would need to be created)
class AudioProcessor extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    const pcm = inputs[0][0];
    if (!pcm) return true;
    // Convert Float32 [-1, 1] to Int16 [-32768, 32767]
    const int16 = new Int16Array(pcm.length);
    for (let i = 0; i < pcm.length; i++) {
      const s = Math.max(-1, Math.min(1, pcm[i]));
      int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    this.port.postMessage(int16.buffer, [int16.buffer]);
    return true;
  }
}
registerProcessor('audio-processor', AudioProcessor);
Updated useMicrophone.ts start function:
const start = useCallback(async (onData: (pcm: ArrayBuffer) => void) => {
  if (streamRef.current) return;
  onDataRef.current = onData;
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { echoCancellation: true, noiseSuppression: true, sampleRate: TARGET_SAMPLE_RATE },
  });
  streamRef.current = stream;
  const ctx = new AudioContext({ sampleRate: TARGET_SAMPLE_RATE });
  ctxRef.current = ctx;
  await ctx.audioWorklet.addModule('/audio-processor.js');
  const workletNode = new AudioWorkletNode(ctx, 'audio-processor');
  processorRef.current = workletNode; // Store for cleanup
  workletNode.port.onmessage = (event) => {
    onDataRef.current?.(event.data);
  };
  const source = ctx.createMediaStreamSource(stream);
  source.connect(workletNode);
  workletNode.connect(ctx.destination);
  setIsRecording(true);
}, []);
Note that processorRef would need to be updated to useRef<AudioWorkletNode | null>(null) and the stop function would need to be adjusted to handle the worklet node.
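A hedged sketch of what the adjusted stop logic might look like. `stopCapture`, the `Capture` shape, and the `*Like` interfaces are hypothetical names introduced only so the example runs outside a browser; the PR's hook keeps these objects in refs and may not store the source node at all. The members actually called (`disconnect()`, `port.onmessage`, `getTracks()`, `stop()`, `close()`) are standard Web Audio and Media Capture APIs.

```typescript
// Structural types covering only the members this sketch calls (assumptions,
// standing in for AudioNode, AudioWorkletNode, MediaStream, and AudioContext).
interface Disconnectable { disconnect(): void }
interface WorkletLike extends Disconnectable {
  port: { onmessage: ((e: { data: unknown }) => void) | null };
}
interface StreamLike { getTracks(): Array<{ stop(): void }> }
interface ContextLike { close(): Promise<void> }

// Hypothetical container for what the hook would hold in its refs.
interface Capture {
  source: Disconnectable | null;
  worklet: WorkletLike | null;
  stream: StreamLike | null;
  ctx: ContextLike | null;
}

async function stopCapture(c: Capture): Promise<void> {
  if (c.worklet) {
    c.worklet.port.onmessage = null; // stop delivering PCM chunks
    c.worklet.disconnect();
  }
  c.source?.disconnect();
  c.stream?.getTracks().forEach((t) => t.stop()); // releases the microphone
  await c.ctx?.close();
  // Clear references so a later start() begins from a clean state.
  c.worklet = null;
  c.source = null;
  c.stream = null;
  c.ctx = null;
}
```

Stopping the tracks is what turns off the browser's recording indicator; disconnecting the nodes alone would not release the device.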
// Remove bold+italic (***text*** or ___text___)
.replace(/(\*{3}|_{3})(.*?)\1/g, '$2')
// Remove bold (**text** or __text__)
.replace(/(\*{2}|_{2})(.*?)\1/g, '$2')
// Remove italic (*text* or _text_)
.replace(/(\*|_)(.*?)\1/g, '$2')
The regular expressions for stripping bold and italic formatting with underscores are too aggressive and can corrupt text that is not markdown. For example, a string like file_name_with_underscores would be incorrectly converted to filenamewith_underscores.
To fix this, the regex for underscores should ensure they are not part of a word (i.e., they are surrounded by whitespace or punctuation, or are at the start/end of the string). This can be achieved using negative lookarounds (?<!\w) and (?!\w).
- // Remove bold+italic (***text*** or ___text___)
- .replace(/(\*{3}|_{3})(.*?)\1/g, '$2')
- // Remove bold (**text** or __text__)
- .replace(/(\*{2}|_{2})(.*?)\1/g, '$2')
- // Remove italic (*text* or _text_)
- .replace(/(\*|_)(.*?)\1/g, '$2')
+ // Remove bold+italic (***text*** or ___text___), handling underscores carefully
+ .replace(/\*{3}(.*?)\*{3}|(?<!\w)___(.*?)___(?!\w)/g, '$1$2')
+ // Remove bold (**text** or __text__)
+ .replace(/\*{2}(.*?)\*{2}|(?<!\w)__(.*?)__(?!\w)/g, '$1$2')
+ // Remove italic (*text* or _text_)
+ .replace(/\*(.*?)\*|(?<!\w)_(.*?)_(?!\w)/g, '$1$2')
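To make the effect of the lookarounds concrete, here is a minimal, self-contained sketch. `stripEmphasis` is a hypothetical helper, not the PR's `stripMarkdown`; it applies only the three emphasis rules from the suggestion. Lookbehind (`(?<!\w)`) needs an ES2018-capable regex engine, which is an assumption about the target browsers.

```typescript
// Hypothetical helper: strips *, **, *** emphasis anywhere, but only strips
// _, __, ___ emphasis when the underscores are not adjacent to word characters.
function stripEmphasis(text: string): string {
  // Exactly one of the two capture groups participates in each match.
  const keepInner = (_match: string, star?: string, underscore?: string): string =>
    star ?? underscore ?? '';
  return text
    .replace(/\*{3}(.*?)\*{3}|(?<!\w)___(.*?)___(?!\w)/g, keepInner)
    .replace(/\*{2}(.*?)\*{2}|(?<!\w)__(.*?)__(?!\w)/g, keepInner)
    .replace(/\*(.*?)\*|(?<!\w)_(.*?)_(?!\w)/g, keepInner);
}

console.log(stripEmphasis('file_name_with_underscores')); // unchanged
console.log(stripEmphasis('_italic_ and **bold**'));      // "italic and bold"
```

Identifiers such as `snake_case` survive because every underscore in them has a word character on at least one side, while `_italic_` at a word boundary is still unwrapped.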
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8b053d6432
setTimeout(() => {
  mic.start((pcm) => {
    send({ type: 'audio', data: pcm });
  });
}, 500);
Cancel delayed microphone start when stopping session
The delayed setTimeout microphone start is never tracked or cleared, so if the user ends the session within the first 500ms, the callback still runs and calls mic.start(...) after disconnect. In that flow the app can re-open microphone capture after the user explicitly pressed “End Session,” which is a user-facing privacy/resource bug and can leave recording active against a stopped session.
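If the delayed start were kept, one way to address this would be to track the timer handle and clear it when the session ends. The sketch below is framework-free so it can run on its own; `createDelayedStart` and its `schedule`/`cancel` methods are hypothetical names, and in the component the handle would more likely live in a ref.

```typescript
// Hypothetical helper: wraps a delayed action so it can be cancelled before it fires.
function createDelayedStart(start: () => void, delayMs: number) {
  let timer: ReturnType<typeof setTimeout> | null = null;
  return {
    schedule(): void {
      if (timer !== null) return; // a start is already pending
      timer = setTimeout(() => {
        timer = null;
        start();
      }, delayMs);
    },
    cancel(): void {
      if (timer !== null) {
        clearTimeout(timer); // the pending start never runs
        timer = null;
      }
    },
  };
}

// Usage: schedule on "Start Session", cancel on "End Session".
const delayedMic = createDelayedStart(() => console.log('mic.start() would run here'), 500);
delayedMic.schedule();
delayedMic.cancel(); // ending the session within 500 ms prevents the start
```

Starting the microphone from a connection-state effect, as other reviewers suggest, removes the timer entirely and makes this bookkeeping unnecessary.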
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
web/app/page.tsx (1)
119-124: ⚠️ Potential issue | 🟡 Minor: Microphone cleanup missing on component unmount
If the component unmounts while a session is active, the microphone can remain in the recording state.
🧹 Add mic.stop to the useEffect cleanup
  // Sync body scroll-lock class with session state; cleanup on unmount.
  useEffect(() => {
    document.body.classList.toggle('session-active', started);
    return () => {
      document.body.classList.remove('session-active');
+     mic.stop();
    };
- }, [started]);
+ }, [started, mic]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/app/page.tsx` around lines 119 - 124, useEffect currently toggles document.body class on start but its cleanup only removes the class; ensure any active microphone is stopped on unmount by calling mic.stop() (or the project's microphone stop function) inside the cleanup when started is true; update the useEffect cleanup to call mic.stop() (or stopMicrophone()) before removing 'session-active' so the microphone doesn't remain recording; reference useEffect, started, and mic.stop to locate and implement the change.
🧹 Nitpick comments (1)
web/__tests__/lib/stripMarkdown.test.ts (1)
25-27: Missing test case for image markdown removal
There is no test for image markdown (![alt](url)). Adding one would help catch the link/image ordering bug in stripMarkdown.ts.
🧪 Add an image test case
  it('strips links preserving text', () => {
    expect(stripMarkdown('[click here](https://example.com)')).toBe('click here');
  });

+ it('strips images preserving alt text', () => {
+   expect(stripMarkdown('![logo](url)')).toBe('logo');
+ });
+
  it('strips bullet list markers', () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/__tests__/lib/stripMarkdown.test.ts` around lines 25 - 27, Add a unit test to web/__tests__/lib/stripMarkdown.test.ts that verifies stripMarkdown correctly handles image markdown; specifically, add an it block (e.g., "strips image markdown preserving alt text") that calls stripMarkdown('![alt](url)') and expects the result toBe('alt'), referencing the stripMarkdown function to catch the link/image ordering bug.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- internal/handler/websocket.go
- internal/session/manager.go
- internal/session/manager_test.go
- web/__tests__/hooks/useMicrophone.test.ts
- web/__tests__/lib/stripMarkdown.test.ts
- web/app/page.tsx
- web/hooks/useMicrophone.ts
- web/lib/stripMarkdown.ts
  // Start microphone after a short delay to ensure WebSocket is connected.
  setTimeout(() => {
    mic.start((pcm) => {
      send({ type: 'audio', data: pcm });
    });
  }, 500);
};
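Race condition: microphone starts before the WebSocket connects
A fixed 500 ms delay does not guarantee that the WebSocket is connected. Depending on network conditions the connection can take longer, and the microphone starts even when the connection fails.
🔧 Start the microphone based on WebSocket state instead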
+ // Start mic when WebSocket connects
+ useEffect(() => {
+ if (started && state === 'connected' && !mic.isRecording) {
+ mic.start((pcm) => {
+ send({ type: 'audio', data: pcm });
+ });
+ }
+ }, [started, state, mic, send]);
+
const handleStart = () => {
initAudioContext();
connect();
setStarted(true);
- // Start microphone after a short delay to ensure WebSocket is connected.
- setTimeout(() => {
- mic.start((pcm) => {
- send({ type: 'audio', data: pcm });
- });
- }, 500);
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/app/page.tsx` around lines 130 - 136, The fixed 500ms setTimeout can
start the mic before the WebSocket is ready; instead, wait for the WebSocket
open/connected event or a connection promise before calling mic.start, and only
call send({ type: 'audio', data: pcm }) after confirming the socket is open; on
connection failure or close do not start or immediately stop the mic. Locate the
mic.start(...) call and the send(...) usage and replace the timeout-based start
with logic that listens for the WebSocket instance's 'open' (or a connect
promise) and starts mic.start inside that handler, and ensure cleanup/stop on
socket error/close.
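The "connect promise" option mentioned in this prompt could be sketched as below. This assumes the code has direct access to the socket object, which the PR may not expose (its hook appears to surface a `state` string instead). `waitForOpen`, `SocketLike`, and `WS_OPEN` are hypothetical names; the `open`, `error`, and `close` events and the `readyState` value 1 for an open socket are standard WebSocket behavior.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings (assumption).
interface SocketLike {
  readyState: number;
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

const WS_OPEN = 1; // same value as WebSocket.OPEN

// Resolves once the socket is open; rejects if it errors or closes first.
function waitForOpen(socket: SocketLike): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (socket.readyState === WS_OPEN) {
      resolve();
      return;
    }
    function cleanup(): void {
      socket.removeEventListener('open', onOpen);
      socket.removeEventListener('error', onFail);
      socket.removeEventListener('close', onFail);
    }
    function onOpen(): void {
      cleanup();
      resolve();
    }
    function onFail(): void {
      cleanup();
      reject(new Error('WebSocket closed or errored before opening'));
    }
    socket.addEventListener('open', onOpen);
    socket.addEventListener('error', onFail);
    socket.addEventListener('close', onFail);
  });
}
```

A caller would then start the microphone only after `await waitForOpen(socket)` succeeds and skip it in the `catch` branch, so a failed connection never opens the microphone.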
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    sampleRate: TARGET_SAMPLE_RATE,
  },
});
streamRef.current = stream;
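Missing getUserMedia error handling
If getUserMedia is rejected (permission denied, no microphone, etc.), the promise rejects and an unhandled exception propagates to the caller. The call site (page.tsx) does not catch it either, so the user gets no meaningful feedback.
🛡️ Recommend adding a try-catch or an error state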
+ const [error, setError] = useState<string | null>(null);
+
const start = useCallback(async (onData: (pcm: ArrayBuffer) => void) => {
if (streamRef.current) return; // already recording
onDataRef.current = onData;
- const stream = await navigator.mediaDevices.getUserMedia({
- audio: {
- echoCancellation: true,
- noiseSuppression: true,
- sampleRate: TARGET_SAMPLE_RATE,
- },
- });
+ let stream: MediaStream;
+ try {
+ stream = await navigator.mediaDevices.getUserMedia({
+ audio: {
+ echoCancellation: true,
+ noiseSuppression: true,
+ sampleRate: TARGET_SAMPLE_RATE,
+ },
+ });
+ } catch (err) {
+ setError(err instanceof Error ? err.message : 'Microphone access denied');
+ return;
+ }
+ setError(null);
  streamRef.current = stream;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/hooks/useMicrophone.ts` around lines 21 - 28, Wrap the
navigator.mediaDevices.getUserMedia call in a try-catch inside useMicrophone
(the hook containing streamRef) so rejections (permission denied, no device) are
handled; on success set streamRef.current = stream as before, on failure
set/return an explicit error state (e.g., a returned error value or hook state
like microphoneError) and avoid leaving streamRef undefined, and ensure the
hook's public API surfaces that error so callers (like page.tsx) can display
feedback instead of encountering an unhandled rejection.
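As one possible shape for that public error surface, the sketch below returns a tagged result instead of throwing. `requestMicrophone`, `MicResult`, and the message strings are hypothetical; `getStream` is injected so the example runs outside a browser, and in the hook it would wrap `navigator.mediaDevices.getUserMedia`. `NotAllowedError` and `NotFoundError` are the standard `DOMException` names that `getUserMedia` rejects with for a denied permission and a missing device.

```typescript
// Hypothetical result type: callers branch on `ok` instead of catching.
type MicResult<S> =
  | { ok: true; stream: S }
  | { ok: false; reason: string };

async function requestMicrophone<S>(getStream: () => Promise<S>): Promise<MicResult<S>> {
  try {
    return { ok: true, stream: await getStream() };
  } catch (err) {
    // DOMException inherits from Error, so `name` is available here.
    const errorName = err instanceof Error ? err.name : '';
    switch (errorName) {
      case 'NotAllowedError':
        return { ok: false, reason: 'Microphone permission was denied.' };
      case 'NotFoundError':
        return { ok: false, reason: 'No microphone was found.' };
      default:
        return {
          ok: false,
          reason: err instanceof Error ? err.message : 'Microphone unavailable.',
        };
    }
  }
}
```

The hook could store `reason` in its error state and return early, which keeps `streamRef` untouched on failure and gives page.tsx something to render.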
// Remove links [text](url) → text
.replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')
// Remove images ![alt](url)
.replace(/!\[([^\]]*)\]\([^)]+\)/g, '$1')
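Image and link removal order problem
In the current order the link regex runs before the image regex, so in ![alt](url) only the [alt](url) part matches and the ! is left behind.
🐛 Place the image regex before the link regex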
  // Remove headers (# ... ##)
  .replace(/^#{1,6}\s+/gm, '')
- // Remove links [text](url) → text
- .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')
  // Remove images ![alt](url)
  .replace(/!\[([^\]]*)\]\([^)]+\)/g, '$1')
+ // Remove links [text](url) → text
+ .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')
  // Remove bullet list markers
📝 Committable suggestion
- // Remove links [text](url) → text
- .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')
- // Remove images ![alt](url)
- .replace(/!\[([^\]]*)\]\([^)]+\)/g, '$1')
+ // Remove headers (# ... ##)
+ .replace(/^#{1,6}\s+/gm, '')
+ // Remove images ![alt](url)
+ .replace(/!\[([^\]]*)\]\([^)]+\)/g, '$1')
+ // Remove links [text](url) → text
+ .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')
+ // Remove bullet list markers
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/lib/stripMarkdown.ts` around lines 19 - 22, The current .replace chain in
stripMarkdown leaves a stray "!" because the link regex
(.replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')) runs before the image regex; change
the order so the image removal regex (.replace(/!\[([^\]]*)\]\([^)]+\)/g, '$1'))
executes first, then the link regex, ensuring `![alt](url)` is fully handled;
update the .replace sequence inside the stripMarkdown function accordingly.
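The ordering effect can be reproduced in isolation with the two regexes quoted in this comment. `linkFirst` and `imageFirst` are hypothetical names used only for the comparison, and the sample inputs are made up.

```typescript
// The two patterns from stripMarkdown.ts, as quoted in the review.
const IMAGE_RE = /!\[([^\]]*)\]\([^)]+\)/g;
const LINK_RE = /\[([^\]]+)\]\([^)]+\)/g;

// Current order: the link pattern consumes "[alt](url)" and strands the "!".
function linkFirst(text: string): string {
  return text.replace(LINK_RE, '$1').replace(IMAGE_RE, '$1');
}

// Suggested order: images are unwrapped before links are considered.
function imageFirst(text: string): string {
  return text.replace(IMAGE_RE, '$1').replace(LINK_RE, '$1');
}

console.log(linkFirst('![logo](a.png)'));  // "!logo"
console.log(imageFirst('![logo](a.png)')); // "logo"
```

Plain links are unaffected by the swap, since the image pattern requires the leading `!`.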
Summary
- useMicrophone hook that captures browser mic audio (16kHz PCM Int16) and sends it via WebSocket binary to the Go backend → Gemini Live API. Users can now have real voice conversations.
- stripMarkdown utility to clean markdown formatting (**bold**, ## headers, etc.) from model transcript before display. Voice apps should show clean text, not raw markdown.
- […] "Hello!".
Changes
- web/hooks/useMicrophone.ts
- web/lib/stripMarkdown.ts
- web/app/page.tsx
- internal/session/manager.go
- internal/handler/websocket.go
Local CI
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
Release notes
New features
Tests