feat: merge PRs #2, #4, and #6 with completed iOS support #7
The Android SpeechRecognizer.stopListening() method doesn't reliably
stop audio input in all scenarios. This is problematic for Push-to-Talk
(PTT) implementations where the user expects recording to stop
immediately when releasing the button.
This commit adds:
1. forceStop() method with destroy/recreate pattern:
   - First tries graceful stopListening()
   - After configurable timeout (default 1500ms), forces stop by
     destroying and recreating the SpeechRecognizer
   - Returns the last cached partial result so no speech is lost
   - Emits 'stopped' state after force stop completes
2. getLastPartialResult() method:
   - Returns the most recent partial transcription
   - Useful for checking state or retrieving results after force stop
3. Callback guards:
   - onError, onResults, onPartialResults now check forceStopped flag
   - Prevents late callbacks from interfering after force stop
   - onResults cancels pending force-stop timeout when results arrive
Usage example:
```javascript
// Start with partial results for real-time updates
await SpeechRecognition.start({ partialResults: true });
// On button release, force stop with 1s timeout
await SpeechRecognition.forceStop({ timeout: 1000 });
// Get what was heard (in case normal results didn't fire)
const result = await SpeechRecognition.getLastPartialResult();
```
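The graceful-then-forced stop described in this commit can be sketched in plain JavaScript. This is only an illustration of the pattern, not the plugin's Android code: `createForceStopper`, `createRecognizer`, and the injectable `schedule`/`cancel` timer functions are hypothetical names introduced here.

```javascript
// Hypothetical sketch of the destroy/recreate pattern; not the plugin's API.
function createForceStopper({ createRecognizer, schedule = setTimeout, cancel = clearTimeout }) {
  let recognizer = createRecognizer();
  let lastPartial = [];
  let forceStopped = false;
  let timer = null;

  return {
    begin() {
      forceStopped = false;
      lastPartial = [];
    },
    onPartialResults(matches) {
      if (forceStopped) return; // callback guard: ignore late callbacks
      lastPartial = matches;
    },
    onResults(matches) {
      if (forceStopped) return;
      lastPartial = matches;
      if (timer !== null) {
        cancel(timer); // results arrived in time, no forced stop needed
        timer = null;
      }
    },
    forceStop(timeoutMs = 1500) {
      recognizer.stopListening(); // graceful attempt first
      timer = schedule(() => {
        timer = null;
        forceStopped = true;
        recognizer.destroy(); // forced path: tear down and rebuild
        recognizer = createRecognizer();
      }, timeoutMs);
    },
    getLastPartialResult() {
      return lastPartial;
    }
  };
}
```

Injecting the timer functions keeps the sketch testable without real delays; a real implementation would also need to emit the 'stopped' state once the forced stop completes.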
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tal)
EXPERIMENTAL: This feature allows recognition to continue through silence
periods while the PTT button is held, useful for natural speech with pauses.
This commit adds:
1. continuousPTT option in start():
   - When enabled, recognition auto-restarts on silence/timeout errors
   - Results are accumulated across restarts
   - Requires setPTTState() to track button state
2. setPTTState() method:
   - Call with held=true on button press
   - Call with held=false on button release
   - Resets accumulated results when button is pressed
3. Auto-restart logic:
   - onError: Restarts on ERROR_NO_MATCH or ERROR_SPEECH_TIMEOUT
     while button is held
   - onResults: Accumulates result and restarts while button is held
   - Emits 'isRestarting' flag in partial results when restarting
4. Accumulated results:
   - Results from multiple recognition sessions are concatenated
   - Final result includes 'accumulatedText' with full transcription
Usage example:
```javascript
// Start with continuous PTT enabled
await SpeechRecognition.start({
  partialResults: true,
  continuousPTT: true
});
// On button press
await SpeechRecognition.setPTTState({ held: true });
// ... user speaks with pauses, recognition continues ...
// On button release
await SpeechRecognition.setPTTState({ held: false });
await SpeechRecognition.forceStop();
// Final result includes all accumulated speech
```
WARNING: This is experimental and may cause issues in some scenarios.
Use forceStop() without continuousPTT for reliable basic PTT.
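To make the accumulate-and-restart behaviour concrete, here is a minimal sketch of the bookkeeping it implies. `createPTTAccumulator` and its method names are hypothetical, and joining segments with a single space is an assumption; the commit does not say how the Android code concatenates results.

```javascript
// Hypothetical bookkeeping for continuous PTT; names are illustrative only.
function createPTTAccumulator() {
  let held = false;
  let segments = [];

  return {
    setHeld(next) {
      if (next && !held) segments = []; // a fresh press resets accumulated results
      held = next;
    },
    // Called with each final result; reports the running text and whether
    // recognition should restart because the button is still held.
    onFinalResult(text) {
      if (text) segments.push(text);
      return { accumulatedText: segments.join(' '), restart: held };
    },
    // Silence and timeout errors trigger a restart only while held.
    shouldRestartOnError(code) {
      return held && (code === 'ERROR_NO_MATCH' || code === 'ERROR_SPEECH_TIMEOUT');
    }
  };
}
```

The error names are the ones listed in the commit message; every other error code falls through to normal error handling in this sketch.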
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Text nodes don't have classList property, causing TypeError when firstChild is whitespace. Add optional chaining to prevent the error. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add interactive PTT demonstration to the example app:
- Toggle switch to enable PTT mode (hides regular Start/Stop buttons)
- Checkbox to enable continuousPTT (continue on silence, experimental)
- Press-and-hold microphone button with visual feedback
- Display accumulated results from continuous PTT sessions
- Add web.ts stubs for forceStop, getLastPartialResult, setPTTState
- Fix Kotlin version conflict in example-app Android build
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When continuousPTT mode auto-restarts after silence, beginListening is called with null PluginCall. Add null checks to prevent NPE crash. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When user releases PTT (scheduling the force-stop timeout) and quickly starts a new session, the old timeout could still fire and destroy the fresh recognizer. This fix cancels any pending forceStopRunnable before starting a new recognition session. Addresses CodeRabbit review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
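The race fixed here is easy to reproduce in miniature. The sketch below assumes a single pending timer handle that every new session clears before starting; `createForceStopTimer`, `scheduleForceStop`, and `beginSession` are hypothetical names, not the plugin's.

```javascript
// Hypothetical sketch: cancel a pending force-stop before starting a new session.
function createForceStopTimer(schedule = setTimeout, cancel = clearTimeout) {
  let pending = null;

  function cancelPending() {
    if (pending !== null) {
      cancel(pending);
      pending = null;
    }
  }

  return {
    scheduleForceStop(onTimeout, ms) {
      cancelPending(); // never keep two force-stop timers alive
      pending = schedule(() => {
        pending = null;
        onTimeout();
      }, ms);
    },
    beginSession(startRecognizer) {
      cancelPending(); // an old timeout must not destroy the fresh recognizer
      return startRecognizer();
    }
  };
}
```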
Add kotlin-gradle-plugin:1.9.24 to buildscript dependencies to match the forced stdlib versions in resolutionStrategy. This ensures KGP version consistency and avoids AGP/KGP version mismatches. Addresses CodeRabbit review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update JSDoc to accurately describe that forceStop() returns Promise<void> and emits partial results via event. Users should call getLastPartialResult() to retrieve transcription after force stop. Addresses CodeRabbit review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When partialResults=false, start() returns a Promise that awaits resolution from onResults()/onError(). If forceStop() timeout triggers before normal completion, forceStopped=true caused both callbacks to return early without resolving the pending PluginCall. Now tracks activeStartCall and rejects it with "forceStop" error in the timeout handler, preventing callers from hanging indefinitely. Addresses CodeRabbit review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
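The hang described above comes from a promise that no code path settles. A minimal sketch of the fix, assuming the pending call is tracked in one variable and settled exactly once (`createStartTracker` and its methods are hypothetical names):

```javascript
// Hypothetical sketch: make sure a pending start() promise is always settled.
function createStartTracker() {
  let active = null; // { resolve, reject } for the in-flight start() call

  return {
    start() {
      return new Promise((resolve, reject) => {
        active = { resolve, reject };
      });
    },
    hasPending() {
      return active !== null;
    },
    onResults(matches) {
      if (!active) return; // already settled, e.g. by a force stop
      active.resolve({ matches });
      active = null;
    },
    onForceStopTimeout() {
      if (!active) return;
      active.reject(new Error('forceStop')); // caller no longer hangs
      active = null;
    }
  };
}
```

A caller awaiting `start()` then sees a rejection with message "forceStop" instead of waiting forever, and a late `onResults` after the timeout becomes a no-op.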
The TypeScript interface only defined `matches`, but the Android implementation emits additional fields in PTT scenarios:
- accumulated: transcription across continuous PTT restarts
- accumulatedText: final accumulated text including current result
- isRestarting: true when restarting in continuous PTT mode
- forced: true when result emitted due to force stop timeout
Addresses CodeRabbit review feedback.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mark forceStop(), getLastPartialResult(), and setPTTState() as Android-only APIs since there are no corresponding iOS implementations. This helps TypeScript users and documentation generators clearly indicate these methods are unavailable on iOS. Addresses CodeRabbit review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nition
- Introduced a finite state machine to manage listening sessions, improving predictability and observability.
- Emitted events for state transitions: `startingListening`, `started`, `stoppingListening`, and `stopped`.
- Enhanced error handling by emitting an `error` event for all recognition errors.
- Added `readyForNextSession` event to signal when the recognizer is ready for a new session.
- Maintained backward compatibility while improving session management and UI responsiveness.
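A table-driven sketch shows how such a state machine can tie each transition to one of the listed events. The state names follow the IDLE → STARTING → STARTED → STOPPING → IDLE cycle mentioned later in the review; the action names, `createListeningMachine`, and the choice to emit `stopped` before `readyForNextSession` are assumptions made for illustration.

```javascript
// Hypothetical transition table; action names are illustrative.
const TRANSITIONS = {
  idle: { start: 'starting' },
  starting: { started: 'started', stop: 'stopping', error: 'stopping' },
  started: { stop: 'stopping', results: 'stopping', error: 'stopping' },
  stopping: { stopped: 'idle' }
};

// Events emitted on entering each state (ordering for idle is an assumption).
const EVENTS_ON_ENTER = {
  starting: ['startingListening'],
  started: ['started'],
  stopping: ['stoppingListening'],
  idle: ['stopped', 'readyForNextSession']
};

function createListeningMachine(emit) {
  let state = 'idle';
  return {
    getState() {
      return state;
    },
    // Returns false and changes nothing when the action is invalid in this state.
    dispatch(action) {
      const next = TRANSITIONS[state][action];
      if (!next) return false;
      state = next;
      EVENTS_ON_ENTER[next].forEach((name) => emit(name));
      return true;
    }
  };
}
```

Because invalid actions are rejected rather than applied, a second `start` while a session is running leaves the current session untouched.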
# Conflicts:
#   android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
# Conflicts:
#   README.md
#   android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
# Conflicts:
#   android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
#   src/definitions.ts
📝 Walkthrough
Added push-to-talk session management: three new public plugin APIs ( …
Sequence Diagram(s)
sequenceDiagram
participant UI as Example App UI
participant JS as Plugin JS
participant Native as Native Plugin (Android/iOS)
participant Recognizer as Native Recognizer
rect rgba(200,230,255,0.5)
UI->>JS: handlePTTPress()
JS->>Native: setPTTState(held=true)
JS->>Native: start(options + continuousPTT?)
end
rect rgba(200,255,200,0.5)
Native->>Recognizer: create/start session(sessionId)
Recognizer-->>Native: partial results (matches)
Native-->>JS: partialResults (matches, accumulated?, sessionId)
JS-->>UI: update accumulated display
end
rect rgba(255,230,200,0.5)
UI->>JS: handlePTTRelease()
JS->>Native: setPTTState(held=false)
JS->>Native: forceStop(timeout)
Native->>Recognizer: stop/teardown
Native-->>JS: optional forced partial (forced=true)
Native-->>JS: readyForNextSession(sessionId)
JS-->>UI: fetch getLastPartialResult() / finalize UI
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (2)
1110-1122: ⚠️ Potential issue | 🟠 Major
Apply the stale-listener guard to segmented callbacks too.
Late onSegmentResults()/onEndOfSegmentedSession() callbacks are still emitted after a session or recognizer-generation switch because only the other listener methods call isStale(). That leaks segment events from an old recognizer into the next session.
Suggested fix

     @Override
     public void onSegmentResults(Bundle results) {
    +    if (isStale()) {
    +        return;
    +    }
         ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
         if (matches == null) {
             return;
         }
         notifyListeners(SEGMENT_RESULTS_EVENT, new JSObject().put("matches", new JSArray(matches)));
     }

     @Override
     public void onEndOfSegmentedSession() {
    +    if (isStale()) {
    +        return;
    +    }
         notifyListeners(END_OF_SEGMENT_EVENT, new JSObject());
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java` around lines 1110 - 1122, The segmented-callback handlers onSegmentResults and onEndOfSegmentedSession need the same stale-listener guard used elsewhere: at the start of onSegmentResults(Bundle results) call isStale() and return early if true before processing matches/notifyListeners(SEGMENT_RESULTS_EVENT), and likewise in onEndOfSegmentedSession() call isStale() and return early before notifyListeners(END_OF_SEGMENT_EVENT); this prevents old recognizer/session callbacks from leaking into a new session.
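The stale-listener idea is independent of Android and can be shown with a generation counter. In this sketch each listener remembers the generation it was created under and drops callbacks once a newer recognizer exists; `createRecognizerHost`, `newListener`, and the event name strings are hypothetical, since the review does not show how `isStale()` is implemented or what the event constants contain.

```javascript
// Hypothetical sketch of a generation-based stale-callback guard.
function createRecognizerHost(notify) {
  let generation = 0;

  return {
    // Each new recognizer gets a listener bound to its own generation.
    newListener() {
      const mine = ++generation;
      const isStale = () => mine !== generation;
      return {
        onSegmentResults(matches) {
          if (isStale() || !matches) return; // drop callbacks from old recognizers
          notify('segmentResults', { matches });
        },
        onEndOfSegmentedSession() {
          if (isStale()) return;
          notify('endOfSegmentedSession', {});
        }
      };
    }
  };
}
```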
576-603: ⚠️ Potential issue | 🔴 Critical
Session-check the on-device support/model-download callbacks.
These callbacks call startInlineListening()/call.reject() without verifying that currentSessionId is still live. If the user stops or restarts while support checking or model download is in flight, a stale callback can revive a dead session or reject a newer flow.
Suggested guard pattern

     @Override
     public void onSupportResult(RecognitionSupport support) {
    +    try {
    +        lock.lock();
    +        if (currentSessionId != sessionId || pendingStopReason != null || state == ListeningState.IDLE) {
    +            return;
    +        }
    +    } finally {
    +        lock.unlock();
    +    }
         boolean installed = isLanguageSupported(language, support.getInstalledOnDeviceLanguages());
         boolean supported = installed ||

Apply the same guard before acting in onError(...), onSuccess(), onScheduled(), and the model-download onError(...) path.
Also applies to: 605-614, 619-656
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java` around lines 576 - 603, Guard all on-device support and model-download callbacks by verifying the session is still active for the same currentSessionId before taking actions; specifically, at the start of RecognitionSupportCallback.onSupportResult and in the model-download callbacks (onError, onSuccess, onScheduled) check that the sessionId matches an active session (the same session tracking used by finishSession/currentSessionId) and return early if it is stale, so that calls to startInlineListening(intent,...), triggerOnDeviceModelDownload(...), call.reject(...), emitErrorEvent(...), and finishSession(...) only run for live sessions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 241-279: stop()/forceStop() prematurely mark popup-mode sessions
finished even when an external activity started by startActivityForResult(...)
is still outstanding; update stop() and forceStop() to detect popup/modal
sessions started via startActivityForResult (check the same flag/field used when
launching the activity), cancel any scheduled finish fallbacks (call
cancelPendingForceStopLocked()), and instead defer emitting terminal events and
calling scheduleFinishFallbackLocked until the activity result returns;
additionally, when stopping you should attempt to finish/cancel the launched
activity (call Activity.finish() or use the saved PendingIntent/result handler)
so the popup doesn't outlive the session, and ensure listeningResult(...) only
resolves/rejects after the activity result is processed and
emitListeningState(..., "finished", ...) is only emitted after that.
- Around line 186-205: Before mutating session state in start(), check the
current state under the same lock and reject overlapping starts: if state is
ListeningState.STARTING, STARTED, or STOPPING, immediately fail/return the new
start call (e.g., complete its promise with an error) instead of proceeding;
only when state is not one of those values continue to increment sessionId, set
currentSessionId, assign activeStartCall, and set state =
ListeningState.STARTING. Ensure this check uses the same lock around the block
that touches sessionId, activeStartCall, and state so beginListening() and
subsequent code (e.g., beginListening()) will not have its promise orphaned by
an overwritten session.
In `@example-app/src/main.js`:
- Around line 96-104: In togglePTTMode, only enable the continuousPTT checkbox
on Android: detect platform (use the existing platform helper or add an
isAndroid check) and change the logic so refs.continuousPTT.disabled becomes
!(enabled && isAndroid) and refs.continuousPTT.checked is forced false when not
(enabled && isAndroid); update the control visibility/behavior in togglePTTMode
to honor this Android-only requirement while leaving other UI toggles unchanged.
- Around line 233-250: The PTT control only has mouse/touch handlers and lacks
keyboard support, so update the refs.pttButton wiring to also handle key events
and blur: add a keydown listener that calls handlePTTPress() when Space or Enter
is pressed (preventDefault to avoid scrolling on Space), a keyup listener that
calls handlePTTRelease() for those keys, and a blur listener that calls
handlePTTRelease() as a safety release; ensure existing touch/mouse handlers
(handlePTTPress, handlePTTRelease) are reused and keep
refs.pttMode/togglePTTMode unchanged.
- Around line 184-223: handlePTTPress and handlePTTRelease can interleave and
race; serialize them by adding a transition lock/token (e.g., pttTransitionId or
an async mutex) that each press/release captures and validates before proceeding
so stale async work is ignored. On entry to handlePTTPress and handlePTTRelease
acquire the token (or increment a session id), store it locally, and before any
async continuation (after await SpeechRecognition.setPTTState, await
SpeechRecognition.start, await SpeechRecognition.forceStop, await
SpeechRecognition.getLastPartialResult, or any other await) check the token
still matches; if not, abort that stale branch. Also ensure handlePTTPress does
not start a new session while a previous transition is in progress and only
allow a new press once forceStop or a readyForNextSession signal completes. Use
the existing symbols: handlePTTPress, handlePTTRelease,
SpeechRecognition.setPTTState, SpeechRecognition.start,
SpeechRecognition.forceStop, and SpeechRecognition.getLastPartialResult to
locate and instrument the logic.
In `@IMPLEMENTATION_SUMMARY.md`:
- Around line 464-489: The implementation summary still lists pending
README/CHANGELOG tasks and a sample CHANGELOG entry for 7.1.0 that conflict with
the merged docs and package.json (now 8.0.10); update IMPLEMENTATION_SUMMARY.md
to remove or mark as completed the README/CHANGELOG checklist items, either
update the sample changelog entry to the actual released version and date or
delete the placeholder 7.1.0 block, and add a short note that docs were
regenerated and package.json is at 8.0.10 so the summary reflects the current
release state; also verify references to README.md and CHANGELOG.md sections
mentioned (e.g., "error event", "listeningState") are present in the regenerated
docs and adjust the summary text accordingly.
- Around line 103-105: The fenced code blocks containing state diagrams (for
example the block showing "IDLE → (start()) → STARTING → STARTED → (stop() or
results or error) → STOPPING → IDLE" and the other similar blocks referenced)
are missing language identifiers and failing markdownlint; update each of those
fenced blocks to include an appropriate language tag (e.g., ```text or
```markdown or ```bash as appropriate) so the linter recognizes them—apply this
change to the shown block and the other occurrences at the ranges called out
(292-298, 301-308, 311-317, 320-327, 333-340).
In `@ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift`:
- Around line 488-504: The code emits readyForNextSession and stopped before an
async SpeechAnalyzerRecognitionSession.finish completes, allowing a new start()
to race with teardown; change the flow in stopCurrentSession so that if
`#available`(iOS 26.0, *) and modernRecognitionSession is a
SpeechAnalyzerRecognitionSession and sessionAlreadyStopped is false, you await
modernSession.stop() (on the MainActor) before clearing
modernRecognitionSession, calling clearLegacyRecognitionResources(), and before
calling notifyListeners("readyForNextSession", ...) and emitListeningState(...);
in other words, move the async stop-await path to block emitting/clearing until
after await modernSession.stop(), only then set modernRecognitionSession = nil
and proceed with notifyListeners and emitListeningState (if
sessionAlreadyStopped is true you can keep the existing immediate behavior).
- Around line 113-124: The permission-denied callback inside
AVAudioSession.sharedInstance().requestRecordPermission must first verify the
sessionId matches the current active session to avoid emitting errors for
finished sessions; inside the requestRecordPermission closure (before calling
DispatchQueue.main.async and before
emitErrorEvent/activeCall/finishSessionIfNeeded) add a guard that sessionId ==
self.activeSessionId (or otherwise confirm the session is still active) and
return early if it does not match so stop()/forceStop() won't cause stale
MICROPHONE_PERMISSION_DENIED emissions.
In `@src/definitions.ts`:
- Around line 323-333: The multi-argument overload for addListener with
eventName 'readyForNextSession' is not reflowed and fails prettier; reformat
that overload signature (the addListener declaration taking eventName:
'readyForNextSession' and listenerFunc: (event: SpeechRecognitionReadyEvent) =>
void) so its parameters are wrapped across lines like the other multi-argument
overloads (each parameter on its own line, preserve the trailing comma and
return type Promise<PluginListenerHandle>) to satisfy prettier.
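The transition-token approach suggested for handlePTTPress/handlePTTRelease in example-app/src/main.js could look roughly like the sketch below. It covers only the stale-work check, not the additional wait for `readyForNextSession`. `createPTTController` and the `'stale'`/`'started'` return values are hypothetical; the `api` methods mirror the plugin calls named in the comment.

```javascript
// Hypothetical sketch: ignore async continuations from superseded transitions.
function createPTTController(api) {
  let transitionId = 0;

  return {
    async press() {
      const id = ++transitionId; // capture this transition's token
      await api.setPTTState({ held: true });
      if (id !== transitionId) return 'stale'; // a release arrived meanwhile
      await api.start({ partialResults: true });
      return id === transitionId ? 'started' : 'stale';
    },
    async release() {
      const id = ++transitionId;
      await api.setPTTState({ held: false });
      if (id !== transitionId) return 'stale';
      await api.forceStop();
      if (id !== transitionId) return 'stale';
      return api.getLastPartialResult();
    }
  };
}
```

With a quick press followed immediately by a release, the press branch sees a newer token after its first await and never calls `start()`, so no session is left running after the button is up.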
---
Outside diff comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 1110-1122: The segmented-callback handlers onSegmentResults and
onEndOfSegmentedSession need the same stale-listener guard used elsewhere: at
the start of onSegmentResults(Bundle results) call isStale() and return early if
true before processing matches/notifyListeners(SEGMENT_RESULTS_EVENT), and
likewise in onEndOfSegmentedSession() call isStale() and return early before
notifyListeners(END_OF_SEGMENT_EVENT); this prevents old recognizer/session
callbacks from leaking into a new session.
- Around line 576-603: Guard all on-device support and model-download callbacks
by verifying the session is still active for the same currentSessionId before
taking actions; specifically, at the start of
RecognitionSupportCallback.onSupportResult and in the model-download callbacks
(onError, onSuccess, onScheduled) check that the sessionId matches an active
session (the same session tracking used by finishSession/currentSessionId) and
return early if it is stale, so that calls to startInlineListening(intent,...),
triggerOnDeviceModelDownload(...), call.reject(...), emitErrorEvent(...), and
finishSession(...) only run for live sessions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 090a4222-73a6-4a76-a5fe-e4571831fa26
📒 Files selected for processing (12)
- IMPLEMENTATION_SUMMARY.md
- README.md
- android/src/main/java/app/capgo/speechrecognition/Constants.java
- android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
- example-app/android/build.gradle
- example-app/index.html
- example-app/src/main.js
- example-app/src/style.css
- ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift
- package.json
- src/definitions.ts
- src/web.ts
Actionable comments posted: 2
♻️ Duplicate comments (1)
android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (1)
186-207: ⚠️ Potential issue | 🟠 Major
Reject overlapping start() calls before rewriting session state.
The lock block does not check whether a session is already in progress (STARTING/STARTED/STOPPING). A second start() call can overwrite sessionId and activeStartCall while the first one is still active, leaving the earlier session's promise unsettled.
Suggested fix

     try {
         lock.lock();
    +    if (state != ListeningState.IDLE) {
    +        call.reject("Speech recognition is already running.");
    +        return;
    +    }
         cancelPendingForceStopLocked();
         forceStopped = false;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java` around lines 186 - 207, Before mutating session state inside the locked block in start() (the code that updates sessionId, activeStartCall, state, etc.), check if state is already one of ListeningState.STARTING, STARTED, or STOPPING and reject the new start call instead of proceeding; do this while holding lock to avoid races. Concretely, in SpeechRecognitionPlugin.start() (the block that uses lock.lock()/lock.unlock()), add an early guarded branch that if state==STARTING||state==STARTED||state==STOPPING returns/throws a clear error to the caller (or completes the incoming start promise/error via activeStartCall) and does not modify sessionId or activeStartCall; only when state is idle should you increment sessionId, assign currentSessionId, set activeStartCall, and set state=ListeningState.STARTING. Ensure the rejection uses the same error-path mechanism the plugin uses for other errors so the original session’s promise remains intact.
🧹 Nitpick comments (1)
ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift (1)
415-431: Keep the fallback session surface in sync with the real one.
The 6.2 branch exposes isRunning, but this stub doesn’t. That kind of drift makes it easy to add a caller that works on newer toolchains and then breaks only on the fallback branch. I’d either add the missing members here now or extract a tiny shared protocol/base surface.
♻️ Minimal parity fix

     @MainActor
     final class SpeechAnalyzerRecognitionSession: NSObject {
         typealias ResultHandler = @MainActor ([String], Bool) -> Void
         typealias VoidHandler = @MainActor () -> Void
         typealias ErrorHandler = @MainActor (Error) -> Void

    +    var isRunning: Bool { false }
    +
         var onListeningStarted: VoidHandler?
         var onListeningStopped: VoidHandler?
         var onResult: ResultHandler?
         var onError: ErrorHandler?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift` around lines 415 - 431, Add a public isRunning Bool property to the SpeechAnalyzerRecognitionSession stub so its surface matches the real session: declare var isRunning: Bool = false in the class and ensure start() sets isRunning = true (or true on successful start) and stop() sets isRunning = false; reference SpeechAnalyzerRecognitionSession, start(), and stop() so callers using isRunning on newer toolchains won't break the fallback implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 706-757: finishSession currently calls startCallToReject.reject()
while still holding lock, which risks deadlock; change the flow so you only
capture the PluginCall into a local (startCallToReject) while locked, perform
all state mutation and call lock.unlock() inside the handler, then after
unlocking (outside the locked section) call startCallToReject.reject(...). Refer
to finishSession, startCallToReject, lock.lock()/lock.unlock(), and the
reject(...) call and ensure any notifyListeners/emitListeningState calls that
must run under the lock remain inside while the reject(...) invocation happens
after the lock is released.
- Around line 1086-1104: The JSONArray comparison using
previousPartialResults.equals(nextPartialResults) is incorrect because JSONArray
doesn't override equals; update the change-detection to compare content (e.g.,
previousPartialResults.toString().equals(nextPartialResults.toString()) or use
previousPartialResults.similar(nextPartialResults)) so duplicate partial-result
events are suppressed; modify the block that builds nextPartialResults (variable
nextPartialResults) and the conditional that sets previousPartialResults and
payload in SpeechRecognitionPlugin (the code surrounding previousPartialResults,
nextPartialResults, payload, and the notifyListeners(PARTIAL_RESULTS_EVENT,
payload) call) to use a content-aware comparison and only update
previousPartialResults when the content differs.
---
Duplicate comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 186-207: Before mutating session state inside the locked block in
start() (the code that updates sessionId, activeStartCall, state, etc.), check
if state is already one of ListeningState.STARTING, STARTED, or STOPPING and
reject the new start call instead of proceeding; do this while holding lock to
avoid races. Concretely, in SpeechRecognitionPlugin.start() (the block that uses
lock.lock()/lock.unlock()), add an early guarded branch that if
state==STARTING||state==STARTED||state==STOPPING returns/throws a clear error to
the caller (or completes the incoming start promise/error via activeStartCall)
and does not modify sessionId or activeStartCall; only when state is idle should
you increment sessionId, assign currentSessionId, set activeStartCall, and set
state=ListeningState.STARTING. Ensure the rejection uses the same error-path
mechanism the plugin uses for other errors so the original session’s promise
remains intact.
---
Nitpick comments:
In `@ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift`:
- Around line 415-431: Add a public isRunning Bool property to the
SpeechAnalyzerRecognitionSession stub so its surface matches the real session:
declare var isRunning: Bool = false in the class and ensure start() sets
isRunning = true (or true on successful start) and stop() sets isRunning =
false; reference SpeechAnalyzerRecognitionSession, start(), and stop() so
callers using isRunning on newer toolchains won't break the fallback
implementation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 41d1d7d9-80d7-4e8c-af5c-bf2938f5b964
📒 Files selected for processing (3)
- android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
- ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift
- src/definitions.ts
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (2)
1108-1119: ⚠️ Potential issue | 🟠 Major
Apply isStale() to segmented-session callbacks too.
onSegmentResults() and onEndOfSegmentedSession() are the only listener callbacks still forwarding events without the session/generation guard. Late callbacks from a destroyed recognizer can leak segment events into the next session.
💡 Suggested fix

     @Override
     public void onSegmentResults(Bundle results) {
    +    if (isStale()) {
    +        return;
    +    }
         ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
         if (matches == null) {
             return;
         }
         notifyListeners(SEGMENT_RESULTS_EVENT, new JSObject().put("matches", new JSArray(matches)));
     }

     @Override
     public void onEndOfSegmentedSession() {
    +    if (isStale()) {
    +        return;
    +    }
         notifyListeners(END_OF_SEGMENT_EVENT, new JSObject());
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java` around lines 1108 - 1119, onSegmentResults and onEndOfSegmentedSession currently forward segment events unguarded, which can let late callbacks from a destroyed recognizer leak into a new session; update both methods to check the session/generation guard by calling isStale() and returning early if true before calling notifyListeners for SEGMENT_RESULTS_EVENT and END_OF_SEGMENT_EVENT so only live sessions emit events.
576-620: ⚠️ Potential issue | 🔴 Critical
Guard support/model-download callbacks against stale sessions.
These async callbacks can arrive after stop(), forceStop(), or a recognizer rebuild, but they go straight into startInlineListening(...), call.reject(...), and emitErrorEvent(...) without re-checking staleness. Because sessionId is only advanced on the next start(), an old callback can still restart listening or surface an error after the user already stopped.
Also applies to: 623-664
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java` around lines 576 - 620, The async RecognitionSupportCallback handlers must guard against stale sessions: capture the currentSessionId into a final/local variable before calling speechRecognizer.checkRecognitionSupport(...) (or otherwise call a helper like isSessionActive(sessionId)), and in both onSupportResult(...) and onError(...) verify that the captured sessionId still equals the live currentSessionId (or that isSessionActive returns true) before calling startInlineListening, triggerOnDeviceModelDownload, call.reject, emitErrorEvent, or finishSession; if stale, simply return without side effects. Apply the same guard pattern to the analogous callbacks around the other block referenced (the second callback range) so no old async result can restart listening or surface errors for a stopped/rebuilt session.
♻️ Duplicate comments (2)
android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (2)
186-205: ⚠️ Potential issue | 🔴 Critical
Reject overlapping start() calls before rewriting session state.
This block still has no IDLE check. A second start() overwrites sessionId, state, and the cached request options while the first startup is still in flight, so the earlier beginListening(...) path goes stale and its promise can be orphaned.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java` around lines 186 - 205, Before mutating session state in the start() path, guard against overlapping calls by checking the current listening state while holding the same lock: inside the lock.lock() section (where you call cancelPendingForceStopLocked(), set forceStopped/pendingStopReason/resetPartialResultsCache(), and before incrementing sessionId or setting state = ListeningState.STARTING), verify that state == ListeningState.IDLE (or otherwise not already STARTING/ACTIVE) and immediately reject/return the start request (e.g., fail the activeStartCall or throw) if it is not IDLE; this prevents a second start() from overwriting sessionId, state, cached options (lastLanguage/lastMaxResults/lastPrompt/etc.), and orphaning the in-flight beginListening(...) promise. Ensure the check is performed while holding the lock and before any assignment to sessionId, state, or activeStartCall.
241-279: ⚠️ Potential issue | 🟠 Major
Don't emit terminal session events for popup mode until the activity returns.
Sessions launched with `startActivityForResult(...)` are still driven to `readyForNextSession`/`stopped` by the stop timers here. If the popup returns later, `listeningResult(...)` resolves or rejects an already-finished session and can emit a second terminal cycle.
Also applies to: 282-337
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java` around lines 241 - 279, The stop method is emitting terminal session events and scheduling finish timers even when a popup activity was launched (startActivityForResult), causing duplicate terminal cycles when the popup returns; modify stop(PluginCall) and related logic (use the existing sessionId, pendingStopReason, scheduleFinishFallbackLocked, emitListeningState, listening(false), cancelPendingForceStopLocked) to detect an outstanding activity-for-result/popup-return pending state (e.g., an isAwaitingActivityResult or isPopupMode flag) and if that flag is set, do not transition state to STOPPING, do not call emitListeningState("stoppingListening", ...), and do not schedule the finish fallback; instead set pendingStopReason and defer calling listening(false) and scheduleFinishFallbackLocked until the activity result handler clears the awaiting flag (where currently listeningResult(...) runs), at which point perform the terminal transitions and resolve/reject the session once. Ensure cancelPendingForceStopLocked still runs as appropriate but avoid emitting terminal events while awaiting the activity result.
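One possible shape for the deferral described above, as a sketch only. The `awaitingActivityResult` flag, `PopupStopSketch`, `startPopupSession`, `onActivityResult`, and `finishOnce` are hypothetical, and the order of the two terminal events is an assumption. The event list stands in for the plugin's listener notifications.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of popup-mode stop handling, not the real plugin class. */
final class PopupStopSketch {
    private boolean awaitingActivityResult = false; // hypothetical flag
    private String pendingStopReason = null;
    private boolean finished = false;
    final List<String> events = new ArrayList<>();

    /** Stand-in for launching the recognizer popup via startActivityForResult(...). */
    synchronized void startPopupSession() {
        awaitingActivityResult = true;
        pendingStopReason = null;
        finished = false;
    }

    synchronized void stop(String reason) {
        pendingStopReason = reason;
        if (awaitingActivityResult) {
            return; // Defer: no terminal events and no fallback timer while the popup is open.
        }
        finishOnce();
    }

    /** Stand-in for the activity result handler, where listeningResult(...) currently runs. */
    synchronized void onActivityResult() {
        awaitingActivityResult = false;
        finishOnce();
    }

    synchronized String pendingStopReason() {
        return pendingStopReason;
    }

    // Emits the terminal cycle at most once per session. Caller holds the monitor.
    private void finishOnce() {
        if (finished) {
            return;
        }
        finished = true;
        events.add("stopped");
        events.add("readyForNextSession");
    }
}
```

With this shape, a `stop()` issued while the popup is open only records the reason, and the single terminal cycle is emitted when the activity result arrives.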
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift`:
- Around line 569-576: The helper rejectPendingStartCallIfNeeded currently skips
rejecting when currentOptions?.partialResults == true, leaving the original JS
start() promise unresolved; change its logic so it always clears activeCall and
rejects the pending startCall (call startCall.reject(message) and set activeCall
= nil) regardless of partialResults, and ensure callers like stop(), forceStop()
(and any code paths that win during permission prompt/startup such as
beginRecognition) invoke rejectPendingStartCallIfNeeded(message:) so stale
sessions are settled.
In `@src/definitions.ts`:
- Around line 78-95: The interface SpeechRecognitionPartialResultEvent declares
matches as required but native forceStop() can emit payloads without matches;
update the type to reflect runtime by making matches optional (e.g., change
matches: string[] to matches?: string[]) and adjust any related JSDoc/comments
to note that matches may be undefined for forced/accumulated-only events so
callers should guard before using event.matches; ensure this change is made in
src/definitions.ts where the interface is declared.
---
Outside diff comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 1108-1119: onSegmentResults and onEndOfSegmentedSession currently
forward segment events unguarded, which can let late callbacks from a destroyed
recognizer leak into a new session; update both methods to check the
session/generation guard by calling isStale() and returning early if true before
calling notifyListeners for SEGMENT_RESULTS_EVENT and END_OF_SEGMENT_EVENT so
only live sessions emit events.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b4ca23cb-dc50-437f-987a-a39b31019576
📒 Files selected for processing (4)
- README.md
- android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
- ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift
- src/definitions.ts
🧹 Nitpick comments (1)
ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift (1)
5-5: Please verify this gate against the Xcode/SDK matrix, not just the compiler version.
Line 5 uses `#if compiler(>=6.2)`, which only keys off the Swift compiler version. `SpeechAnalyzer`/`SpeechTranscriber` are SDK-provided Speech framework symbols, so this is still only a proxy for “has the iOS 26 Speech APIs”; if you care about older/newer Xcode+SDK mixes, the modern branch can still be selected when the symbols you need are not. Since Lines 390-437 exist specifically for that compatibility story, I’d add an older-Xcode build to CI, or gate this at the build level, so the fallback path is actually exercised. (developer.apple.com)
Also applies to: 390-437
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 59cb8c39-1854-4017-85f7-5129cd231e33
📒 Files selected for processing (3)
- android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
- example-app/src/main.js
- ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift
🚧 Files skipped from review as they are similar to previous changes (2)
- example-app/src/main.js
- android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
Summary
- `forceStop()`, `getLastPartialResult()`, `setPTTState()`, `error`, `readyForNextSession`, and richer `listeningState` events are available there too
Notes
- `continuousPTT` auto-restart remains Android-only
Verification
- `bun run build`
- `bun run verify:ios`
- `bun run verify:android`
Summary by CodeRabbit
New Features
Documentation
Example App