feat: merge PRs #2, #4, and #6 with completed iOS support by riderx · Pull Request #7 · Cap-go/capacitor-speech-recognition

riderx · 2026-03-24T12:11:45Z

Summary

- Merge the local work from PR #2 (refactor: improve speech recognizer management and error handling), PR #4 (feat(android): add PTT mode with forceStop and continuousPTT), and PR #6 (feat(android): implement deterministic state machine for speech recog…) into one branch.
- Keep the newer opt-in on-device recognition path while also bringing in the PTT APIs and the deterministic Android session lifecycle.
- Implement the missing iOS side of the new public API so that forceStop(), getLastPartialResult(), setPTTState(), the error and readyForNextSession events, and the richer listeningState events are available there too.
- Regenerate the README/API docs and keep the implementation summary from PR #6.

Notes

- continuousPTT auto-restart remains Android-only.
- iOS now supports the same PTT-oriented API surface for hold/release flows, but not automatic restarts after silence.

Verification

bun run build
bun run verify:ios
bun run verify:android

Summary by CodeRabbit

New Features
- Push-to-talk (PTT) hold-to-record with accumulated transcript display and experimental continuousPTT to keep sessions alive across silence
- New public methods: forceStop(), getLastPartialResult(), setPTTState()
- New events: error and readyForNextSession; richer listening and partial result payloads (sessionId, accumulated, forced, etc.)
Documentation
- Updated docs for PTT flow, continuousPTT option, and streaming partialResults behavior until session end
Example App
- Added PTT UI, controls, and styles for demo usage

…izer reset

The Android SpeechRecognizer.stopListening() method doesn't reliably stop audio input in all scenarios. This is problematic for Push-to-Talk (PTT) implementations where the user expects recording to stop immediately when releasing the button.

This commit adds:

1. forceStop() method with destroy/recreate pattern:
   - First tries graceful stopListening()
   - After configurable timeout (default 1500ms), forces stop by destroying and recreating the SpeechRecognizer
   - Returns the last cached partial result so no speech is lost
   - Emits 'stopped' state after force stop completes
2. getLastPartialResult() method:
   - Returns the most recent partial transcription
   - Useful for checking state or retrieving results after force stop
3. Callback guards:
   - onError, onResults, onPartialResults now check forceStopped flag
   - Prevents late callbacks from interfering after force stop
   - onResults cancels pending force-stop timeout when results arrive

Usage example:

```javascript
// Start with partial results for real-time updates
await SpeechRecognition.start({ partialResults: true });

// On button release, force stop with 1s timeout
await SpeechRecognition.forceStop({ timeout: 1000 });

// Get what was heard (in case normal results didn't fire)
const result = await SpeechRecognition.getLastPartialResult();
```

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
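The graceful-then-forced stop described in this commit can be sketched in plain JavaScript. This is an illustration of the pattern only, not the plugin's Android code: `stopWithFallback`, `graceful`, `force`, and the injectable `setTimer`/`clearTimer` parameters are hypothetical names introduced here.

```javascript
// Sketch only. `graceful` and `force` are hypothetical callbacks standing in for
// stopListening() and the destroy/recreate step; timers are injectable for testing.
function stopWithFallback({
  graceful,
  force,
  timeoutMs = 1500,
  setTimer = (fn, ms) => setTimeout(fn, ms),
  clearTimer = (id) => clearTimeout(id),
}) {
  let settled = false;
  let timer = null;

  graceful(); // first ask the recognizer to stop normally

  // If no result has arrived by the deadline, force the stop.
  timer = setTimer(() => {
    if (settled) return;
    settled = true;
    force();
  }, timeoutMs);

  return {
    // Call this from the results callback. Returns false for a late callback
    // that arrives after the forced stop, so the caller can ignore it.
    onResults() {
      if (settled) return false;
      settled = true;
      clearTimer(timer);
      return true;
    },
  };
}
```

The single `settled` flag plays the role of the forceStopped guard: whichever path finishes first wins, and the other becomes a no-op.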

…tal)

EXPERIMENTAL: This feature allows recognition to continue through silence periods while the PTT button is held, useful for natural speech with pauses.

This commit adds:

1. continuousPTT option in start():
   - When enabled, recognition auto-restarts on silence/timeout errors
   - Results are accumulated across restarts
   - Requires setPTTState() to track button state
2. setPTTState() method:
   - Call with held=true on button press
   - Call with held=false on button release
   - Resets accumulated results when button is pressed
3. Auto-restart logic:
   - onError: Restarts on ERROR_NO_MATCH or ERROR_SPEECH_TIMEOUT while button is held
   - onResults: Accumulates result and restarts while button is held
   - Emits 'isRestarting' flag in partial results when restarting
4. Accumulated results:
   - Results from multiple recognition sessions are concatenated
   - Final result includes 'accumulatedText' with full transcription

Usage example:

```javascript
// Start with continuous PTT enabled
await SpeechRecognition.start({ partialResults: true, continuousPTT: true });

// On button press
await SpeechRecognition.setPTTState({ held: true });

// ... user speaks with pauses, recognition continues ...

// On button release
await SpeechRecognition.setPTTState({ held: false });
await SpeechRecognition.forceStop();
// Final result includes all accumulated speech
```

WARNING: This is experimental and may cause issues in some scenarios. Use forceStop() without continuousPTT for reliable basic PTT.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
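To make the accumulate-and-restart behaviour concrete, here is a small JavaScript model of it. `PTTAccumulator` and its methods are hypothetical; only the `accumulatedText` and `isRestarting` key names come from the commit message, and joining segments with a space is an assumption.

```javascript
// Hypothetical model of continuous PTT accumulation (not the plugin's code).
class PTTAccumulator {
  constructor() {
    this.held = false;
    this.segments = [];
  }

  // Mirrors setPTTState({ held }): pressing the button resets accumulated results.
  setHeld(held) {
    if (held) this.segments = [];
    this.held = held;
  }

  // Called with the top match of one finished recognition session.
  // Returns the text gathered so far and whether recognition should restart.
  onSessionResult(text) {
    if (text) this.segments.push(text);
    return {
      accumulatedText: this.segments.join(" "), // assumption: space-separated
      isRestarting: this.held, // restart only while the button is still held
    };
  }
}
```

In this model a silence-induced session end while the button is held yields `isRestarting: true`, and the first result after release yields the full transcript with `isRestarting: false`.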

Text nodes don't have classList property, causing TypeError when firstChild is whitespace. Add optional chaining to prevent the error. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
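A minimal illustration of the guard, assuming the check looks at a class on `firstChild`. The helper name `firstChildHasClass` is hypothetical, since the commit message does not show the original call site.

```javascript
// Hypothetical helper showing the optional-chaining guard.
function firstChildHasClass(parent, className) {
  // A whitespace Text node has no classList, so `?.` yields undefined
  // instead of throwing a TypeError; `??` turns that into false.
  return parent.firstChild?.classList?.contains(className) ?? false;
}
```

The same expression also covers an element with no children, where `firstChild` is `null`.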

Add interactive PTT demonstration to the example app:

- Toggle switch to enable PTT mode (hides regular Start/Stop buttons)
- Checkbox to enable continuousPTT (continue on silence, experimental)
- Press-and-hold microphone button with visual feedback
- Display accumulated results from continuous PTT sessions
- Add web.ts stubs for forceStop, getLastPartialResult, setPTTState
- Fix Kotlin version conflict in example-app Android build

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When continuousPTT mode auto-restarts after silence, beginListening is called with null PluginCall. Add null checks to prevent NPE crash. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When user releases PTT (scheduling the force-stop timeout) and quickly starts a new session, the old timeout could still fire and destroy the fresh recognizer. This fix cancels any pending forceStopRunnable before starting a new recognition session. Addresses CodeRabbit review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
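The race and its fix can be modelled in a few lines of JavaScript. `ForceStopScheduler` and its method names are hypothetical stand-ins for the Android handler and `forceStopRunnable`; the timer functions are injectable so the behaviour can be checked without waiting.

```javascript
// Hypothetical model of "cancel the pending force-stop before starting again".
class ForceStopScheduler {
  constructor(
    setTimer = (fn, ms) => setTimeout(fn, ms),
    clearTimer = (id) => clearTimeout(id),
  ) {
    this.setTimer = setTimer;
    this.clearTimer = clearTimer;
    this.pending = null; // id of the scheduled force-stop, if any
  }

  scheduleForceStop(onForceStop, timeoutMs) {
    this.cancelPending();
    const { setTimer } = this;
    this.pending = setTimer(() => {
      this.pending = null;
      onForceStop();
    }, timeoutMs);
  }

  cancelPending() {
    if (this.pending === null) return;
    const { clearTimer } = this;
    clearTimer(this.pending);
    this.pending = null;
  }

  // Starting a session first cancels any stale timeout so it cannot
  // tear down the recognizer that `begin` is about to create.
  startSession(begin) {
    this.cancelPending();
    begin();
  }
}
```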

Add kotlin-gradle-plugin:1.9.24 to buildscript dependencies to match the forced stdlib versions in resolutionStrategy. This ensures KGP version consistency and avoids AGP/KGP version mismatches. Addresses CodeRabbit review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Update JSDoc to accurately describe that forceStop() returns Promise<void> and emits partial results via event. Users should call getLastPartialResult() to retrieve transcription after force stop. Addresses CodeRabbit review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When partialResults=false, start() returns a Promise that awaits resolution from onResults()/onError(). If forceStop() timeout triggers before normal completion, forceStopped=true caused both callbacks to return early without resolving the pending PluginCall. Now tracks activeStartCall and rejects it with "forceStop" error in the timeout handler, preventing callers from hanging indefinitely. Addresses CodeRabbit review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
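A JavaScript model of the fix: keep a handle on the pending start call and settle it exactly once, whichever of results, error, or forced stop comes first. `StartCallTracker` and its methods are hypothetical; only the "forceStop" error string comes from the commit message.

```javascript
// Hypothetical model of tracking the active start() call so it never hangs.
class StartCallTracker {
  constructor() {
    this.active = null; // { resolve, reject } of the pending start(), if any
  }

  // Returns a promise that settles from results, an error, or a forced stop.
  begin() {
    return new Promise((resolve, reject) => {
      this.active = { resolve, reject };
    });
  }

  // Takes ownership of the pending call so it can only be settled once.
  _take() {
    const call = this.active;
    this.active = null;
    return call;
  }

  resolveWith(matches) {
    this._take()?.resolve({ matches });
  }

  rejectForced() {
    this._take()?.reject(new Error("forceStop"));
  }
}
```

Because `_take()` clears the handle, a late results callback after the forced stop finds nothing to resolve and is silently dropped.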

The TypeScript interface only defined `matches`, but the Android implementation emits additional fields in PTT scenarios:

- accumulated: transcription across continuous PTT restarts
- accumulatedText: final accumulated text including current result
- isRestarting: true when restarting in continuous PTT mode
- forced: true when result emitted due to force stop timeout

Addresses CodeRabbit review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Mark forceStop(), getLastPartialResult(), and setPTTState() as Android-only APIs since there are no corresponding iOS implementations. This helps TypeScript users and documentation generators clearly indicate these methods are unavailable on iOS. Addresses CodeRabbit review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…nition

- Introduced a finite state machine to manage listening sessions, improving predictability and observability.
- Emitted events for state transitions: `startingListening`, `started`, `stoppingListening`, and `stopped`.
- Enhanced error handling by emitting an `error` event for all recognition errors.
- Added `readyForNextSession` event to signal when the recognizer is ready for a new session.
- Maintained backward compatibility while improving session management and UI responsiveness.
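A compact JavaScript sketch of such a state machine follows. The state names match those used later in this review (IDLE, STARTING, STARTED, STOPPING) and the event names come from the commit message; the transition table itself, including the STARTING to STOPPING edge for a failed start, is an assumption rather than the plugin's actual table.

```javascript
// Assumed transition table; the real Android implementation may differ.
const ALLOWED = {
  IDLE: ["STARTING"],
  STARTING: ["STARTED", "STOPPING"], // assumption: a failed start goes to STOPPING
  STARTED: ["STOPPING"],
  STOPPING: ["IDLE"],
};

// Event emitted when a state is entered.
const EVENT_ON_ENTER = {
  STARTING: "startingListening",
  STARTED: "started",
  STOPPING: "stoppingListening",
  IDLE: "stopped",
};

class ListeningMachine {
  constructor(emit) {
    this.state = "IDLE";
    this.emit = emit;
  }

  // Applies a transition if it is legal; illegal requests are ignored.
  transition(next) {
    if (!ALLOWED[this.state].includes(next)) return false;
    this.state = next;
    this.emit(EVENT_ON_ENTER[next]);
    return true;
  }
}
```

Rejecting illegal transitions in one place is what makes the emitted event order predictable for UI code.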

# Conflicts: # android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java

# Conflicts: # README.md # android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java

# Conflicts: # android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java # src/definitions.ts

coderabbitai · 2026-03-24T12:12:04Z

📝 Walkthrough

Added push-to-talk session management: three new public plugin APIs (setPTTState, forceStop, getLastPartialResult), session IDs and session-aware lifecycle across Android and iOS, continuous-PTT restart behavior, expanded event payloads (accumulated transcript, session metadata, error/ready events), and example app PTT UI.

Changes

| Cohort / File(s) | Summary |
|---|---|
| TypeScript API + Web: `src/definitions.ts`, `src/web.ts` | Added `forceStop`, `getLastPartialResult`, `setPTTState` to plugin API and types (`ForceStopOptions`, `LastPartialResult`, `PTTStateOptions`); added `continuousPTT` option; expanded partial result and listening event payloads; added `error` and `readyForNextSession` listener overloads. Web implementation provides stubs/safe default for new methods. |
| Android native: `android/src/main/java/app/capgo/speechrecognition/...Constants.java`, `.../SpeechRecognitionPlugin.java` | Incremented plugin version; added event name constants; rewrote plugin to session-aware lifecycle with monotonic `sessionId`, `recognizerGeneration`, `ListeningState`, locks/handlers; implemented `forceStop`, `getLastPartialResult`, `setPTTState`; continuous-PTT restart logic; structured `ERROR_EVENT`/`READY_FOR_NEXT_SESSION_EVENT` emissions; consolidated session finish/cleanup. |
| iOS native: `ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift`, `ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift` | Introduced session IDs, per-session state, `ListeningReason`; reworked `start`/`stop` to be session-aware; added `forceStop`, `getLastPartialResult`, `setPTTState`; gated callbacks by sessionId; centralized session finalization and continuous-PTT restart scheduling; added compiler-guarded stub branch for older compilers / iOS <26. |
| Example app (UI & logic): `example-app/index.html`, `example-app/src/main.js`, `example-app/src/style.css` | Added PTT-mode UI (toggle, hold-to-record button, accumulated-results display, hidden PTT status); implemented PTT press/release flow, PTT state calls, start/forceStop orchestration, accumulated-results handling, and ready-for-next-session coordination; added styles for PTT controls. |
| Build / Config: `package.json`, `example-app/android/build.gradle` | Added `prepare` npm script to run build; added Kotlin Gradle plugin classpath and enforced Kotlin stdlib version 1.9.24 in example Android buildscript. |
| Docs: `README.md` | Added "Push-to-talk and session events" section documenting new APIs (`setPTTState`, `forceStop`, `getLastPartialResult`), `continuousPTT` behavior, expanded event payloads and new event types. |

Sequence Diagram(s)

sequenceDiagram
    participant UI as Example App UI
    participant JS as Plugin JS
    participant Native as Native Plugin (Android/iOS)
    participant Recognizer as Native Recognizer

    rect rgba(200,230,255,0.5)
    UI->>JS: handlePTTPress()
    JS->>Native: setPTTState(held=true)
    JS->>Native: start(options + continuousPTT?)
    end

    rect rgba(200,255,200,0.5)
    Native->>Recognizer: create/start session(sessionId)
    Recognizer-->>Native: partial results (matches)
    Native-->>JS: partialResults (matches, accumulated?, sessionId)
    JS-->>UI: update accumulated display
    end

    rect rgba(255,230,200,0.5)
    UI->>JS: handlePTTRelease()
    JS->>Native: setPTTState(held=false)
    JS->>Native: forceStop(timeout)
    Native->>Recognizer: stop/teardown
    Native-->>JS: optional forced partial (forced=true)
    Native-->>JS: readyForNextSession(sessionId)
    JS-->>UI: fetch getLastPartialResult() / finalize UI
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title clearly and accurately describes the main change: merging multiple PRs with completed iOS support for a speech recognition plugin. |

coderabbitai

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (2)

1110-1122: ⚠️ Potential issue | 🟠 Major

Apply the stale-listener guard to segmented callbacks too.

Late onSegmentResults() / onEndOfSegmentedSession() callbacks are still emitted after a session or recognizer-generation switch because only the other listener methods call isStale(). That leaks segment events from an old recognizer into the next session.

Suggested fix

         @Override
         public void onSegmentResults(Bundle results) {
+            if (isStale()) {
+                return;
+            }
             ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
             if (matches == null) {
                 return;
             }
             notifyListeners(SEGMENT_RESULTS_EVENT, new JSObject().put("matches", new JSArray(matches)));
         }

         @Override
         public void onEndOfSegmentedSession() {
+            if (isStale()) {
+                return;
+            }
             notifyListeners(END_OF_SEGMENT_EVENT, new JSObject());
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 1110 - 1122, The segmented-callback handlers onSegmentResults and
onEndOfSegmentedSession need the same stale-listener guard used elsewhere: at
the start of onSegmentResults(Bundle results) call isStale() and return early if
true before processing matches/notifyListeners(SEGMENT_RESULTS_EVENT), and
likewise in onEndOfSegmentedSession() call isStale() and return early before
notifyListeners(END_OF_SEGMENT_EVENT); this prevents old recognizer/session
callbacks from leaking into a new session.
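
The guard can be illustrated with a generation counter in JavaScript. This is a sketch of the idea behind `isStale()`, not the Java implementation: `RecognizerHost` and `createListener` are hypothetical, and the event names `segmentResults` and `endOfSegmentedSession` are assumed values for the `SEGMENT_RESULTS_EVENT` and `END_OF_SEGMENT_EVENT` constants.

```javascript
// Hypothetical model of a recognizer-generation guard.
class RecognizerHost {
  constructor(emit) {
    this.generation = 0;
    this.emit = emit;
  }

  // Each listener remembers the generation that was current when it was created.
  createListener() {
    const generation = ++this.generation;
    const isStale = () => generation !== this.generation;
    return {
      onSegmentResults: (matches) => {
        if (isStale()) return; // callback from an old recognizer: drop it
        this.emit("segmentResults", { matches });
      },
      onEndOfSegmentedSession: () => {
        if (isStale()) return;
        this.emit("endOfSegmentedSession", {});
      },
    };
  }
}
```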

576-603: ⚠️ Potential issue | 🔴 Critical

Session-check the on-device support/model-download callbacks.

These callbacks call startInlineListening() / call.reject() without verifying that currentSessionId is still live. If the user stops or restarts while support checking or model download is in flight, a stale callback can revive a dead session or reject a newer flow.

Suggested guard pattern

                 @Override
                 public void onSupportResult(RecognitionSupport support) {
+                    try {
+                        lock.lock();
+                        if (currentSessionId != sessionId || pendingStopReason != null || state == ListeningState.IDLE) {
+                            return;
+                        }
+                    } finally {
+                        lock.unlock();
+                    }
                     boolean installed = isLanguageSupported(language, support.getInstalledOnDeviceLanguages());
                     boolean supported =
                         installed ||

Apply the same guard before acting in onError(...), onSuccess(), onScheduled(), and the model-download onError(...) path.

Also applies to: 605-614, 619-656

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 576 - 603, Guard all on-device support and model-download callbacks
by verifying the session is still active for the same currentSessionId before
taking actions; specifically, at the start of
RecognitionSupportCallback.onSupportResult and in the model-download callbacks
(onError, onSuccess, onScheduled) check that the sessionId matches an active
session (the same session tracking used by finishSession/currentSessionId) and
return early if it is stale, so that calls to startInlineListening(intent,...),
triggerOnDeviceModelDownload(...), call.reject(...), emitErrorEvent(...), and
finishSession(...) only run for live sessions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 241-279: stop()/forceStop() prematurely mark popup-mode sessions
finished even when an external activity started by startActivityForResult(...)
is still outstanding; update stop() and forceStop() to detect popup/modal
sessions started via startActivityForResult (check the same flag/field used when
launching the activity), cancel any scheduled finish fallbacks (call
cancelPendingForceStopLocked()), and instead defer emitting terminal events and
calling scheduleFinishFallbackLocked until the activity result returns;
additionally, when stopping you should attempt to finish/cancel the launched
activity (call Activity.finish() or use the saved PendingIntent/result handler)
so the popup doesn't outlive the session, and ensure listeningResult(...) only
resolves/rejects after the activity result is processed and
emitListeningState(..., "finished", ...) is only emitted after that.
- Around line 186-205: Before mutating session state in start(), check the
current state under the same lock and reject overlapping starts: if state is
ListeningState.STARTING, STARTED, or STOPPING, immediately fail/return the new
start call (e.g., complete its promise with an error) instead of proceeding;
only when state is not one of those values continue to increment sessionId, set
currentSessionId, assign activeStartCall, and set state =
ListeningState.STARTING. Ensure this check uses the same lock around the block
that touches sessionId, activeStartCall, and state, so that the promise of an
earlier beginListening() call cannot be orphaned by an overwritten session.

In `@example-app/src/main.js`:
- Around line 96-104: In togglePTTMode, only enable the continuousPTT checkbox
on Android: detect platform (use the existing platform helper or add an
isAndroid check) and change the logic so refs.continuousPTT.disabled becomes
!(enabled && isAndroid) and refs.continuousPTT.checked is forced false when not
(enabled && isAndroid); update the control visibility/behavior in togglePTTMode
to honor this Android-only requirement while leaving other UI toggles unchanged.
- Around line 233-250: The PTT control only has mouse/touch handlers and lacks
keyboard support, so update the refs.pttButton wiring to also handle key events
and blur: add a keydown listener that calls handlePTTPress() when Space or Enter
is pressed (preventDefault to avoid scrolling on Space), a keyup listener that
calls handlePTTRelease() for those keys, and a blur listener that calls
handlePTTRelease() as a safety release; ensure existing touch/mouse handlers
(handlePTTPress, handlePTTRelease) are reused and keep
refs.pttMode/togglePTTMode unchanged.
- Around line 184-223: handlePTTPress and handlePTTRelease can interleave and
race; serialize them by adding a transition lock/token (e.g., pttTransitionId or
an async mutex) that each press/release captures and validates before proceeding
so stale async work is ignored. On entry to handlePTTPress and handlePTTRelease
acquire the token (or increment a session id), store it locally, and before any
async continuation (after await SpeechRecognition.setPTTState, await
SpeechRecognition.start, await SpeechRecognition.forceStop, await
SpeechRecognition.getLastPartialResult, or any other await) check the token
still matches; if not, abort that stale branch. Also ensure handlePTTPress does
not start a new session while a previous transition is in progress and only
allow a new press once forceStop or a readyForNextSession signal completes. Use
the existing symbols: handlePTTPress, handlePTTRelease,
SpeechRecognition.setPTTState, SpeechRecognition.start,
SpeechRecognition.forceStop, and SpeechRecognition.getLastPartialResult to
locate and instrument the logic.
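
One way to read the transition-token suggestion for `handlePTTPress`/`handlePTTRelease` is sketched below. `createPTTController` is a hypothetical wrapper, the plugin is passed in as a parameter so the sketch stays self-contained, and it covers only the stale-continuation check, not the "wait for readyForNextSession before the next press" part of the comment.

```javascript
// Hypothetical controller: every press/release takes a new token and
// abandons its remaining steps if a newer transition has started.
function createPTTController(plugin) {
  let transitionId = 0;

  async function press() {
    const id = ++transitionId;
    await plugin.setPTTState({ held: true });
    if (id !== transitionId) return "stale"; // a release overtook this press
    await plugin.start({ partialResults: true });
    return id === transitionId ? "started" : "stale";
  }

  async function release() {
    const id = ++transitionId;
    await plugin.setPTTState({ held: false });
    if (id !== transitionId) return "stale";
    await plugin.forceStop();
    if (id !== transitionId) return "stale";
    return plugin.getLastPartialResult();
  }

  return { press, release };
}
```

With this shape, a quick tap (press immediately followed by release) never reaches `start()`, because the press sees a newer token after its first `await`.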

In `@IMPLEMENTATION_SUMMARY.md`:
- Around line 464-489: The implementation summary still lists pending
README/CHANGELOG tasks and a sample CHANGELOG entry for 7.1.0 that conflict with
the merged docs and package.json (now 8.0.10); update IMPLEMENTATION_SUMMARY.md
to remove or mark as completed the README/CHANGELOG checklist items, either
update the sample changelog entry to the actual released version and date or
delete the placeholder 7.1.0 block, and add a short note that docs were
regenerated and package.json is at 8.0.10 so the summary reflects the current
release state; also verify references to README.md and CHANGELOG.md sections
mentioned (e.g., "error event", "listeningState") are present in the regenerated
docs and adjust the summary text accordingly.
- Around line 103-105: The fenced code blocks containing state diagrams (for
example the block showing "IDLE → (start()) → STARTING → STARTED → (stop() or
results or error) → STOPPING → IDLE" and the other similar blocks referenced)
are missing language identifiers and failing markdownlint; update each of those
fenced blocks to include an appropriate language tag (e.g., ```text or
```markdown or ```bash as appropriate) so the linter recognizes them—apply this
change to the shown block and the other occurrences at the ranges called out
(292-298, 301-308, 311-317, 320-327, 333-340).

In `@ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift`:
- Around line 488-504: The code emits readyForNextSession and stopped before an
async SpeechAnalyzerRecognitionSession.finish completes, allowing a new start()
to race with teardown; change the flow in stopCurrentSession so that if
`#available`(iOS 26.0, *) and modernRecognitionSession is a
SpeechAnalyzerRecognitionSession and sessionAlreadyStopped is false, you await
modernSession.stop() (on the MainActor) before clearing
modernRecognitionSession, calling clearLegacyRecognitionResources(), and before
calling notifyListeners("readyForNextSession", ...) and emitListeningState(...);
in other words, move the async stop-await path to block emitting/clearing until
after await modernSession.stop(), only then set modernRecognitionSession = nil
and proceed with notifyListeners and emitListeningState (if
sessionAlreadyStopped is true you can keep the existing immediate behavior).
- Around line 113-124: The permission-denied callback inside
AVAudioSession.sharedInstance().requestRecordPermission must first verify the
sessionId matches the current active session to avoid emitting errors for
finished sessions; inside the requestRecordPermission closure (before calling
DispatchQueue.main.async and before
emitErrorEvent/activeCall/finishSessionIfNeeded) add a guard that sessionId ==
self.activeSessionId (or otherwise confirm the session is still active) and
return early if it does not match so stop()/forceStop() won't cause stale
MICROPHONE_PERMISSION_DENIED emissions.

In `@src/definitions.ts`:
- Around line 323-333: The multi-argument overload for addListener with
eventName 'readyForNextSession' is not reflowed and fails prettier; reformat
that overload signature (the addListener declaration taking eventName:
'readyForNextSession' and listenerFunc: (event: SpeechRecognitionReadyEvent) =>
void) so its parameters are wrapped across lines like the other multi-argument
overloads (each parameter on its own line, preserve the trailing comma and
return type Promise<PluginListenerHandle>) to satisfy prettier.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 090a4222-73a6-4a76-a5fe-e4571831fa26

📥 Commits

Reviewing files that changed from the base of the PR and between 877e39a and 2c9a48a.

📒 Files selected for processing (12)

IMPLEMENTATION_SUMMARY.md
README.md
android/src/main/java/app/capgo/speechrecognition/Constants.java
android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
example-app/android/build.gradle
example-app/index.html
example-app/src/main.js
example-app/src/style.css
ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift
package.json
src/definitions.ts
src/web.ts

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (1)

186-207: ⚠️ Potential issue | 🟠 Major

Reject overlapping start() calls before rewriting session state.

The lock block does not check whether a session is already in progress (STARTING/STARTED/STOPPING). A second start() call can overwrite sessionId and activeStartCall while the first one is still active, leaving the earlier session's promise unsettled.

Suggested fix

         try {
             lock.lock();
+            if (state != ListeningState.IDLE) {
+                call.reject("Speech recognition is already running.");
+                return;
+            }
             cancelPendingForceStopLocked();
             forceStopped = false;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 186 - 207, Before mutating session state inside the locked block in
start() (the code that updates sessionId, activeStartCall, state, etc.), check
if state is already one of ListeningState.STARTING, STARTED, or STOPPING and
reject the new start call instead of proceeding; do this while holding lock to
avoid races. Concretely, in SpeechRecognitionPlugin.start() (the block that uses
lock.lock()/lock.unlock()), add an early guarded branch that if
state==STARTING||state==STARTED||state==STOPPING returns/throws a clear error to
the caller (or completes the incoming start promise/error via activeStartCall)
and does not modify sessionId or activeStartCall; only when state is idle should
you increment sessionId, assign currentSessionId, set activeStartCall, and set
state=ListeningState.STARTING. Ensure the rejection uses the same error-path
mechanism the plugin uses for other errors so the original session’s promise
remains intact.
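
The shape of that check, modelled in JavaScript: JavaScript runs this synchronously on one thread, so no lock appears here, whereas the Java version would perform the same check while holding `lock`. `SessionGate` is a hypothetical name; the field names and the rejection message follow the suggested fix above.

```javascript
// Hypothetical model of rejecting an overlapping start() before touching state.
class SessionGate {
  constructor() {
    this.state = "IDLE";
    this.sessionId = 0;
    this.activeStartCall = null;
  }

  start(call) {
    if (this.state !== "IDLE") {
      // Reject the newcomer; the running session's call and id stay untouched.
      call.reject("Speech recognition is already running.");
      return false;
    }
    this.sessionId += 1;
    this.activeStartCall = call;
    this.state = "STARTING";
    return true;
  }
}
```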

🧹 Nitpick comments (1)

ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift (1)
415-431: Keep the fallback session surface in sync with the real one.

The 6.2 branch exposes isRunning, but this stub doesn’t. That kind of drift makes it easy to add a caller that works on newer toolchains and then breaks only on the fallback branch. I’d either add the missing members here now or extract a tiny shared protocol/base surface.
♻️ Minimal parity fix
 @MainActor
 final class SpeechAnalyzerRecognitionSession: NSObject {
     typealias ResultHandler = @MainActor ([String], Bool) -> Void
     typealias VoidHandler = @MainActor () -> Void
     typealias ErrorHandler = @MainActor (Error) -> Void
 
+    var isRunning: Bool { false }
+
     var onListeningStarted: VoidHandler?
     var onListeningStopped: VoidHandler?
     var onResult: ResultHandler?
     var onError: ErrorHandler?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift`
around lines 415 - 431, Add a public isRunning Bool property to the
SpeechAnalyzerRecognitionSession stub so its surface matches the real session:
declare var isRunning: Bool = false in the class and ensure start() sets
isRunning = true (or true on successful start) and stop() sets isRunning =
false; reference SpeechAnalyzerRecognitionSession, start(), and stop() so
callers using isRunning on newer toolchains won't break the fallback
implementation.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 706-757: finishSession currently calls startCallToReject.reject()
while still holding lock, which risks deadlock; change the flow so you only
capture the PluginCall into a local (startCallToReject) while locked, perform
all state mutation and call lock.unlock() inside the handler, then after
unlocking (outside the locked section) call startCallToReject.reject(...). Refer
to finishSession, startCallToReject, lock.lock()/lock.unlock(), and the
reject(...) call and ensure any notifyListeners/emitListeningState calls that
must run under the lock remain inside while the reject(...) invocation happens
after the lock is released.
- Around line 1086-1104: The JSONArray comparison using
previousPartialResults.equals(nextPartialResults) is incorrect because JSONArray
doesn't override equals; update the change-detection to compare content (e.g.,
previousPartialResults.toString().equals(nextPartialResults.toString()) or use
previousPartialResults.similar(nextPartialResults)) so duplicate partial-result
events are suppressed; modify the block that builds nextPartialResults (variable
nextPartialResults) and the conditional that sets previousPartialResults and
payload in SpeechRecognitionPlugin (the code surrounding previousPartialResults,
nextPartialResults, payload, and the notifyListeners(PARTIAL_RESULTS_EVENT,
payload) call) to use a content-aware comparison and only update
previousPartialResults when the content differs.
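
The first finding above (reject after unlocking) is a capture-then-invoke pattern. The following is a minimal sketch of it, not the plugin's code: `SessionFinisher` is a hypothetical class, and a `Consumer<String>` stands in for Capacitor's `PluginCall` so the example stays self-contained.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Hypothetical stand-in for the plugin's session bookkeeping; not the real class.
final class SessionFinisher {
    private final ReentrantLock lock = new ReentrantLock();
    // Assumption: a callback that receives the rejection message replaces PluginCall.
    private Consumer<String> activeStartCall;
    private String state = "IDLE";

    void begin(Consumer<String> startCall) {
        lock.lock();
        try {
            activeStartCall = startCall;
            state = "STARTING";
        } finally {
            lock.unlock();
        }
    }

    void finishSession(String reason) {
        Consumer<String> startCallToReject;
        lock.lock();
        try {
            // Mutate state and capture the pending call while holding the lock.
            startCallToReject = activeStartCall;
            activeStartCall = null;
            state = "IDLE";
        } finally {
            lock.unlock();
        }
        // Invoke the callback only after the lock is released, so a handler that
        // blocks or waits on another thread cannot stall code contending for the lock.
        if (startCallToReject != null) {
            startCallToReject.accept(reason);
        }
    }

    boolean isLockHeldByCurrentThread() {
        return lock.isHeldByCurrentThread();
    }

    String state() {
        lock.lock();
        try {
            return state;
        } finally {
            lock.unlock();
        }
    }
}
```

Any event emission that must observe a consistent state would stay inside the locked section; only the call into external code moves out.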

---

Duplicate comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 186-207: Before mutating session state inside the locked block in
start() (the code that updates sessionId, activeStartCall, state, etc.), check
if state is already one of ListeningState.STARTING, STARTED, or STOPPING and
reject the new start call instead of proceeding; do this while holding lock to
avoid races. Concretely, in SpeechRecognitionPlugin.start() (the block that uses
lock.lock()/lock.unlock()), add an early guarded branch that if
state==STARTING||state==STARTED||state==STOPPING returns/throws a clear error to
the caller (or completes the incoming start promise/error via activeStartCall)
and does not modify sessionId or activeStartCall; only when state is idle should
you increment sessionId, assign currentSessionId, set activeStartCall, and set
state=ListeningState.STARTING. Ensure the rejection uses the same error-path
mechanism the plugin uses for other errors so the original session’s promise
remains intact.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41d1d7d9-80d7-4e8c-af5c-bf2938f5b964

📥 Commits

Reviewing files that changed from the base of the PR and between 2c9a48a and dd5a4fb.

📒 Files selected for processing (3)

android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift
src/definitions.ts

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (2)

1108-1119: ⚠️ Potential issue | 🟠 Major

Apply isStale() to segmented-session callbacks too.

onSegmentResults() and onEndOfSegmentedSession() are the only listener callbacks still forwarding events without the session/generation guard. Late callbacks from a destroyed recognizer can leak segment events into the next session.

💡 Suggested fix

         @Override
         public void onSegmentResults(Bundle results) {
+            if (isStale()) {
+                return;
+            }
             ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
             if (matches == null) {
                 return;
             }
             notifyListeners(SEGMENT_RESULTS_EVENT, new JSObject().put("matches", new JSArray(matches)));
         }

         @Override
         public void onEndOfSegmentedSession() {
+            if (isStale()) {
+                return;
+            }
             notifyListeners(END_OF_SEGMENT_EVENT, new JSObject());
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 1108 - 1119, onSegmentResults and onEndOfSegmentedSession currently
forward segment events unguarded, which can let late callbacks from a destroyed
recognizer leak into a new session; update both methods to check the
session/generation guard by calling isStale() and returning early if true before
calling notifyListeners for SEGMENT_RESULTS_EVENT and END_OF_SEGMENT_EVENT so
only live sessions emit events.

576-620: ⚠️ Potential issue | 🔴 Critical

Guard support/model-download callbacks against stale sessions.

These async callbacks can arrive after stop(), forceStop(), or a recognizer rebuild, but they go straight into startInlineListening(...), call.reject(...), and emitErrorEvent(...) without re-checking staleness. Because sessionId is only advanced on the next start(), an old callback can still restart listening or surface an error after the user already stopped.

Also applies to: 623-664

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 576 - 620, The async RecognitionSupportCallback handlers must guard
against stale sessions: capture the currentSessionId into a final/local variable
before calling speechRecognizer.checkRecognitionSupport(...) (or otherwise call
a helper like isSessionActive(sessionId)), and in both onSupportResult(...) and
onError(...) verify that the captured sessionId still equals the live
currentSessionId (or that isSessionActive returns true) before calling
startInlineListening, triggerOnDeviceModelDownload, call.reject, emitErrorEvent,
or finishSession; if stale, simply return without side effects. Apply the same
guard pattern to the analogous callbacks around the other block referenced (the
second callback range) so no old async result can restart listening or surface
errors for a stopped/rebuilt session.
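
The capture-and-compare guard described above could look roughly like the sketch below. All names here (`StaleCallbackGuard`, `guarded`, `stopSession`) are hypothetical and do not come from the plugin. The sketch also advances the id on stop, which is only one possible way to close the gap the review points out, where `sessionId` changes only on the next `start()`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the session counter and callback wrapper are illustrative only.
final class StaleCallbackGuard {
    private final AtomicLong currentSessionId = new AtomicLong();
    private final List<String> delivered = new CopyOnWriteArrayList<>();

    long startSession() {
        return currentSessionId.incrementAndGet();
    }

    // Assumption: bumping the id on stop invalidates callbacks captured by the old session.
    void stopSession() {
        currentSessionId.incrementAndGet();
    }

    /** Wraps an async result handler so it only acts for the session that created it. */
    Runnable guarded(String event) {
        final long capturedId = currentSessionId.get(); // captured before the async call
        return () -> {
            if (capturedId != currentSessionId.get()) {
                return; // stale: the session was stopped or replaced, so do nothing
            }
            delivered.add(event);
        };
    }

    List<String> delivered() {
        return delivered;
    }
}
```

In real code the comparison and the side effect would likely need to run under the plugin's existing lock; this sketch leaves a small window between the check and the action.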

♻️ Duplicate comments (2)

android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (2)

186-205: ⚠️ Potential issue | 🔴 Critical

Reject overlapping start() calls before rewriting session state.

This block still has no IDLE check. A second start() overwrites sessionId, state, and the cached request options while the first startup is still in flight, so the earlier beginListening(...) path goes stale and its promise can be orphaned.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 186 - 205, Before mutating session state in the start() path, guard
against overlapping calls by checking the current listening state while holding
the same lock: inside the lock.lock() section (where you call
cancelPendingForceStopLocked(), set
forceStopped/pendingStopReason/resetPartialResultsCache(), and before
incrementing sessionId or setting state = ListeningState.STARTING), verify that
state == ListeningState.IDLE (or otherwise not already STARTING/ACTIVE) and
immediately reject/return the start request (e.g., fail the activeStartCall or
throw) if it is not IDLE; this prevents a second start() from overwriting
sessionId, state, cached options (lastLanguage/lastMaxResults/lastPrompt/etc.),
and orphaning the in-flight beginListening(...) promise. Ensure the check is
performed while holding the lock and before any assignment to sessionId, state,
or activeStartCall.
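
As an illustration of the guard requested here, the sketch below checks the state under the lock before any session field is touched. `StartGuard` and `tryStart()` are hypothetical names, and returning `-1` stands in for rejecting the incoming call; the real plugin would reject through its own error path.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a start() guard; names mirror the review text, not the plugin source.
final class StartGuard {
    enum ListeningState { IDLE, STARTING, STARTED, STOPPING }

    private final ReentrantLock lock = new ReentrantLock();
    private ListeningState state = ListeningState.IDLE;
    private long sessionId = 0;

    /** Returns the new session id, or -1 if a session is already in flight. */
    long tryStart() {
        lock.lock();
        try {
            if (state != ListeningState.IDLE) {
                // Reject before touching sessionId so the in-flight session stays intact.
                return -1;
            }
            sessionId++;
            state = ListeningState.STARTING;
            return sessionId;
        } finally {
            lock.unlock();
        }
    }

    void finish() {
        lock.lock();
        try {
            state = ListeningState.IDLE;
        } finally {
            lock.unlock();
        }
    }

    long currentSessionId() {
        lock.lock();
        try {
            return sessionId;
        } finally {
            lock.unlock();
        }
    }
}
```

Because the check and the mutation share one critical section, two racing `start()` calls cannot both see `IDLE`.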

241-279: ⚠️ Potential issue | 🟠 Major

Don't emit terminal session events for popup mode until the activity returns.

Sessions launched with startActivityForResult(...) are still driven to readyForNextSession / stopped by the stop timers here. If the popup returns later, listeningResult(...) resolves or rejects an already-finished session and can emit a second terminal cycle.

Also applies to: 282-337

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 241 - 279, The stop method is emitting terminal session events and
scheduling finish timers even when a popup activity was launched
(startActivityForResult), causing duplicate terminal cycles when the popup
returns; modify stop(PluginCall) and related logic (use the existing sessionId,
pendingStopReason, scheduleFinishFallbackLocked, emitListeningState,
listening(false), cancelPendingForceStopLocked) to detect an outstanding
activity-for-result/popup-return pending state (e.g., an
isAwaitingActivityResult or isPopupMode flag) and if that flag is set, do not
transition state to STOPPING, do not call
emitListeningState("stoppingListening", ...), and do not schedule the finish
fallback; instead set pendingStopReason and defer calling listening(false) and
scheduleFinishFallbackLocked until the activity result handler clears the
awaiting flag (where currently listeningResult(...) runs), at which point
perform the terminal transitions and resolve/reject the session once. Ensure
cancelPendingForceStopLocked still runs as appropriate but avoid emitting
terminal events while awaiting the activity result.
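
One way to picture the deferral described above is the sketch below. It is an assumption-heavy illustration: `PopupStopDeferral`, the `awaitingActivityResult` flag, and the order of the two emitted events are all hypothetical, and the real plugin has more states and timers than shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of deferring terminal events while a popup activity is outstanding.
final class PopupStopDeferral {
    private boolean awaitingActivityResult; // assumption: a flag like the one the review suggests
    private String pendingStopReason;
    private boolean finished;
    private final List<String> events = new ArrayList<>();

    synchronized void startPopup() {
        awaitingActivityResult = true;
        finished = false;
        pendingStopReason = null;
    }

    synchronized void stop(String reason) {
        pendingStopReason = reason; // remember why, but emit nothing yet
        if (awaitingActivityResult) {
            return; // defer: the activity result handler will finish the session
        }
        finishOnce();
    }

    synchronized void onActivityResult() {
        awaitingActivityResult = false;
        finishOnce();
    }

    // Emits the terminal events at most once per session.
    private void finishOnce() {
        if (finished) {
            return;
        }
        finished = true;
        events.add("stopped");
        events.add("readyForNextSession");
    }

    synchronized String pendingStopReason() {
        return pendingStopReason;
    }

    synchronized List<String> events() {
        return new ArrayList<>(events);
    }
}
```

Calling `stop()` while the popup is open records the reason only; the single terminal cycle runs when the activity result arrives.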

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift`:
- Around line 569-576: The helper rejectPendingStartCallIfNeeded currently skips
rejecting when currentOptions?.partialResults == true, leaving the original JS
start() promise unresolved; change its logic so it always clears activeCall and
rejects the pending startCall (call startCall.reject(message) and set activeCall
= nil) regardless of partialResults, and ensure callers like stop(), forceStop()
(and any code paths that win during permission prompt/startup such as
beginRecognition) invoke rejectPendingStartCallIfNeeded(message:) so stale
sessions are settled.

In `@src/definitions.ts`:
- Around line 78-95: The interface SpeechRecognitionPartialResultEvent declares
matches as required but native forceStop() can emit payloads without matches;
update the type to reflect runtime by making matches optional (e.g., change
matches: string[] to matches?: string[]) and adjust any related JSDoc/comments
to note that matches may be undefined for forced/accumulated-only events so
callers should guard before using event.matches; ensure this change is made in
src/definitions.ts where the interface is declared.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b4ca23cb-dc50-437f-987a-a39b31019576

📥 Commits

Reviewing files that changed from the base of the PR and between dd5a4fb and 3db41a5.

📒 Files selected for processing (4)

README.md
android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift
src/definitions.ts

coderabbitai

🧹 Nitpick comments (1)

ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift (1)

5-5: Please verify this gate against the Xcode/SDK matrix, not just the compiler version.

Line 5 uses #if compiler(>=6.2), which only keys off the Swift compiler version. SpeechAnalyzer/SpeechTranscriber are SDK-provided Speech framework symbols, so this is still only a proxy for “has the iOS 26 Speech APIs”; if you care about older/newer Xcode+SDK mixes, the modern branch can still be selected when the symbols you need are not. Since Lines 390-437 exist specifically for that compatibility story, I’d add an older-Xcode build to CI, or gate this at the build level, so the fallback path is actually exercised. (developer.apple.com)

Also applies to: 390-437

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 59cb8c39-1854-4017-85f7-5129cd231e33

📥 Commits

Reviewing files that changed from the base of the PR and between 3db41a5 and b305ccb.

📒 Files selected for processing (3)

android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
example-app/src/main.js
ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift

🚧 Files skipped from review as they are similar to previous changes (2)

example-app/src/main.js
android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java

rmorteza and others added 21 commits December 23, 2025 15:04

refactor: improve speech recognizer management and error handling

424dc61

refactor: enhance thread safety in partial results handling

5026499

refactor: improve thread safety in speech recognition start method

0b94135

refactor: enhance error handling and synchronization in speech recognizer reset

7e18901

Merge branch 'Cap-go:main' into main

133b6c0

fix(example-app): add optional chaining for classList in appendEvent

7191f02

Text nodes don't have classList property, causing TypeError when firstChild is whitespace. Add optional chaining to prevent the error. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

fix(android): add null check for PluginCall in continuousPTT restart

d0d51e2

When continuousPTT mode auto-restarts after silence, beginListening is called with null PluginCall. Add null checks to prevent NPE crash. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

feat(build): add prepare script to automate build process

5acad22

Merge branch 'pr-2'

f59fe28

# Conflicts: # android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java

Merge branch 'pr-4' into codex/merge-pr-4-6

d1ac57a

# Conflicts: # README.md # android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java

Merge branch 'pr-6' into codex/merge-pr-4-6

2c9a48a

# Conflicts: # android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java # src/definitions.ts

riderx added 2 commits March 24, 2026 13:14

style: run formatter

49f0e98

fix(ios): gate speech analyzer for older Xcode SDKs

dd5a4fb

coderabbitai Bot reviewed Mar 24, 2026

chore: remove implementation summary

70a9068

coderabbitai Bot reviewed Mar 24, 2026

riderx added 2 commits March 24, 2026 14:00

fix: address review feedback and add ios continuous ptt

3db41a5

fix: address remaining review threads

b305ccb

coderabbitai Bot reviewed Mar 24, 2026

fix: settle stale ios starts and align partial result types

559132c

coderabbitai Bot reviewed Mar 24, 2026

riderx merged commit 2583bc6 into main Mar 24, 2026
6 checks passed
