
feat: merge PRs #2, #4, and #6 with completed iOS support #7

Merged
riderx merged 27 commits into main from codex/merge-pr-4-6 on Mar 24, 2026

Conversation

@riderx commented Mar 24, 2026


Summary

Notes

  • continuousPTT auto-restart remains Android-only
  • iOS now supports the same PTT-oriented API surface for hold/release flows, but not automatic silence restarts

Verification

  • bun run build
  • bun run verify:ios
  • bun run verify:android

Summary by CodeRabbit

  • New Features

    • Push-to-talk (PTT) hold-to-record with accumulated transcript display and experimental continuousPTT to keep sessions alive across silence
    • New public methods: forceStop(), getLastPartialResult(), setPTTState()
    • New events: error and readyForNextSession; richer listening and partial result payloads (sessionId, accumulated, forced, etc.)
  • Documentation

    • Updated docs for PTT flow, continuousPTT option, and streaming partialResults behavior until session end
  • Example App

    • Added PTT UI, controls, and styles for demo usage

rmorteza and others added 21 commits December 23, 2025 15:04
The Android SpeechRecognizer.stopListening() method doesn't reliably
stop audio input in all scenarios. This is problematic for Push-to-Talk
(PTT) implementations where the user expects recording to stop
immediately when releasing the button.

This commit adds:

1. forceStop() method with destroy/recreate pattern:
   - First tries graceful stopListening()
   - After configurable timeout (default 1500ms), forces stop by
     destroying and recreating the SpeechRecognizer
   - Returns the last cached partial result so no speech is lost
   - Emits 'stopped' state after force stop completes

2. getLastPartialResult() method:
   - Returns the most recent partial transcription
   - Useful for checking state or retrieving results after force stop

3. Callback guards:
   - onError, onResults, onPartialResults now check forceStopped flag
   - Prevents late callbacks from interfering after force stop
   - onResults cancels pending force-stop timeout when results arrive
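The stop sequence listed above (graceful stop first, destroy/recreate after a timeout, cached partial returned, late callbacks ignored) can be modelled in a few lines of JavaScript. This is a hedged sketch, not the plugin's Java code: `ForceStoppableSession`, `createRecognizer`, and the `{ stopListening, destroy }` recognizer shape are hypothetical stand-ins for Android's `SpeechRecognizer` and the plugin internals.

```javascript
// Sketch of "graceful stop first, forced teardown after a timeout".
// `createRecognizer` and the { stopListening, destroy } shape are hypothetical.
class ForceStoppableSession {
  constructor(createRecognizer) {
    this.createRecognizer = createRecognizer;
    this.recognizer = createRecognizer();
    this.lastPartial = '';
    this.forceStopped = false;
    this.forceStopTimer = null;
    this.resolveStop = null;
  }

  // Cache every partial transcription so a forced stop loses nothing.
  onPartialResults(text) {
    if (this.forceStopped) return; // ignore late callbacks after a force stop
    this.lastPartial = text;
  }

  // Final results arrived normally: cancel the pending forced teardown.
  onResults(text) {
    if (this.forceStopped) return;
    clearTimeout(this.forceStopTimer);
    this.forceStopTimer = null;
    this.lastPartial = text;
    this.finish(text);
  }

  forceStop(timeoutMs = 1500) {
    return new Promise((resolve) => {
      this.resolveStop = resolve;
      this.forceStopTimer = setTimeout(() => {
        // No results in time: destroy and recreate the recognizer.
        this.forceStopTimer = null;
        this.forceStopped = true;
        this.recognizer.destroy();
        this.recognizer = this.createRecognizer();
        this.finish(this.lastPartial); // hand back the cached partial
      }, timeoutMs);
      this.recognizer.stopListening(); // graceful attempt first
    });
  }

  // Settle the pending forceStop() promise exactly once.
  finish(text) {
    const resolve = this.resolveStop;
    this.resolveStop = null;
    if (resolve) resolve(text);
  }
}
```

A real implementation would also reset `forceStopped` when the next session starts; the sketch leaves that out to stay short.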

Usage example:
```javascript
// Start with partial results for real-time updates
await SpeechRecognition.start({ partialResults: true });

// On button release, force stop with 1s timeout
await SpeechRecognition.forceStop({ timeout: 1000 });

// Get what was heard (in case normal results didn't fire)
const result = await SpeechRecognition.getLastPartialResult();
```

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tal)

EXPERIMENTAL: This feature allows recognition to continue through silence
periods while the PTT button is held, useful for natural speech with pauses.

This commit adds:

1. continuousPTT option in start():
   - When enabled, recognition auto-restarts on silence/timeout errors
   - Results are accumulated across restarts
   - Requires setPTTState() to track button state

2. setPTTState() method:
   - Call with held=true on button press
   - Call with held=false on button release
   - Resets accumulated results when button is pressed

3. Auto-restart logic:
   - onError: Restarts on ERROR_NO_MATCH or ERROR_SPEECH_TIMEOUT
     while button is held
   - onResults: Accumulates result and restarts while button is held
   - Emits 'isRestarting' flag in partial results when restarting

4. Accumulated results:
   - Results from multiple recognition sessions are concatenated
   - Final result includes 'accumulatedText' with full transcription
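The restart and accumulation rules above reduce to a small amount of bookkeeping. The JavaScript below is a hedged sketch of that logic only; `ContinuousPTTAccumulator` and its methods are hypothetical, and the two constants mirror the numeric values of Android's `SpeechRecognizer.ERROR_SPEECH_TIMEOUT` and `SpeechRecognizer.ERROR_NO_MATCH`. Joining segments with a single space is an assumption.

```javascript
// Numeric values mirror android.speech.SpeechRecognizer error constants.
const ERROR_SPEECH_TIMEOUT = 6;
const ERROR_NO_MATCH = 7;

// Hypothetical model of the continuousPTT bookkeeping described above.
class ContinuousPTTAccumulator {
  constructor() {
    this.held = false;
    this.segments = [];
  }

  // Mirrors setPTTState({ held }): pressing the button clears old results.
  setHeld(held) {
    if (held) this.segments = [];
    this.held = held;
  }

  // Text gathered across restarts, optionally including the in-flight result.
  accumulatedText(current = '') {
    return [...this.segments, current].filter((s) => s.length > 0).join(' ');
  }

  // Silence-type errors restart recognition only while the button is held.
  shouldRestartOnError(code) {
    return this.held && (code === ERROR_NO_MATCH || code === ERROR_SPEECH_TIMEOUT);
  }

  // Stores a final result; returns true if listening should restart.
  onResults(text) {
    if (text) this.segments.push(text);
    return this.held;
  }
}
```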

Usage example:
```javascript
// Start with continuous PTT enabled
await SpeechRecognition.start({
  partialResults: true,
  continuousPTT: true
});

// On button press
await SpeechRecognition.setPTTState({ held: true });

// ... user speaks with pauses, recognition continues ...

// On button release
await SpeechRecognition.setPTTState({ held: false });
await SpeechRecognition.forceStop();
// Final result includes all accumulated speech
```

WARNING: This is experimental and may cause issues in some scenarios.
Use forceStop() without continuousPTT for reliable basic PTT.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Text nodes don't have a classList property, which caused a TypeError when
firstChild was a whitespace text node. Add optional chaining to prevent the error.
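The fix above relies on optional chaining short-circuiting to `undefined` instead of throwing. A minimal JavaScript sketch, assuming a hypothetical helper `firstChildHasClass` (the example app's actual call site is not shown here):

```javascript
// Returns true only when the first child is an element carrying `className`.
// A whitespace text node has no classList, so `?.` yields undefined instead
// of throwing "Cannot read properties of undefined (reading 'contains')".
function firstChildHasClass(parent, className) {
  return parent.firstChild?.classList?.contains(className) === true;
}
```

In a browser this would be called as `firstChildHasClass(someElement, 'active')`; the class name is illustrative.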

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add interactive PTT demonstration to the example app:
- Toggle switch to enable PTT mode (hides regular Start/Stop buttons)
- Checkbox to enable continuousPTT (continue on silence, experimental)
- Press-and-hold microphone button with visual feedback
- Display accumulated results from continuous PTT sessions
- Add web.ts stubs for forceStop, getLastPartialResult, setPTTState
- Fix Kotlin version conflict in example-app Android build

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When continuousPTT mode auto-restarts after silence, beginListening is
called with a null PluginCall. Add null checks to prevent an NPE crash.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When the user releases PTT (scheduling the force-stop timeout) and quickly
starts a new session, the old timeout could still fire and destroy the
fresh recognizer. This fix cancels any pending forceStopRunnable before
starting a new recognition session.
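The race and its fix can be reproduced with plain timers. This hedged JavaScript sketch uses a hypothetical `SessionTimers` class in place of the Android `Handler` and `forceStopRunnable`; it records which recognizer generation a fired timeout would have destroyed.

```javascript
// Hypothetical model: one pending force-stop timer that start() always cancels.
class SessionTimers {
  constructor() {
    this.pendingForceStop = null;
    this.recognizerGeneration = 0;
    this.destroyedGenerations = [];
  }

  // Called on PTT release: tear down the recognizer if nothing else stops it.
  scheduleForceStop(timeoutMs) {
    this.pendingForceStop = setTimeout(() => {
      this.pendingForceStop = null;
      // The timeout destroys whatever recognizer is current when it fires.
      this.destroyedGenerations.push(this.recognizerGeneration);
    }, timeoutMs);
  }

  // Called on the next PTT press.
  start() {
    // Cancel the old timeout first so it cannot destroy the fresh recognizer.
    if (this.pendingForceStop !== null) {
      clearTimeout(this.pendingForceStop);
      this.pendingForceStop = null;
    }
    this.recognizerGeneration += 1;
  }
}
```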

Addresses CodeRabbit review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add kotlin-gradle-plugin:1.9.24 to buildscript dependencies to match
the forced stdlib versions in resolutionStrategy. This ensures KGP
version consistency and avoids AGP/KGP version mismatches.

Addresses CodeRabbit review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update JSDoc to accurately describe that forceStop() returns
Promise<void> and emits partial results via event. Users should call
getLastPartialResult() to retrieve transcription after force stop.

Addresses CodeRabbit review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When partialResults=false, start() returns a Promise that awaits
resolution from onResults()/onError(). If forceStop() timeout triggers
before normal completion, forceStopped=true caused both callbacks to
return early without resolving the pending PluginCall.

Now tracks activeStartCall and rejects it with "forceStop" error in
the timeout handler, preventing callers from hanging indefinitely.
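The hang described above comes from a promise whose only settlement paths return early once `forceStopped` is set. A hedged JavaScript model of the fix, where `PendingStartCall` and its methods are hypothetical stand-ins for the Java `activeStartCall` handling:

```javascript
// Hypothetical model: start() returns a promise settled by onResults, or by
// the force-stop timeout, which must reject it so callers never hang.
class PendingStartCall {
  constructor() {
    this.activeStartCall = null; // { resolve, reject } of the outstanding start()
    this.forceStopped = false;
  }

  start() {
    this.forceStopped = false;
    return new Promise((resolve, reject) => {
      this.activeStartCall = { resolve, reject };
    });
  }

  onResults(matches) {
    if (this.forceStopped) return; // late callback after a forced stop
    this.settle((call) => call.resolve({ matches }));
  }

  forceStopTimeoutFired() {
    this.forceStopped = true;
    // Without this, the early return above would leave the promise pending.
    this.settle((call) => call.reject(new Error('forceStop')));
  }

  // Take ownership of the call before settling so it is settled only once.
  settle(fn) {
    const call = this.activeStartCall;
    this.activeStartCall = null;
    if (call) fn(call);
  }
}
```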

Addresses CodeRabbit review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The TypeScript interface only defined `matches`, but the Android
implementation emits additional fields in PTT scenarios:
- accumulated: transcription across continuous PTT restarts
- accumulatedText: final accumulated text including current result
- isRestarting: true when restarting in continuous PTT mode
- forced: true when result emitted due to force stop timeout
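A consumer can pick display text from these optional fields. This is a hedged JavaScript sketch: `displayText` is hypothetical, and it assumes (from the descriptions above) that `accumulatedText` already includes the current result while `accumulated` covers only earlier restarts.

```javascript
/**
 * Assumed shape of a partialResults event, per the fields listed above.
 * @typedef {Object} PartialResultEvent
 * @property {string[]} matches
 * @property {string} [accumulated]
 * @property {string} [accumulatedText]
 * @property {boolean} [isRestarting]
 * @property {boolean} [forced]
 */

/**
 * Picks the text a PTT UI should show for one event.
 * @param {PartialResultEvent} event
 * @returns {string}
 */
function displayText(event) {
  // Assumed to already contain the current result.
  if (event.accumulatedText) return event.accumulatedText;
  const current = event.matches?.[0] ?? '';
  // Prepend text carried over from earlier continuous-PTT restarts.
  return [event.accumulated, current].filter(Boolean).join(' ');
}
```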

Addresses CodeRabbit review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mark forceStop(), getLastPartialResult(), and setPTTState() as
Android-only APIs since there are no corresponding iOS implementations.
This helps TypeScript users and documentation generators clearly
indicate these methods are unavailable on iOS.

Addresses CodeRabbit review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nition

- Introduced a finite state machine to manage listening sessions, improving predictability and observability.
- Emitted events for state transitions: `startingListening`, `started`, `stoppingListening`, and `stopped`.
- Enhanced error handling by emitting an `error` event for all recognition errors.
- Added `readyForNextSession` event to signal when the recognizer is ready for a new session.
- Maintained backward compatibility while improving session management and UI responsiveness.
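The state machine above can be sketched as follows. This is a hedged JavaScript model, not the plugin's Java implementation: `ListeningSession`, the `onRecognizerReady`/`onRecognizerFinished` hooks, and the `listeningState` event name used for transitions are hypothetical. It also rejects a second `start()` unless the machine is IDLE, which is the overlap guard raised in review.

```javascript
// Hypothetical model of the listening-session state machine described above.
const ListeningState = Object.freeze({
  IDLE: 'IDLE',
  STARTING: 'STARTING',
  STARTED: 'STARTED',
  STOPPING: 'STOPPING',
});

class ListeningSession {
  constructor(emit) {
    this.emit = emit; // (eventName, payload) => void
    this.state = ListeningState.IDLE;
    this.sessionId = 0; // monotonic, one per start()
  }

  start() {
    // Reject overlapping starts instead of overwriting the live session.
    if (this.state !== ListeningState.IDLE) {
      throw new Error('Speech recognition is already running.');
    }
    this.sessionId += 1;
    this.transition(ListeningState.STARTING, 'startingListening');
    return this.sessionId;
  }

  onRecognizerReady() {
    if (this.state === ListeningState.STARTING) {
      this.transition(ListeningState.STARTED, 'started');
    }
  }

  stop() {
    if (this.state === ListeningState.STARTING || this.state === ListeningState.STARTED) {
      this.transition(ListeningState.STOPPING, 'stoppingListening');
    }
  }

  // Results, errors, and stops all funnel into one terminal transition.
  onRecognizerFinished() {
    if (this.state === ListeningState.IDLE) return;
    this.transition(ListeningState.IDLE, 'stopped');
    this.emit('readyForNextSession', { sessionId: this.sessionId });
  }

  transition(next, status) {
    this.state = next;
    this.emit('listeningState', { status, sessionId: this.sessionId });
  }
}
```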
# Conflicts:
#	android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
# Conflicts:
#	README.md
#	android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
# Conflicts:
#	android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
#	src/definitions.ts

@coderabbitai

coderabbitai Bot commented Mar 24, 2026

📝 Walkthrough

Added push-to-talk session management: three new public plugin APIs (setPTTState, forceStop, getLastPartialResult), session IDs and session-aware lifecycle across Android and iOS, continuous-PTT restart behavior, expanded event payloads (accumulated transcript, session metadata, error/ready events), and example app PTT UI.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| TypeScript API + Web | src/definitions.ts, src/web.ts | Added forceStop, getLastPartialResult, setPTTState to plugin API and types (ForceStopOptions, LastPartialResult, PTTStateOptions); added continuousPTT option; expanded partial result and listening event payloads; added error and readyForNextSession listener overloads. Web implementation provides stubs/safe default for new methods. |
| Android native | android/src/main/java/app/capgo/speechrecognition/...Constants.java, .../SpeechRecognitionPlugin.java | Incremented plugin version; added event name constants; rewrote plugin to session-aware lifecycle with monotonic sessionId, recognizerGeneration, ListeningState, locks/handlers; implemented forceStop, getLastPartialResult, setPTTState; continuous-PTT restart logic; structured ERROR_EVENT/READY_FOR_NEXT_SESSION_EVENT emissions; consolidated session finish/cleanup. |
| iOS native | ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift, ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift | Introduced session IDs, per-session state, ListeningReason; reworked start/stop to be session-aware; added forceStop, getLastPartialResult, setPTTState; gated callbacks by sessionId; centralized session finalization and continuous-PTT restart scheduling; added compiler-guarded stub branch for older compilers / iOS <26. |
| Example app (UI & logic) | example-app/index.html, example-app/src/main.js, example-app/src/style.css | Added PTT-mode UI (toggle, hold-to-record button, accumulated-results display, hidden PTT status); implemented PTT press/release flow, PTT state calls, start/forceStop orchestration, accumulated-results handling, and ready-for-next-session coordination; added styles for PTT controls. |
| Build / Config | package.json, example-app/android/build.gradle | Added prepare npm script to run build; added Kotlin Gradle plugin classpath and enforced Kotlin stdlib version 1.9.24 in example Android buildscript. |
| Docs | README.md | Added "Push-to-talk and session events" section documenting new APIs (setPTTState, forceStop, getLastPartialResult), continuousPTT behavior, expanded event payloads and new event types. |

Sequence Diagram(s)

sequenceDiagram
    participant UI as Example App UI
    participant JS as Plugin JS
    participant Native as Native Plugin (Android/iOS)
    participant Recognizer as Native Recognizer

    rect rgba(200,230,255,0.5)
    UI->>JS: handlePTTPress()
    JS->>Native: setPTTState(held=true)
    JS->>Native: start(options + continuousPTT?)
    end

    rect rgba(200,255,200,0.5)
    Native->>Recognizer: create/start session(sessionId)
    Recognizer-->>Native: partial results (matches)
    Native-->>JS: partialResults (matches, accumulated?, sessionId)
    JS-->>UI: update accumulated display
    end

    rect rgba(255,230,200,0.5)
    UI->>JS: handlePTTRelease()
    JS->>Native: setPTTState(held=false)
    JS->>Native: forceStop(timeout)
    Native->>Recognizer: stop/teardown
    Native-->>JS: optional forced partial (forced=true)
    Native-->>JS: readyForNextSession(sessionId)
    JS-->>UI: fetch getLastPartialResult() / finalize UI
    end
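The press/release flow in the diagram can be serialized with a transition counter, so that a press whose `await`s are still in flight cannot start a session after the button has already been released. This is a hedged JavaScript sketch: `createPTTController`, `render`, and the injected `plugin` object are hypothetical; only the method names `setPTTState`, `start`, `forceStop`, and `getLastPartialResult` come from this PR.

```javascript
// Hypothetical controller: `plugin` stands in for SpeechRecognition and
// `render` receives whatever getLastPartialResult() returns.
function createPTTController(plugin, render) {
  let transitionId = 0; // bumped on every press and release
  let releasing = null; // in-flight release, if any

  async function handlePTTPress() {
    const myId = ++transitionId;
    if (releasing) await releasing; // let the previous forceStop finish first
    if (myId !== transitionId) return; // superseded by a newer release
    await plugin.setPTTState({ held: true });
    if (myId !== transitionId) return; // released while we were awaiting
    await plugin.start({ partialResults: true, continuousPTT: true });
  }

  async function handlePTTRelease() {
    transitionId += 1; // invalidates any press still in progress
    const current = (async () => {
      await plugin.setPTTState({ held: false });
      await plugin.forceStop({ timeout: 1000 });
      render(await plugin.getLastPartialResult());
    })();
    releasing = current;
    try {
      await current;
    } finally {
      if (releasing === current) releasing = null;
    }
  }

  return { handlePTTPress, handlePTTRelease };
}
```

Waiting for `forceStop()` before the next press is a simplification; the example app could instead wait for the `readyForNextSession` event.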

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I press, I hold, the session hums along,
Words gather softly into one long song,
A forced little stop, a final tiny part,
Restarts keep listening — PTT warms my heart,
The rabbit nods: sessions now finish smart.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title clearly and accurately describes the main change: merging multiple PRs with completed iOS support for a speech recognition plugin. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (2)

1110-1122: ⚠️ Potential issue | 🟠 Major

Apply the stale-listener guard to segmented callbacks too.

Late onSegmentResults() / onEndOfSegmentedSession() callbacks are still emitted after a session or recognizer-generation switch because only the other listener methods call isStale(). That leaks segment events from an old recognizer into the next session.

Suggested fix
         @Override
         public void onSegmentResults(Bundle results) {
+            if (isStale()) {
+                return;
+            }
             ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
             if (matches == null) {
                 return;
             }
             notifyListeners(SEGMENT_RESULTS_EVENT, new JSObject().put("matches", new JSArray(matches)));
         }

         @Override
         public void onEndOfSegmentedSession() {
+            if (isStale()) {
+                return;
+            }
             notifyListeners(END_OF_SEGMENT_EVENT, new JSObject());
         }
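The `isStale()` guard works because each listener captures the session ID and recognizer generation that were live when it was created. A hedged JavaScript model of that idea follows; `RecognizerHost`, `newSession`, and `createListener` are hypothetical, while `recognizerGeneration` and the two callback names come from this PR.

```javascript
// Hypothetical model of generation-checked recognizer listeners.
class RecognizerHost {
  constructor() {
    this.currentSessionId = 0;
    this.recognizerGeneration = 0;
    this.events = [];
  }

  newSession() {
    this.currentSessionId += 1;
    this.recognizerGeneration += 1; // e.g. after destroy/recreate
    return this.createListener();
  }

  // Each listener captures the ids that were live when it was created.
  createListener() {
    const sessionId = this.currentSessionId;
    const generation = this.recognizerGeneration;
    const isStale = () =>
      sessionId !== this.currentSessionId || generation !== this.recognizerGeneration;
    return {
      onSegmentResults: (matches) => {
        if (isStale()) return; // drop callbacks from an old recognizer
        this.events.push({ type: 'segmentResults', sessionId, matches });
      },
      onEndOfSegmentedSession: () => {
        if (isStale()) return;
        this.events.push({ type: 'endOfSegmentedSession', sessionId });
      },
    };
  }
}
```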
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 1110 - 1122, The segmented-callback handlers onSegmentResults and
onEndOfSegmentedSession need the same stale-listener guard used elsewhere: at
the start of onSegmentResults(Bundle results) call isStale() and return early if
true before processing matches/notifyListeners(SEGMENT_RESULTS_EVENT), and
likewise in onEndOfSegmentedSession() call isStale() and return early before
notifyListeners(END_OF_SEGMENT_EVENT); this prevents old recognizer/session
callbacks from leaking into a new session.

576-603: ⚠️ Potential issue | 🔴 Critical

Session-check the on-device support/model-download callbacks.

These callbacks call startInlineListening() / call.reject() without verifying that currentSessionId is still live. If the user stops or restarts while support checking or model download is in flight, a stale callback can revive a dead session or reject a newer flow.

Suggested guard pattern
                 @Override
                 public void onSupportResult(RecognitionSupport support) {
+                    try {
+                        lock.lock();
+                        if (currentSessionId != sessionId || pendingStopReason != null || state == ListeningState.IDLE) {
+                            return;
+                        }
+                    } finally {
+                        lock.unlock();
+                    }
                     boolean installed = isLanguageSupported(language, support.getInstalledOnDeviceLanguages());
                     boolean supported =
                         installed ||

Apply the same guard before acting in onError(...), onSuccess(), onScheduled(), and the model-download onError(...) path.

Also applies to: 605-614, 619-656

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 576 - 603, Guard all on-device support and model-download callbacks
by verifying the session is still active for the same currentSessionId before
taking actions; specifically, at the start of
RecognitionSupportCallback.onSupportResult and in the model-download callbacks
(onError, onSuccess, onScheduled) check that the sessionId matches an active
session (the same session tracking used by finishSession/currentSessionId) and
return early if it is stale, so that calls to startInlineListening(intent,...),
triggerOnDeviceModelDownload(...), call.reject(...), emitErrorEvent(...), and
finishSession(...) only run for live sessions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 241-279: stop()/forceStop() prematurely mark popup-mode sessions
finished even when an external activity started by startActivityForResult(...)
is still outstanding; update stop() and forceStop() to detect popup/modal
sessions started via startActivityForResult (check the same flag/field used when
launching the activity), cancel any scheduled finish fallbacks (call
cancelPendingForceStopLocked()), and instead defer emitting terminal events and
calling scheduleFinishFallbackLocked until the activity result returns;
additionally, when stopping you should attempt to finish/cancel the launched
activity (call Activity.finish() or use the saved PendingIntent/result handler)
so the popup doesn't outlive the session, and ensure listeningResult(...) only
resolves/rejects after the activity result is processed and
emitListeningState(..., "finished", ...) is only emitted after that.
- Around line 186-205: Before mutating session state in start(), check the
current state under the same lock and reject overlapping starts: if state is
ListeningState.STARTING, STARTED, or STOPPING, immediately fail/return the new
start call (e.g., complete its promise with an error) instead of proceeding;
only when state is not one of those values continue to increment sessionId, set
currentSessionId, assign activeStartCall, and set state =
ListeningState.STARTING. Ensure this check uses the same lock around the block
that touches sessionId, activeStartCall, and state, so that an overwritten
session cannot orphan the promise of the start() call already running through
beginListening().

In `@example-app/src/main.js`:
- Around line 96-104: In togglePTTMode, only enable the continuousPTT checkbox
on Android: detect platform (use the existing platform helper or add an
isAndroid check) and change the logic so refs.continuousPTT.disabled becomes
!(enabled && isAndroid) and refs.continuousPTT.checked is forced false when not
(enabled && isAndroid); update the control visibility/behavior in togglePTTMode
to honor this Android-only requirement while leaving other UI toggles unchanged.
- Around line 233-250: The PTT control only has mouse/touch handlers and lacks
keyboard support, so update the refs.pttButton wiring to also handle key events
and blur: add a keydown listener that calls handlePTTPress() when Space or Enter
is pressed (preventDefault to avoid scrolling on Space), a keyup listener that
calls handlePTTRelease() for those keys, and a blur listener that calls
handlePTTRelease() as a safety release; ensure existing touch/mouse handlers
(handlePTTPress, handlePTTRelease) are reused and keep
refs.pttMode/togglePTTMode unchanged.
- Around line 184-223: handlePTTPress and handlePTTRelease can interleave and
race; serialize them by adding a transition lock/token (e.g., pttTransitionId or
an async mutex) that each press/release captures and validates before proceeding
so stale async work is ignored. On entry to handlePTTPress and handlePTTRelease
acquire the token (or increment a session id), store it locally, and before any
async continuation (after await SpeechRecognition.setPTTState, await
SpeechRecognition.start, await SpeechRecognition.forceStop, await
SpeechRecognition.getLastPartialResult, or any other await) check the token
still matches; if not, abort that stale branch. Also ensure handlePTTPress does
not start a new session while a previous transition is in progress and only
allow a new press once forceStop or a readyForNextSession signal completes. Use
the existing symbols: handlePTTPress, handlePTTRelease,
SpeechRecognition.setPTTState, SpeechRecognition.start,
SpeechRecognition.forceStop, and SpeechRecognition.getLastPartialResult to
locate and instrument the logic.

In `@IMPLEMENTATION_SUMMARY.md`:
- Around line 464-489: The implementation summary still lists pending
README/CHANGELOG tasks and a sample CHANGELOG entry for 7.1.0 that conflict with
the merged docs and package.json (now 8.0.10); update IMPLEMENTATION_SUMMARY.md
to remove or mark as completed the README/CHANGELOG checklist items, either
update the sample changelog entry to the actual released version and date or
delete the placeholder 7.1.0 block, and add a short note that docs were
regenerated and package.json is at 8.0.10 so the summary reflects the current
release state; also verify references to README.md and CHANGELOG.md sections
mentioned (e.g., "error event", "listeningState") are present in the regenerated
docs and adjust the summary text accordingly.
- Around line 103-105: The fenced code blocks containing state diagrams (for
example the block showing "IDLE → (start()) → STARTING → STARTED → (stop() or
results or error) → STOPPING → IDLE" and the other similar blocks referenced)
are missing language identifiers and failing markdownlint; update each of those
fenced blocks to include an appropriate language tag (e.g., ```text or
```markdown or ```bash as appropriate) so the linter recognizes them—apply this
change to the shown block and the other occurrences at the ranges called out
(292-298, 301-308, 311-317, 320-327, 333-340).

In `@ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift`:
- Around line 488-504: The code emits readyForNextSession and stopped before an
async SpeechAnalyzerRecognitionSession.finish completes, allowing a new start()
to race with teardown; change the flow in stopCurrentSession so that if
`#available`(iOS 26.0, *) and modernRecognitionSession is a
SpeechAnalyzerRecognitionSession and sessionAlreadyStopped is false, you await
modernSession.stop() (on the MainActor) before clearing
modernRecognitionSession, calling clearLegacyRecognitionResources(), and before
calling notifyListeners("readyForNextSession", ...) and emitListeningState(...);
in other words, move the async stop-await path to block emitting/clearing until
after await modernSession.stop(), only then set modernRecognitionSession = nil
and proceed with notifyListeners and emitListeningState (if
sessionAlreadyStopped is true you can keep the existing immediate behavior).
- Around line 113-124: The permission-denied callback inside
AVAudioSession.sharedInstance().requestRecordPermission must first verify the
sessionId matches the current active session to avoid emitting errors for
finished sessions; inside the requestRecordPermission closure (before calling
DispatchQueue.main.async and before
emitErrorEvent/activeCall/finishSessionIfNeeded) add a guard that sessionId ==
self.activeSessionId (or otherwise confirm the session is still active) and
return early if it does not match so stop()/forceStop() won't cause stale
MICROPHONE_PERMISSION_DENIED emissions.
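Both iOS findings are about ordering against a session ID: announce readiness only after asynchronous teardown completes, and drop callbacks that belong to an older session. A hedged JavaScript model of that ordering, where `OrderedStopper` and the injected `teardown` function are hypothetical stand-ins for the Swift plugin and `SpeechAnalyzerRecognitionSession.stop()`:

```javascript
// Hypothetical model: teardown is awaited before readiness is announced, and
// permission callbacks for an older session are ignored.
class OrderedStopper {
  constructor(teardown, emit) {
    this.teardown = teardown; // async () => void
    this.emit = emit; // (eventName, payload) => void
    this.activeSessionId = 0;
  }

  begin() {
    this.activeSessionId += 1;
    return this.activeSessionId;
  }

  async stopCurrentSession() {
    const sessionId = this.activeSessionId;
    await this.teardown(); // a new start() must not race with this
    this.emit('readyForNextSession', { sessionId });
    this.emit('listeningState', { status: 'stopped', sessionId });
  }

  onPermissionDenied(sessionId) {
    if (sessionId !== this.activeSessionId) return; // stale callback
    this.emit('error', { code: 'MICROPHONE_PERMISSION_DENIED', sessionId });
  }
}
```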

In `@src/definitions.ts`:
- Around line 323-333: The multi-argument overload for addListener with
eventName 'readyForNextSession' is not reflowed and fails prettier; reformat
that overload signature (the addListener declaration taking eventName:
'readyForNextSession' and listenerFunc: (event: SpeechRecognitionReadyEvent) =>
void) so its parameters are wrapped across lines like the other multi-argument
overloads (each parameter on its own line, preserve the trailing comma and
return type Promise<PluginListenerHandle>) to satisfy prettier.

---

Outside diff comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 1110-1122: The segmented-callback handlers onSegmentResults and
onEndOfSegmentedSession need the same stale-listener guard used elsewhere: at
the start of onSegmentResults(Bundle results) call isStale() and return early if
true before processing matches/notifyListeners(SEGMENT_RESULTS_EVENT), and
likewise in onEndOfSegmentedSession() call isStale() and return early before
notifyListeners(END_OF_SEGMENT_EVENT); this prevents old recognizer/session
callbacks from leaking into a new session.
- Around line 576-603: Guard all on-device support and model-download callbacks
by verifying the session is still active for the same currentSessionId before
taking actions; specifically, at the start of
RecognitionSupportCallback.onSupportResult and in the model-download callbacks
(onError, onSuccess, onScheduled) check that the sessionId matches an active
session (the same session tracking used by finishSession/currentSessionId) and
return early if it is stale, so that calls to startInlineListening(intent,...),
triggerOnDeviceModelDownload(...), call.reject(...), emitErrorEvent(...), and
finishSession(...) only run for live sessions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 090a4222-73a6-4a76-a5fe-e4571831fa26

📥 Commits

Reviewing files that changed from the base of the PR and between 877e39a and 2c9a48a.

📒 Files selected for processing (12)
  • IMPLEMENTATION_SUMMARY.md
  • README.md
  • android/src/main/java/app/capgo/speechrecognition/Constants.java
  • android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
  • example-app/android/build.gradle
  • example-app/index.html
  • example-app/src/main.js
  • example-app/src/style.css
  • ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift
  • package.json
  • src/definitions.ts
  • src/web.ts


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (1)

186-207: ⚠️ Potential issue | 🟠 Major

Reject overlapping start() calls before rewriting session state.

The lock block does not check whether a session is already in progress (STARTING/STARTED/STOPPING). A second start() call can overwrite sessionId and activeStartCall while the first one is still active, leaving the earlier session's promise unsettled.

Suggested fix
         try {
             lock.lock();
+            if (state != ListeningState.IDLE) {
+                call.reject("Speech recognition is already running.");
+                return;
+            }
             cancelPendingForceStopLocked();
             forceStopped = false;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 186 - 207, Before mutating session state inside the locked block in
start() (the code that updates sessionId, activeStartCall, state, etc.), check
if state is already one of ListeningState.STARTING, STARTED, or STOPPING and
reject the new start call instead of proceeding; do this while holding lock to
avoid races. Concretely, in SpeechRecognitionPlugin.start() (the block that uses
lock.lock()/lock.unlock()), add an early guarded branch that if
state==STARTING||state==STARTED||state==STOPPING returns/throws a clear error to
the caller (or completes the incoming start promise/error via activeStartCall)
and does not modify sessionId or activeStartCall; only when state is idle should
you increment sessionId, assign currentSessionId, set activeStartCall, and set
state=ListeningState.STARTING. Ensure the rejection uses the same error-path
mechanism the plugin uses for other errors so the original session’s promise
remains intact.
🧹 Nitpick comments (1)
ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift (1)

415-431: Keep the fallback session surface in sync with the real one.

The 6.2 branch exposes isRunning, but this stub doesn’t. That kind of drift makes it easy to add a caller that works on newer toolchains and then breaks only on the fallback branch. I’d either add the missing members here now or extract a tiny shared protocol/base surface.

♻️ Minimal parity fix
 @MainActor
 final class SpeechAnalyzerRecognitionSession: NSObject {
     typealias ResultHandler = @MainActor ([String], Bool) -> Void
     typealias VoidHandler = @MainActor () -> Void
     typealias ErrorHandler = @MainActor (Error) -> Void
 
+    var isRunning: Bool { false }
+
     var onListeningStarted: VoidHandler?
     var onListeningStopped: VoidHandler?
     var onResult: ResultHandler?
     var onError: ErrorHandler?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift`
around lines 415 - 431, Add a public isRunning Bool property to the
SpeechAnalyzerRecognitionSession stub so its surface matches the real session:
declare var isRunning: Bool = false in the class and ensure start() sets
isRunning = true (or true on successful start) and stop() sets isRunning =
false; reference SpeechAnalyzerRecognitionSession, start(), and stop() so
callers using isRunning on newer toolchains won't break the fallback
implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 706-757: finishSession currently calls startCallToReject.reject()
while still holding lock, which risks deadlock; change the flow so that, while
the lock is held, you only capture the PluginCall into a local
(startCallToReject) and perform the state mutation, then call lock.unlock(),
and only after the lock is released call startCallToReject.reject(...). Refer
to finishSession, startCallToReject, lock.lock()/lock.unlock(), and the
reject(...) call, and keep any notifyListeners/emitListeningState calls that
must run under the lock inside the locked section, while the reject(...)
invocation happens after the lock is released.
- Around line 1086-1104: The JSONArray comparison using
previousPartialResults.equals(nextPartialResults) is incorrect because JSONArray
doesn't override equals; update the change-detection to compare content (e.g.,
previousPartialResults.toString().equals(nextPartialResults.toString()); note that
JSONArray.similar() exists in the upstream org.json library but not in the org.json
bundled with Android) so duplicate partial-result
events are suppressed; modify the block that builds nextPartialResults (variable
nextPartialResults) and the conditional that sets previousPartialResults and
payload in SpeechRecognitionPlugin (the code surrounding previousPartialResults,
nextPartialResults, payload, and the notifyListeners(PARTIAL_RESULTS_EVENT,
payload) call) to use a content-aware comparison and only update
previousPartialResults when the content differs.

---

Duplicate comments:
In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`:
- Around line 186-207: Before mutating session state inside the locked block in
start() (the code that updates sessionId, activeStartCall, state, etc.), check
if state is already one of ListeningState.STARTING, STARTED, or STOPPING and
reject the new start call instead of proceeding; do this while holding lock to
avoid races. Concretely, in SpeechRecognitionPlugin.start() (the block that uses
lock.lock()/lock.unlock()), add an early guarded branch that if
state==STARTING||state==STARTED||state==STOPPING returns/throws a clear error to
the caller (or completes the incoming start promise/error via activeStartCall)
and does not modify sessionId or activeStartCall; only when state is idle should
you increment sessionId, assign currentSessionId, set activeStartCall, and set
state=ListeningState.STARTING. Ensure the rejection uses the same error-path
mechanism the plugin uses for other errors so the original session’s promise
remains intact.
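A minimal Java sketch of the IDLE guard this comment asks for, assuming a `ListeningState` enum with the four states it names. `StartGuardSketch` is hypothetical, and returning -1 stands in for rejecting the new `PluginCall` through the plugin's usual error path.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the IDLE guard; state names follow the review comment.
class StartGuardSketch {
    enum ListeningState { IDLE, STARTING, STARTED, STOPPING }

    private final ReentrantLock lock = new ReentrantLock();
    private ListeningState state = ListeningState.IDLE; // guarded by lock
    private int sessionId;                               // guarded by lock

    /**
     * Returns the new session id, or -1 when a session is already in flight.
     * In the plugin, the -1 branch would reject only the NEW call, leaving the
     * original session's promise and cached options untouched.
     */
    int start() {
        lock.lock();
        try {
            if (state != ListeningState.IDLE) {
                return -1; // checked before sessionId, state, or options are written
            }
            sessionId++;
            state = ListeningState.STARTING;
            return sessionId;
        } finally {
            lock.unlock();
        }
    }

    /** Called once the session has fully ended (e.g. from a finishSession-like path). */
    void finish() {
        lock.lock();
        try {
            state = ListeningState.IDLE;
        } finally {
            lock.unlock();
        }
    }

    int currentSessionId() {
        lock.lock();
        try {
            return sessionId;
        } finally {
            lock.unlock();
        }
    }
}
```

The guard runs inside the same critical section as the writes, so two concurrent start() calls cannot both observe IDLE.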

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41d1d7d9-80d7-4e8c-af5c-bf2938f5b964

📥 Commits

Reviewing files that changed from the base of the PR and between 2c9a48a and dd5a4fb.

📒 Files selected for processing (3)
  • android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
  • ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift
  • src/definitions.ts

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (2)

1108-1119: ⚠️ Potential issue | 🟠 Major

Apply isStale() to segmented-session callbacks too.

onSegmentResults() and onEndOfSegmentedSession() are the only listener callbacks still forwarding events without the session/generation guard. Late callbacks from a destroyed recognizer can leak segment events into the next session.

💡 Suggested fix
         @Override
         public void onSegmentResults(Bundle results) {
+            if (isStale()) {
+                return;
+            }
             ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
             if (matches == null) {
                 return;
             }
             notifyListeners(SEGMENT_RESULTS_EVENT, new JSObject().put("matches", new JSArray(matches)));
         }

         @Override
         public void onEndOfSegmentedSession() {
+            if (isStale()) {
+                return;
+            }
             notifyListeners(END_OF_SEGMENT_EVENT, new JSObject());
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 1108 - 1119, onSegmentResults and onEndOfSegmentedSession currently
forward segment events unguarded, which can let late callbacks from a destroyed
recognizer leak into a new session; update both methods to check the
session/generation guard by calling isStale() and returning early if true before
calling notifyListeners for SEGMENT_RESULTS_EVENT and END_OF_SEGMENT_EVENT so
only live sessions emit events.

576-620: ⚠️ Potential issue | 🔴 Critical

Guard support/model-download callbacks against stale sessions.

These async callbacks can arrive after stop(), forceStop(), or a recognizer rebuild, but they go straight into startInlineListening(...), call.reject(...), and emitErrorEvent(...) without re-checking staleness. Because sessionId is only advanced on the next start(), an old callback can still restart listening or surface an error after the user already stopped.

Also applies to: 623-664

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 576 - 620, The async RecognitionSupportCallback handlers must guard
against stale sessions: capture the currentSessionId into a final/local variable
before calling speechRecognizer.checkRecognitionSupport(...) (or otherwise call
a helper like isSessionActive(sessionId)), and in both onSupportResult(...) and
onError(...) verify that the captured sessionId still equals the live
currentSessionId (or that isSessionActive returns true) before calling
startInlineListening, triggerOnDeviceModelDownload, call.reject, emitErrorEvent,
or finishSession; if stale, simply return without side effects. Apply the same
guard pattern to the analogous callbacks around the other block referenced (the
second callback range) so no old async result can restart listening or surface
errors for a stopped/rebuilt session.
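A minimal Java sketch of the capture-then-verify guard described above, assuming a per-session id plus an `active` flag that stop()/forceStop() clear (the review notes that sessionId alone only advances on the next start()). `StaleCallbackSketch`, `checkSupportAsync()`, and the `Runnable` that plays the role of `onSupportResult(...)` are hypothetical; the real `RecognitionSupportCallback` wiring is not reproduced.

```java
// Hypothetical sketch: names are illustrative, not the plugin's real fields.
class StaleCallbackSketch {
    private final Object lock = new Object();
    private int currentSessionId; // guarded by lock
    private boolean active;       // guarded by lock; cleared by stop()/forceStop()
    private int listeningStarts;  // guarded by lock; counts startInlineListening stand-in calls

    int start() {
        synchronized (lock) {
            active = true;
            return ++currentSessionId;
        }
    }

    void stop() {
        synchronized (lock) {
            active = false; // the id is NOT advanced here, matching the review's description
        }
    }

    private boolean isSessionActiveLocked(int sessionId) {
        return active && currentSessionId == sessionId;
    }

    /** Stand-in for checkRecognitionSupport(...): capture the id, return the later callback. */
    Runnable checkSupportAsync() {
        final int capturedSessionId;
        synchronized (lock) {
            capturedSessionId = currentSessionId;
        }
        // The Runnable plays the role of onSupportResult(...) arriving later.
        return () -> {
            synchronized (lock) {
                if (!isSessionActiveLocked(capturedSessionId)) {
                    return; // stale: no restart, no reject, no error event
                }
                listeningStarts++; // stand-in for startInlineListening(...)
            }
        };
    }

    int listeningStarts() {
        synchronized (lock) {
            return listeningStarts;
        }
    }
}
```

For brevity the check and the side effect share one critical section; the same guard would apply to the onError(...) branch and to the second callback range.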
♻️ Duplicate comments (2)
android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java (2)

186-205: ⚠️ Potential issue | 🔴 Critical

Reject overlapping start() calls before rewriting session state.

This block still has no IDLE check. A second start() overwrites sessionId, state, and the cached request options while the first startup is still in flight, so the earlier beginListening(...) path goes stale and its promise can be orphaned.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 186 - 205, Before mutating session state in the start() path, guard
against overlapping calls by checking the current listening state while holding
the same lock: inside the lock.lock() section (where you call
cancelPendingForceStopLocked(), set
forceStopped/pendingStopReason/resetPartialResultsCache(), and before
incrementing sessionId or setting state = ListeningState.STARTING), verify that
state == ListeningState.IDLE (or otherwise not already STARTING/ACTIVE) and
immediately reject/return the start request (e.g., fail the activeStartCall or
throw) if it is not IDLE; this prevents a second start() from overwriting
sessionId, state, cached options (lastLanguage/lastMaxResults/lastPrompt/etc.),
and orphaning the in-flight beginListening(...) promise. Ensure the check is
performed while holding the lock and before any assignment to sessionId, state,
or activeStartCall.

241-279: ⚠️ Potential issue | 🟠 Major

Don't emit terminal session events for popup mode until the activity returns.

Sessions launched with startActivityForResult(...) are still driven to readyForNextSession / stopped by the stop timers here. If the popup returns later, listeningResult(...) resolves or rejects an already-finished session and can emit a second terminal cycle.

Also applies to: 282-337

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java`
around lines 241 - 279, The stop method is emitting terminal session events and
scheduling finish timers even when a popup activity was launched
(startActivityForResult), causing duplicate terminal cycles when the popup
returns; modify stop(PluginCall) and related logic (use the existing sessionId,
pendingStopReason, scheduleFinishFallbackLocked, emitListeningState,
listening(false), cancelPendingForceStopLocked) to detect an outstanding
activity-for-result/popup-return pending state (e.g., an
isAwaitingActivityResult or isPopupMode flag) and if that flag is set, do not
transition state to STOPPING, do not call
emitListeningState("stoppingListening", ...), and do not schedule the finish
fallback; instead set pendingStopReason and defer calling listening(false) and
scheduleFinishFallbackLocked until the activity result handler clears the
awaiting flag (where currently listeningResult(...) runs), at which point
perform the terminal transitions and resolve/reject the session once. Ensure
cancelPendingForceStopLocked still runs as appropriate but avoid emitting
terminal events while awaiting the activity result.
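A minimal Java sketch of the deferral this comment proposes, assuming a hypothetical `awaitingActivityResult` flag set when the popup is launched. `PopupStopSketch` is illustrative only: `emitTerminal()` collapses the STOPPING transition, `emitListeningState(...)`, and the finish-fallback timer into one recorded event, and `onActivityResult()` stands in for the handler where `listeningResult(...)` runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of deferring terminal events in popup mode.
class PopupStopSketch {
    private final Object lock = new Object();
    private boolean awaitingActivityResult; // guarded by lock
    private String pendingStopReason;       // guarded by lock
    private boolean terminated;             // guarded by lock
    final List<String> emitted = Collections.synchronizedList(new ArrayList<>());

    void beginInline() { reset(false); }

    /** Popup mode: set when startActivityForResult(...) launches the recognizer UI. */
    void beginPopup() { reset(true); }

    private void reset(boolean popup) {
        synchronized (lock) {
            awaitingActivityResult = popup;
            pendingStopReason = null;
            terminated = false;
        }
    }

    void stop(String reason) {
        boolean finishNow;
        synchronized (lock) {
            if (awaitingActivityResult) {
                pendingStopReason = reason; // defer: the activity result finishes the session
                return;
            }
            finishNow = claimTerminationLocked();
        }
        if (finishNow) {
            emitTerminal(reason);
        }
    }

    /** Stand-in for the activity-result handler (where listeningResult(...) runs). */
    void onActivityResult() {
        String reason;
        boolean finishNow;
        synchronized (lock) {
            awaitingActivityResult = false;
            reason = pendingStopReason != null ? pendingStopReason : "activityResult";
            pendingStopReason = null;
            finishNow = claimTerminationLocked();
        }
        if (finishNow) {
            emitTerminal(reason);
        }
    }

    /** Returns true exactly once per session, so terminal events are emitted once. */
    private boolean claimTerminationLocked() {
        if (terminated) {
            return false;
        }
        terminated = true;
        return true;
    }

    private void emitTerminal(String reason) {
        emitted.add("stopped:" + reason);
    }
}
```

`claimTerminationLocked()` is what guarantees a single terminal cycle even if stop() and a late activity result both try to finish the session.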
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift`:
- Around line 569-576: The helper rejectPendingStartCallIfNeeded currently skips
rejecting when currentOptions?.partialResults == true, leaving the original JS
start() promise unresolved; change its logic so it always clears activeCall and
rejects the pending startCall (call startCall.reject(message) and set activeCall
= nil) regardless of partialResults, and ensure callers like stop(), forceStop()
(and any code paths that win during permission prompt/startup such as
beginRecognition) invoke rejectPendingStartCallIfNeeded(message:) so stale
sessions are settled.

In `@src/definitions.ts`:
- Around line 78-95: The interface SpeechRecognitionPartialResultEvent declares
matches as required but native forceStop() can emit payloads without matches;
update the type to reflect runtime by making matches optional (e.g., change
matches: string[] to matches?: string[]) and adjust any related JSDoc/comments
to note that matches may be undefined for forced/accumulated-only events so
callers should guard before using event.matches; ensure this change is made in
src/definitions.ts where the interface is declared.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b4ca23cb-dc50-437f-987a-a39b31019576

📥 Commits

Reviewing files that changed from the base of the PR and between dd5a4fb and 3db41a5.

📒 Files selected for processing (4)
  • README.md
  • android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
  • ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift
  • src/definitions.ts

Comment thread ios/Sources/SpeechRecognitionPlugin/SpeechRecognitionPlugin.swift
Comment thread src/definitions.ts

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift (1)

5-5: Please verify this gate against the Xcode/SDK matrix, not just the compiler version.

Line 5 uses #if compiler(>=6.2), which only keys off the Swift compiler version. SpeechAnalyzer/SpeechTranscriber are SDK-provided Speech framework symbols, so this is still only a proxy for “has the iOS 26 Speech APIs”; with older or newer Xcode+SDK mixes, the modern branch can still be selected even when the symbols it needs are unavailable. Since Lines 390-437 exist specifically for that compatibility story, I’d add an older-Xcode build to CI, or gate this at the build level, so the fallback path is actually exercised. (developer.apple.com)

Also applies to: 390-437


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 59cb8c39-1854-4017-85f7-5129cd231e33

📥 Commits

Reviewing files that changed from the base of the PR and between 3db41a5 and b305ccb.

📒 Files selected for processing (3)
  • android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java
  • example-app/src/main.js
  • ios/Sources/SpeechRecognitionPlugin/SpeechAnalyzerRecognitionSession.swift
🚧 Files skipped from review as they are similar to previous changes (2)
  • example-app/src/main.js
  • android/src/main/java/app/capgo/speechrecognition/SpeechRecognitionPlugin.java

@riderx riderx merged commit 2583bc6 into main Mar 24, 2026
6 checks passed

4 participants