Feature/speech to text by g4rb4g3 · Pull Request #82 · Iternio-Planning-AB/react-native-auto-play

g4rb4g3 · 2026-05-15T11:48:29Z

This PR adds speech to text support

SamuelBrucksch · 2026-05-19T11:13:33Z

@claude please review this PR

claude · 2026-05-19T11:13:50Z

Claude finished @SamuelBrucksch's task in 2m 43s —— View job

PR Review: Feature/speech-to-text

Gather context and understand the PR
Review new TypeScript specs and types
Review new HybridVoice module (TS wrapper)
Review iOS native implementation changes
Review Android native implementation changes
Review example app changes
Review generated nitrogen code (spot check)
Post comprehensive review

Summary

This PR extracts voice input functionality from HybridAutoPlay into a dedicated HybridVoice NitroModule, and adds speech-to-text (STT) support alongside the existing raw PCM recording. The new API supports streaming partial results via an onChunk callback, and falls back to PCM when STT is unavailable. This is a well-structured refactor with a clean separation of concerns.

Architecture & Design — Looks Good

Clean module split: Voice functionality is properly extracted into its own Voice NitroModule with a dedicated spec, hybrid wrapper, and native implementations on both platforms. This is a better separation than having it embedded in AutoPlay.
Fallback strategy is solid: Both platforms gracefully fall back from STT → PCM when speech recognition is unavailable (missing authorization, recognizer not available, API level too low).
Streaming API design: The onChunk callback with VoiceInputChunk (containing either partial transcription or audio data) is a clean abstraction. The TypeScript overloads in HybridVoice.ts that distinguish the return type based on whether onChunk is provided are well-typed.

Issues & Suggestions

1. iOS: Thread safety of `samples` array in audio tap (Medium)

File: packages/react-native-autoplay/ios/utils/VoiceInputManager.swift

The samples array is mutated from two different threads:

Audio tap callback (real-time audio thread, line 233): self.samples.append(contentsOf: newSamples)
stop() method (via stopLock, line 107): reads and clears samples

While stop() properly uses stopLock to synchronize its read of samples, the audio tap callback writes to samples without acquiring stopLock. The isStopping flag check (line 208) may provide some implicit ordering, but there's a TOCTOU window: the tap could check isStopping == false, then stop() captures samples, then the tap appends — potentially causing a data race on Swift's Array (which is not thread-safe).

Similarly, recognitionTask's error handler (line 161-162) reads and clears samples under stopLock, but the audio tap could be appending concurrently.

Suggestion: Either protect samples.append with stopLock, or switch to a lock-free approach with a concurrent queue.

Fix this →

2. iOS: `hasVoiceInputPermission` now requires Speech authorization — breaking change? (Medium)

File: packages/react-native-autoplay/ios/hybrid/HybridVoice.swift, lines 8-11

The old hasVoiceInputPermission() only checked microphone permission. The new implementation checks both microphone AND speech recognition:

let micGranted = AVAudioSession.sharedInstance().recordPermission == .granted
let speechGranted = SFSpeechRecognizer.authorizationStatus() == .authorized
return micGranted && speechGranted

This means hasVoiceInputPermission() will return false for users who previously had only microphone permission, even if they only want PCM recording (not STT). Callers using preferSpeechToText: false still need to pass the permission check.

Suggestion: Consider either:

Only requiring speech authorization when the caller intends to use STT, or
Documenting this as a known behavioral change and requiring consumers to update their permission flow.

Fix this →

3. iOS: STT error handler may double-cleanup (Low)

File: packages/react-native-autoplay/ios/utils/VoiceInputManager.swift, lines 155-168

When the recognitionTask error callback fires, it calls self.cleanup(interfaceController:) which tears down the audio engine and calls recognitionRequest = nil. However, the audio engine tap might still be running at the moment cleanup is entered. The ResultBox protects against double-resume, which is good — but the engine could still deliver a buffer between the error callback's cleanup and the tap actually being removed. This is benign (the tap's guard !self.isStopping will bail), but the isStopping flag is not set in the error path.

Suggestion: Set self.isStopping = true in the error handler before calling cleanup, similar to how the isFinal path does it (line 173).

Fix this →

4. Android: `startSTT` error returns empty result instead of throwing (Low)

File: packages/react-native-autoplay/android/.../VoiceInputManager.kt, lines 108-112

When SpeechRecognizer.onError fires, the code silently returns an empty VoiceInputResult(transcription = null, audio = null) instead of indicating failure. This means callers can't distinguish "no speech detected" from "recognition service failed."

override fun onError(error: Int) {
    activeSpeechRecognizer = null
    recognizer.destroy()
    cont.resume(VoiceInputResult(transcription = null, audio = null))
}

The same pattern exists in startSTTWithSource (line 219). This is arguably a design choice, but consumers might want to know the difference.

Suggestion: Consider at least logging the error code, or adding an optional error field to VoiceInputResult.

5. Android: `recordPCM` uses `suspendCancellableCoroutine` but doesn't handle cancellation cleanly (Low)

File: packages/react-native-autoplay/android/.../VoiceInputManager.kt, line 288

The coroutine uses suspendCancellableCoroutine but doesn't register an invokeOnCancellation handler for the PCM recording path. If the coroutine is cancelled externally, releaseResources() won't be called until the finally block in the recording loop runs — but the loop checks isRecording which won't be set to false by cancellation.

Suggestion: Add cont.invokeOnCancellation { stop() } after setting up the recording, to mirror the cancellation handling in the STT paths.

6. iOS: `HybridVoice.voiceInputManager` is instance-level, not static (Low)

File: packages/react-native-autoplay/ios/hybrid/HybridVoice.swift, line 6

The old code stored voiceInputManager as a static property on HybridAutoPlay. The new HybridVoice stores it as an instance property. If NitroModules creates a new HybridVoice instance per JS call to createHybridObject, then calling stopVoiceInput() on a different instance than the one that started recording would be a no-op (since voiceInputManager would be nil).

This seems fine since the JS side caches the hybrid object in _native (line 5 of HybridVoice.ts), but worth verifying that NitroModules reuses the same native instance.

7. `VoiceInputOptions` type is defined but only used in the TS wrapper (Nit)

File: packages/react-native-autoplay/src/types/Voice.ts, lines 11-17

VoiceInputOptions is a nice TS-side convenience type exported from index.ts. This is fine — just noting that the nitro spec uses positional params while the wrapper uses the options pattern, which is the right call for ergonomics.

What Looks Good

ResultBox pattern (iOS): The lock-protected continuation wrapper is a solid approach to preventing double-resume from concurrent stop/recognition callbacks.
startSTTFromCarAudio pipe approach (Android): Feeding CarAudioRecord PCM through a pipe to SpeechRecognizer via EXTRA_AUDIO_SOURCE (API 33+) is clever and well-implemented, with proper cleanup in the finally block.
ThreadUtil simplification (Android): The refactor from Handler(Looper.getMainLooper()) to withContext(Dispatchers.Main) is cleaner.
Backward compatibility preserved on AutoPlay spec: addListenerVoiceInput remains on the AutoPlay spec (as an Android-only OS-triggered listener), while the new active voice API lives on Voice. Good separation.
Example app updated: Both iOS and Android header buttons now use HybridVoice with preferSpeechToText: true, exercising the new API.

Breaking Changes to Note

This PR is a breaking change for consumers who used the old voice API on HybridAutoPlay:

HybridAutoPlay.hasVoiceInputPermission(), requestVoiceInputPermission(), startVoiceInput(), and stopVoiceInput() have been removed
Consumers must switch to HybridVoice.* (newly exported from the package)
The return type of startVoiceInput changed from Promise<ArrayBuffer> to Promise<VoiceInputResult> (containing either transcription or audio)
hasVoiceInputPermission on iOS now requires speech recognition authorization in addition to microphone permission

Consider documenting migration steps in the release notes.

…ith VoiceInputResult

SamuelBrucksch · 2026-06-05T14:53:03Z

@claude please review

claude · 2026-06-05T14:53:16Z

Claude finished @SamuelBrucksch's task in 5m 19s —— View job

PR Review: Feature/speech-to-text

Gather context and read changed source files
Review TypeScript specs and types
Review HybridVoice TS wrapper
Review iOS native implementation (HybridVoice.swift, VoiceInputManager.swift)
Review Android native implementation (HybridVoice.kt, VoiceInputManager.kt)
Review utility and example app changes
Post comprehensive review

I note the previous Claude review was addressed in part — the samples thread-safety issue (old #1) was fixed in 32bab2c. The remaining issues below are what's still present plus one new finding.

Issues

1. iOS: STT error handler still missing `isStopping = true` (Medium)

File: packages/react-native-autoplay/ios/utils/VoiceInputManager.swift, lines 161–170

The recognitionTask error callback grabs samples under stopLock and calls cleanup(), but never sets isStopping = true. The audio tap (line 211) checks !self.isStopping and can still process buffers, potentially triggering triggerAutoStop() → stop() after the error handler has already cleaned up and resumed the continuation. ResultBox prevents a double-resume crash, but stop() would attempt redundant operations on already-nil'd state (e.g., recognitionRequest?.endAudio() on a nil reference).

Fix: Set self.isStopping = true inside the stopLock block in the error handler, matching the isFinal path at line 177:

if error != nil {
    self.stopLock.lock()
    self.isStopping = true  // ← add this
    let capturedSamples = self.samples
    self.samples = []
    self.stopLock.unlock()
    // ...
}

Fix this →

2. iOS: `hasVoiceInputPermission()` now gates on speech authorization — breaks PCM-only callers (Medium)

File: packages/react-native-autoplay/ios/hybrid/HybridVoice.swift, lines 8–11

let micGranted = AVAudioSession.sharedInstance().recordPermission == .granted
let speechGranted = SFSpeechRecognizer.authorizationStatus() == .authorized
return micGranted && speechGranted

Callers using preferSpeechToText: false (PCM-only) will fail the permission check if speech recognition isn't authorized. The JSDoc on the TS wrapper documents this, but the native behavior is misleading — a user who only wants raw PCM shouldn't need speech authorization. The Android side only checks RECORD_AUDIO.

Suggestion: Only require speech authorization when the caller intends to use STT (accept a parameter), or split into separate permission checks.

Fix this →

3. Android: `onError` silently returns empty result — caller can't distinguish failure from silence (Medium)

Files: packages/react-native-autoplay/android/.../VoiceInputManager.kt, lines 110–114 and 222–226

Both startSTT and startSTTWithSource resume with VoiceInputResult(transcription = null, audio = null) on error. This is indistinguishable from "speech recognition worked but detected nothing."

override fun onError(error: Int) {
    activeSpeechRecognizer = null
    recognizer.destroy()
    cont.resume(VoiceInputResult(transcription = null, audio = null))
}

At minimum, log the error code (SpeechRecognizer.ERROR_*). Consider falling back to PCM in startSTT (the non-car-audio path) similar to how iOS falls back.

4. iOS: `recognitionRequest` accessed without synchronization across threads (Low)

File: packages/react-native-autoplay/ios/utils/VoiceInputManager.swift

stop() calls recognitionRequest?.endAudio() at line 115 (outside stopLock), while cleanup() sets recognitionRequest = nil at line 297. These can execute concurrently — stop() from triggerAutoStop on a global queue, and cleanup() from the recognition task callback. Swift optional access is not atomic. In practice, optional chaining captures the reference before the nil assignment, so this is unlikely to crash, but it's technically a data race.

Suggestion: Either read recognitionRequest under stopLock in stop(), or make recognitionRequest an Atomic / protect it with the existing lock.

5. Android: `startSTTFromCarAudio` swallows pipe write errors, recording continues uselessly (Low)

File: packages/react-native-autoplay/android/.../VoiceInputManager.kt, lines 177–184

recordPCM(silenceThresholdMs, maxDurationMs) { chunk ->
    chunk.audio?.let { ab ->
        try {
            pipeOut.write(ab.toByteArray())
        } catch (_: Exception) {
        }
    }
}

If the recognizer crashes and closes the read end of the pipe, pipeOut.write() throws but the exception is swallowed. Recording continues accumulating PCM data that goes nowhere until silence detection or max duration stops it. Consider setting isRecording = false in the catch to stop early.

Previously Flagged — Lower Priority / Mitigated

Android recordPCM missing invokeOnCancellation: Mitigated by HybridVoice.kt's finally block calling dispose(), which cancels the scope.
HybridVoice instance vs static voiceInputManager: Mitigated by the JS wrapper caching the native object in _native.

Architecture

The refactor from embedding voice in AutoPlay to a dedicated Voice NitroModule is clean. The ResultBox pattern for safe continuation resumption, the pipe-based car audio → SpeechRecognizer bridge on Android (API 33+), and the TypeScript overloads on startVoiceInput are all well done.

github-actions · 2026-06-09T09:41:27Z

No significant issues found.

g4rb4g3 · 2026-06-09T09:47:41Z

@SamuelBrucksch I fixed all of them except for 2, the permission request is fine like this imho. STT isn't a worse then raw mic access anyways.

SamuelBrucksch · 2026-06-11T07:39:28Z

            title="record audio"
            onPress={async () => {
-              const granted = await HybridAutoPlay.requestVoiceInputPermission();
+              const granted = await HybridVoice.requestVoiceInputPermission();


just minor cosmetics, but for consistency you could use the same code style as in AutoTemplate.ts -> requestVoiceInputPermission.then(() => ...)

g4rb4g3 and others added 7 commits April 30, 2026 19:58

add speech to text support

5b54191

Merge remote-tracking branch 'origin/master' into feature/speech-to-text

30e4419

add Android Auto car mic STT

15b2929

chore(react-native-autoplay): bump version to 0.4.2 [skip ci]

c67848a

Merge remote-tracking branch 'origin/master' into feature/speech-to-text

5a1560a

fix merge errors

f6708b1

lint

7f2837b

SamuelBrucksch assigned SamuelBrucksch and unassigned SamuelBrucksch May 19, 2026

g4rb4g3 added 10 commits May 22, 2026 10:38

Merge remote-tracking branch 'origin/master' into feature/speech-to-text

98b0559

add language parameter for speech recognizer, always fulfil promise w…

a980b8c

…ith VoiceInputResult

bump

5add292

update readme on how to use expo-font

7c744fc

Merge remote-tracking branch 'origin/master' into feature/speech-to-text

cca775e

Merge remote-tracking branch 'origin/master' into feature/speech-to-text

17c6f3c

bump

dcf9337

nitrogen

32858c5

bump

c6a780d

fix: tap not locking when updating samples

32bab2c

g4rb4g3 marked this pull request as ready for review June 5, 2026 10:32

g4rb4g3 requested a review from SamuelBrucksch June 5, 2026 10:33

g4rb4g3 added 3 commits June 9, 2026 11:09

cr fixes

6e3a72f

cr fixes

2f69374

fall back to pcm on stt error

cd6f7bb

SamuelBrucksch approved these changes Jun 11, 2026

View reviewed changes

g4rb4g3 merged commit 07848a7 into master Jun 11, 2026
5 checks passed

g4rb4g3 deleted the feature/speech-to-text branch June 11, 2026 11:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/speech to text#82

Feature/speech to text#82
g4rb4g3 merged 20 commits into
masterfrom
feature/speech-to-text

g4rb4g3 commented May 15, 2026

Uh oh!

SamuelBrucksch commented May 19, 2026

Uh oh!

claude Bot commented May 19, 2026 •

edited by SamuelBrucksch

Loading

Uh oh!

SamuelBrucksch commented Jun 5, 2026

Uh oh!

claude Bot commented Jun 5, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 9, 2026

Uh oh!

g4rb4g3 commented Jun 9, 2026

Uh oh!

SamuelBrucksch Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

g4rb4g3 commented May 15, 2026

Uh oh!

SamuelBrucksch commented May 19, 2026

Uh oh!

claude Bot commented May 19, 2026 • edited by SamuelBrucksch Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Feature/speech-to-text

Summary

Architecture & Design — Looks Good

Issues & Suggestions

1. iOS: Thread safety of samples array in audio tap (Medium)

2. iOS: hasVoiceInputPermission now requires Speech authorization — breaking change? (Medium)

3. iOS: STT error handler may double-cleanup (Low)

4. Android: startSTT error returns empty result instead of throwing (Low)

5. Android: recordPCM uses suspendCancellableCoroutine but doesn't handle cancellation cleanly (Low)

6. iOS: HybridVoice.voiceInputManager is instance-level, not static (Low)

7. VoiceInputOptions type is defined but only used in the TS wrapper (Nit)

What Looks Good

Breaking Changes to Note

Uh oh!

SamuelBrucksch commented Jun 5, 2026

Uh oh!

claude Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Feature/speech-to-text

Issues

1. iOS: STT error handler still missing isStopping = true (Medium)

2. iOS: hasVoiceInputPermission() now gates on speech authorization — breaks PCM-only callers (Medium)

3. Android: onError silently returns empty result — caller can't distinguish failure from silence (Medium)

4. iOS: recognitionRequest accessed without synchronization across threads (Low)

5. Android: startSTTFromCarAudio swallows pipe write errors, recording continues uselessly (Low)

Previously Flagged — Lower Priority / Mitigated

Architecture

Uh oh!

github-actions Bot commented Jun 9, 2026

Uh oh!

g4rb4g3 commented Jun 9, 2026

Uh oh!

SamuelBrucksch Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude Bot commented May 19, 2026 •

edited by SamuelBrucksch

Loading

1. iOS: Thread safety of `samples` array in audio tap (Medium)

2. iOS: `hasVoiceInputPermission` now requires Speech authorization — breaking change? (Medium)

4. Android: `startSTT` error returns empty result instead of throwing (Low)

5. Android: `recordPCM` uses `suspendCancellableCoroutine` but doesn't handle cancellation cleanly (Low)

6. iOS: `HybridVoice.voiceInputManager` is instance-level, not static (Low)

7. `VoiceInputOptions` type is defined but only used in the TS wrapper (Nit)

claude Bot commented Jun 5, 2026 •

edited

Loading

1. iOS: STT error handler still missing `isStopping = true` (Medium)

2. iOS: `hasVoiceInputPermission()` now gates on speech authorization — breaks PCM-only callers (Medium)

3. Android: `onError` silently returns empty result — caller can't distinguish failure from silence (Medium)

4. iOS: `recognitionRequest` accessed without synchronization across threads (Low)

5. Android: `startSTTFromCarAudio` swallows pipe write errors, recording continues uselessly (Low)