Add real-time stdio streaming runtime with framed PCM protocol #61
Open
nsalminen wants to merge 1 commit into NVIDIA:main
- Add moshi.stdio for real-time audio I/O over stdin/stdout
- Implement length-prefixed binary protocol (u32_le + payload) with kinds:
  - 0x00 handshake
  - 0x01 audio (PCM16 mono)
  - 0x02 text
  - 0x05 error
  - 0x06 ping
- Mirror server/offline model flow (load/warmup/prompts/streaming state)
- Add robust packet parser, PCM frame buffer, and binary-safe stdout writer
- Add EOF drain behavior to flush delayed model outputs (--eof-drain-frames, default 32)
- Wire CLI entrypoint moshi-stdio in pyproject.toml
- Add stdio test with existing assets/test files for quick validation
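
As a rough illustration of the length-prefixed framing listed above, here is a minimal sketch of an encoder and an incremental parser. The kind codes come from the commit message; everything else is an assumption made for illustration and not taken from moshi/stdio.py: that the kind is the first byte of the payload, that the length counts the kind byte plus the body, and the names `encode_packet`, `PacketParser`, and `max_len`.

```python
from __future__ import annotations

import struct

# Kind codes as listed in the commit message.
KIND_HANDSHAKE = 0x00
KIND_AUDIO = 0x01
KIND_TEXT = 0x02
KIND_ERROR = 0x05
KIND_PING = 0x06


def encode_packet(kind: int, body: bytes = b"") -> bytes:
    """Frame one message as [u32_le len][kind byte][body].

    Assumption: the kind is the first payload byte, and len counts
    the kind byte plus the body.
    """
    payload = bytes([kind]) + body
    return struct.pack("<I", len(payload)) + payload


class PacketParser:
    """Incremental parser that tolerates arbitrary chunk boundaries."""

    def __init__(self, max_len: int = 1 << 20) -> None:
        self._buf = bytearray()
        self._max_len = max_len  # hypothetical sanity cap on packet size

    def feed(self, chunk: bytes) -> list[tuple[int, bytes]]:
        """Append raw bytes and return every complete (kind, body) pair."""
        self._buf += chunk
        packets = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack_from("<I", self._buf, 0)
            if length == 0 or length > self._max_len:
                raise ValueError(f"bad packet length: {length}")
            if len(self._buf) < 4 + length:
                break  # wait for the rest of this packet
            payload = bytes(self._buf[4:4 + length])
            del self._buf[:4 + length]
            packets.append((payload[0], payload[1:]))
        return packets
```

Buffering inside the parser matters because a read from stdin can end in the middle of a packet; `feed` simply returns an empty list until the remaining bytes arrive.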
This PR adds a real-time stdin/stdout runtime for Moshi (moshi-stdio) so audio can be streamed in and out of other processes without the web UI/WebSocket stack. It includes EOF drain handling to avoid truncated tails and a concise stdio test runner that always executes the end-to-end path using existing test assets.

Approach
The runtime uses a framed PCM protocol ([u32 len][payload]) with explicit message kinds and buffers incoming PCM16 audio into frames of the model's frame size. On EOF, it feeds a configurable number of silence frames to flush delayed outputs. A single test runner script validates framing and then runs moshi.stdio end-to-end.

How to test:
Optional alternate input:
Key Changes
- moshi/moshi/stdio.py
  - Framing: [u32 len][payload]
  - Message kinds: 0x00 handshake, 0x01 audio (PCM16), 0x02 text, 0x05 error, 0x06 ping
  - EOF drain via --eof-drain-frames (default 32)
- moshi/pyproject.toml
  - moshi-stdio = moshi.stdio:main
- moshi/test/stdio_realtime_check.py
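
The frame buffering and EOF drain described under Approach might look roughly like the sketch below. It is not the code from moshi/stdio.py. The class name `PcmFrameBuffer`, the 1920-sample frame size, and the choice to zero-pad a trailing partial frame are assumptions for illustration; only the default of 32 drain frames comes from the PR.

```python
from __future__ import annotations

FRAME_SAMPLES = 1920  # assumed frame size; the real value comes from the model
BYTES_PER_SAMPLE = 2  # PCM16 mono


class PcmFrameBuffer:
    """Accumulate PCM16 bytes and emit fixed-size frames."""

    def __init__(self, frame_samples: int = FRAME_SAMPLES) -> None:
        self._frame_bytes = frame_samples * BYTES_PER_SAMPLE
        self._buf = bytearray()

    def push(self, pcm: bytes) -> list[bytes]:
        """Add audio bytes and return every complete frame now available."""
        self._buf += pcm
        frames = []
        while len(self._buf) >= self._frame_bytes:
            frames.append(bytes(self._buf[:self._frame_bytes]))
            del self._buf[:self._frame_bytes]
        return frames

    def drain(self, silence_frames: int = 32) -> list[bytes]:
        """At EOF: zero-pad any partial frame, then append silence frames.

        Zero-padding the partial frame is an assumption of this sketch.
        """
        frames = []
        if self._buf:
            pad = self._frame_bytes - len(self._buf)
            frames.append(bytes(self._buf) + b"\x00" * pad)
            self._buf.clear()
        silence = b"\x00" * self._frame_bytes
        frames.extend(silence for _ in range(silence_frames))
        return frames
```

Feeding silence after EOF gives the model extra steps to emit output that lags behind its input, which is the truncated-tail problem the drain is meant to avoid.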