Add real-time stdio streaming runtime with framed PCM protocol #61

Open
nsalminen wants to merge 1 commit into NVIDIA:main from nsalminen:stream

This PR adds a real-time stdin/stdout runtime for Moshi (moshi-stdio) so audio can be streamed to and from other processes without the web UI/WebSocket stack. It includes EOF drain handling to avoid truncated output tails, and a concise stdio test runner that always executes the end-to-end path using existing test assets.

Approach

The runtime uses a length-prefixed framed protocol ([u32 len][payload], with a little-endian length) with explicit message kinds, and buffers incoming PCM16 audio into frames of the model's frame size. On EOF, it feeds a configurable number of silence frames to flush delayed outputs. A single test runner script validates framing and then runs moshi.stdio end-to-end.
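For readers integrating from another process, the framing can be sketched roughly as follows. This is a minimal illustration, not the code in moshi/moshi/stdio.py: it assumes the kind byte is the first byte of the payload (the description does not spell out where the kind lives), and the helper names `encode_packet`, `read_exact`, and `read_packet` are hypothetical.

```python
import io
import struct

# Message kinds as listed in the PR description.
KIND_HANDSHAKE = 0x00
KIND_AUDIO = 0x01
KIND_TEXT = 0x02
KIND_ERROR = 0x05
KIND_PING = 0x06


def encode_packet(kind: int, body: bytes = b"") -> bytes:
    """Frame one message as [u32_le len][payload].

    Assumption: the payload is the kind byte followed by the body.
    """
    payload = bytes([kind]) + body
    return struct.pack("<I", len(payload)) + payload


def read_exact(stream, n: int) -> bytes:
    """Read up to n bytes, returning fewer only if the stream hits EOF."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_packet(stream):
    """Return (kind, body), or None on a clean EOF between packets."""
    header = read_exact(stream, 4)
    if not header:
        return None
    if len(header) < 4:
        raise EOFError("truncated length prefix")
    (length,) = struct.unpack("<I", header)
    if length == 0:
        raise ValueError("empty payload has no kind byte")
    payload = read_exact(stream, length)
    if len(payload) < length:
        raise EOFError("truncated payload")
    return payload[0], payload[1:]


# Round-trip two packets through an in-memory stream.
stream = io.BytesIO(encode_packet(KIND_AUDIO, b"\x00\x00") + encode_packet(KIND_PING))
first = read_packet(stream)
second = read_packet(stream)
```

Looping in `read_exact` matters on pipes, where a single `read` may return fewer bytes than requested even when more data is on the way.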

How to test:

./.venv/bin/python moshi/test/stdio_realtime_check.py \
  --moshi-args --voice-prompt NATM1.pt --device cuda

Optional alternate input:

./.venv/bin/python moshi/test/stdio_realtime_check.py \
  --input-wav assets/test/input_service.wav \
  --moshi-args --voice-prompt NATM1.pt --device cuda

Key Changes

  • moshi/moshi/stdio.py
    • Framed protocol: [u32 len][payload]
    • Kinds: 0x00 handshake, 0x01 audio (PCM16), 0x02 text, 0x05 error, 0x06 ping
    • PCM16 buffering and EOF drain frames (default 32)
    • stderr-only logging to keep stdout binary-safe
  • moshi/pyproject.toml
    • new entrypoint moshi-stdio = moshi.stdio:main
  • moshi/test/stdio_realtime_check.py
    • framing roundtrip + always runs stdio E2E
    • uses existing assets by default and writes outputs to repo root
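The stderr-only logging point can be illustrated with a short sketch. It is an assumption about how such a separation is typically done in Python, not a copy of the PR's code; the logger name and `write_frame` helper are hypothetical.

```python
import logging
import sys

# Route all log output to stderr so stdout carries only framed bytes.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("stdio-sketch")  # hypothetical logger name


def write_frame(out, frame: bytes) -> None:
    """Write one already-framed packet to a binary stream and flush it."""
    out.write(frame)
    out.flush()  # flush per packet so the consumer is not left waiting
```

In a real process the `out` argument would be `sys.stdout.buffer`, the binary layer under the text-mode `sys.stdout`; any stray `print` to stdout would corrupt the packet stream.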

- Add moshi.stdio for real-time audio I/O over stdin/stdout
- Implement length-prefixed binary protocol (u32_le + payload) with kinds:
    - 0x00 handshake
    - 0x01 audio (PCM16 mono)
    - 0x02 text
    - 0x05 error
    - 0x06 ping
- Mirror server/offline model flow (load/warmup/prompts/streaming state)
- Add robust packet parser, PCM frame buffer, and binary-safe stdout writer
- Add EOF drain behavior to flush delayed model outputs (--eof-drain-frames, default 32)
- Wire CLI entrypoint moshi-stdio in pyproject.toml
- Add stdio test with existing assets/test files for quick validation
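The PCM frame buffer and EOF drain described in the list above might look roughly like the sketch below. The frame size of 1920 samples is an assumed placeholder rather than a value taken from this PR, zero-padding a trailing partial frame is likewise an assumption, and the class and function names are hypothetical.

```python
FRAME_SAMPLES = 1920   # assumed model frame size in samples (placeholder)
EOF_DRAIN_FRAMES = 32  # default of --eof-drain-frames per the description


class PcmFrameBuffer:
    """Accumulate PCM16 mono bytes and emit fixed-size frames."""

    def __init__(self, frame_samples: int = FRAME_SAMPLES):
        self.frame_bytes = frame_samples * 2  # 2 bytes per PCM16 sample
        self._buf = bytearray()

    def push(self, pcm: bytes):
        """Append bytes and return every complete frame now available."""
        self._buf.extend(pcm)
        frames = []
        while len(self._buf) >= self.frame_bytes:
            frames.append(bytes(self._buf[: self.frame_bytes]))
            del self._buf[: self.frame_bytes]
        return frames

    def flush(self):
        """Zero-pad a leftover partial frame; return it, or None if empty."""
        if not self._buf:
            return None
        pad = b"\x00" * (self.frame_bytes - len(self._buf))
        padded = bytes(self._buf) + pad
        self._buf.clear()
        return padded


def drain_frames(frame_samples: int = FRAME_SAMPLES, count: int = EOF_DRAIN_FRAMES):
    """Yield `count` frames of silence to feed the model after EOF."""
    silence = b"\x00" * (frame_samples * 2)
    for _ in range(count):
        yield silence
```

Feeding silence after EOF gives a model with output latency enough extra steps to emit audio for the last real input frames, which is the truncated-tail problem the drain is meant to address.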