25ohms/rtss-osc

RTSS OSC / Stem Send Overview

This codebase provides two main Python entrypoints for real-time stem processing: stem_server.py (OSC metrics) and stem_send.py (multichannel stem routing to BlackHole 16ch). Both build on the modular rtss_osc package, which handles device discovery, FFmpeg setup, model loading, processing, and telemetry. A third entrypoint, charon_server.py, streams audio through a Rust-based charon binding and shares the same processing pipeline.

All defaults are tuned for low-resource operation at an 8 kHz stream rate. HS-TasNet model inference is optional; when it is used, audio is resampled to the model’s native rate and the stems are resampled back to the stream rate.


Modules and Functions

stem_server.py

  • parse_args(): CLI parser for the OSC server: input/output device substring, OSC target IP/port, block size, sample rate (default 8 kHz), queue length, --print-osc, --telemetry-interval, --list.
  • main(): Orchestrates the FFmpeg check, device resolution, model load, and creation of the ProcessingWorker, then opens a sounddevice.Stream whose callback enqueues each input block for the worker and passes the audio through to the output. Sends OSC RMS/peak per stem.

stem_send.py

  • parse_args(): CLI parser for routing stems to a 16-channel device: input/output substrings (default output "BlackHole 16ch"), block size, sample rate (default 8 kHz), queue length, --print-status, --list.
  • split_stems_fallback(block, rate): Band-split fallback for low sample rates, used when HS-TasNet isn’t available. It yields four distinct stems from the bands 0–200, 200–1000, 1000–3000, and 3000+ Hz (at an 8 kHz rate the top band ends at the 4 kHz Nyquist limit).
  • main(): Sets up devices, loads the model, spawns an asynchronous processing thread (HS-TasNet if available, otherwise the fallback), and runs a 2-in/16-out stream with the mapping ch1 base mono, ch2 drums, ch3 vocals, ch4 bass, ch5 other, ch6–16 silent.
  • MAP: Exported channel mapping used by tests.
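
As an illustration of the band-split idea behind split_stems_fallback, here is a minimal sketch that uses FFT masks. It is not the repository's implementation: the function name split_bands_fft, the constant BAND_EDGES_HZ, and the choice of FFT masking (rather than time-domain filters) are assumptions; only the band edges come from the description above.

```python
import numpy as np

# Band edges in Hz from the description above. The open-ended top band is
# capped by the Nyquist frequency (rate / 2). The constant name is hypothetical.
BAND_EDGES_HZ = [(0.0, 200.0), (200.0, 1000.0), (1000.0, 3000.0), (3000.0, None)]


def split_bands_fft(block, rate=8000):
    """Split a mono block into four band-limited signals using FFT masks.

    Returns an array of shape (4, len(block)). The masks partition the
    spectrum, so the four bands sum back to the input block.
    """
    block = np.asarray(block, dtype=np.float32)
    spectrum = np.fft.rfft(block)
    freqs = np.fft.rfftfreq(block.size, d=1.0 / rate)
    bands = []
    for lo, hi in BAND_EDGES_HZ:
        upper = rate / 2 + 1 if hi is None else hi  # include the Nyquist bin
        mask = (freqs >= lo) & (freqs < upper)
        bands.append(np.fft.irfft(spectrum * mask, n=block.size))
    return np.stack(bands).astype(np.float32)


# Usage: a 2 kHz tone should land mostly in the 1000-3000 Hz band (index 2).
t = np.arange(4096) / 8000.0
bands = split_bands_fft(np.sin(2 * np.pi * 2000.0 * t), rate=8000)
band_rms = np.sqrt(np.mean(bands ** 2, axis=1))
```

How the four bands are assigned to the drums, vocals, bass, and other labels is not stated in this README, so the sketch leaves them unlabeled.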

charon_server.py

  • CharonAudioAdapter: Placeholder wrapper around a charon_audio.Stream (Rust binding). Adjust to match your actual charon API.
  • parse_args(): CLI similar to stem_server (input/output identifiers for charon, OSC IP/port, block, rate=8 kHz, queue length, telemetry).
  • main(): Loads model, starts a ProcessingWorker, and uses the charon adapter to stream audio; sends OSC RMS/peak per stem.

rtss_osc.devices

  • list_devices(): Print available sounddevice devices and exit.
  • find_device_by_name(substring, is_input=True): Resolve device index by substring and channel direction.
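
The substring lookup can be sketched as a pure function over a device list. This is an assumption-laden illustration, not the package's code: find_device_index is a hypothetical name, the case-insensitive match is a guess at the behavior, and the sample list below stands in for what sounddevice.query_devices() would return (entries with "name", "max_input_channels", and "max_output_channels" keys).

```python
def find_device_index(devices, substring, is_input=True):
    """Return the index of the first device whose name contains `substring`
    (case-insensitive, an assumption) and that has channels in the
    requested direction, or None if nothing matches."""
    key = "max_input_channels" if is_input else "max_output_channels"
    needle = substring.lower()
    for index, dev in enumerate(devices):
        if needle in dev["name"].lower() and dev[key] > 0:
            return index
    return None


# Stand-in for the list returned by sounddevice.query_devices().
devices = [
    {"name": "MacBook Pro Microphone", "max_input_channels": 1, "max_output_channels": 0},
    {"name": "BlackHole 16ch", "max_input_channels": 16, "max_output_channels": 16},
]

output_index = find_device_index(devices, "blackhole", is_input=False)
```

Checking the channel count for the requested direction keeps an input-only device from being picked as an output, which is the point of the is_input argument.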

rtss_osc.ffmpeg

  • ensure_ffmpeg(fatal=True): Locates FFmpeg (preferring Homebrew paths) and prepends its directory to PATH. The fatal flag selects whether a missing FFmpeg is treated as fatal or only produces a warning.
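
A rough sketch of this kind of lookup, using only the standard library. The name ensure_ffmpeg_sketch, the extra_dirs parameter, and the two Homebrew directories listed are assumptions for illustration; the real function may search differently.

```python
import os
import shutil
import sys

# Typical Homebrew bin directories (Apple Silicon and Intel). Assumed here.
HOMEBREW_DIRS = ("/opt/homebrew/bin", "/usr/local/bin")


def ensure_ffmpeg_sketch(fatal=True, extra_dirs=HOMEBREW_DIRS):
    """Find ffmpeg, searching `extra_dirs` before PATH, and prepend the
    directory it was found in to PATH. Returns the executable path or None."""
    current_path = os.environ.get("PATH", "")
    search = os.pathsep.join(list(extra_dirs) + [current_path])
    found = shutil.which("ffmpeg", path=search)
    if found is None:
        message = "ffmpeg not found; install it (for example via Homebrew)"
        if fatal:
            sys.exit(message)  # fatal mode: stop the program
        print("warning: " + message, file=sys.stderr)  # warning mode
        return None
    os.environ["PATH"] = os.path.dirname(found) + os.pathsep + current_path
    return found


ffmpeg_path = ensure_ffmpeg_sketch(fatal=False)
```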

rtss_osc.model

  • patch_torchcodec_stub(): Injects a minimal torchcodec stub to allow hs_tasnet import if native libs fail.
  • maybe_load_model_deps(): Lazy import of torch and hs_tasnet with stub fallback.
  • load_model(): Returns (model, device, use_model). Sets torch threads to 1, prefers MPS if available, otherwise CPU.
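
The stub fallback can be pictured as follows. This is a generic sketch of injecting a placeholder module into sys.modules, not the actual patch_torchcodec_stub; import_or_stub is a hypothetical helper, and an empty stub only helps when the importing package needs the module name to exist but does not call into it.

```python
import importlib
import sys
import types


def import_or_stub(name):
    """Import `name`; if that fails for any reason, register an empty stub
    module under the same name so later `import name` statements succeed.

    Returns (module, was_stubbed).
    """
    try:
        return importlib.import_module(name), False
    except Exception:  # native-library failures are not always ImportError
        stub = types.ModuleType(name)
        stub.__doc__ = "Stub injected because the real module failed to load."
        sys.modules[name] = stub
        return stub, True


# A module name that does not exist, to demonstrate the fallback path.
module, was_stubbed = import_or_stub("rtss_hypothetical_codec")
```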

rtss_osc.processing

  • _resample_1d(x, orig_sr, target_sr): Lightweight linear resampler for small blocks.
  • ProcessingWorker: Worker thread for off-callback processing.
    • Resamples stream-rate audio to model rate (if needed), runs HS-TasNet, and resamples stems back.
    • Computes RMS/peak and sends OSC (stem_server/charon_server) or returns stems to callers that need them.
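
A lightweight linear resampler of the kind described for _resample_1d could look like the sketch below. The name resample_linear and the details (rounding of the output length, float32 output) are assumptions. Linear interpolation applies no anti-alias filter, so downsampling content above the new Nyquist frequency will alias.

```python
import numpy as np


def resample_linear(x, orig_sr, target_sr):
    """Resample a 1-D signal by linear interpolation (no anti-alias filter)."""
    x = np.asarray(x, dtype=np.float32)
    if orig_sr == target_sr or x.size == 0:
        return x
    n_out = max(1, int(round(x.size * target_sr / orig_sr)))
    # Positions of the output samples, expressed in input-sample units.
    # np.interp holds the last input value for positions past the end.
    positions = np.arange(n_out) * (orig_sr / target_sr)
    return np.interp(positions, np.arange(x.size), x).astype(np.float32)


# Usage: upsample an 8 kHz block to a placeholder rate of 16 kHz and back.
block = np.linspace(0.0, 1.0, 4096, dtype=np.float32)
up = resample_linear(block, 8000, 16000)
back = resample_linear(up, 16000, 8000)
```

The 16 kHz figure is only a placeholder; the model's native rate is not stated in this README.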

rtss_osc.telemetry

  • TelemetryMonitor: Periodic CPU/memory logging for debugging.

Tests

  • test_stem_send_fallback.py: Offline check that fallback stems are distinct (RMS and pairwise L2 diffs) at 8 kHz using synthetic tones.

How It Fits Together

  • Entry scripts (stem_server.py, stem_send.py, charon_server.py) parse CLI flags, resolve devices (sounddevice or charon), ensure FFmpeg, and load the model via rtss_osc.model.
  • ProcessingWorker asynchronously processes audio: resample → HS-TasNet (if available) → resample back → emit metrics (server) or stems (send).
  • stem_server.py uses ProcessingWorker to compute RMS/peak per stem and send OSC (/stems/{label}/rms|peak).
  • stem_send.py uses ProcessingWorker to produce stems and maps them to BlackHole 16ch outputs (ch1 base, ch2 drums, ch3 vocals, ch4 bass, ch5 other, others silent).
  • Fallback paths (no model) still generate distinct stems via frequency splits tuned for the 8 kHz stream rate.
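
The off-callback pattern above can be sketched with a bounded queue and a worker thread. BlockWorker is a hypothetical stand-in for ProcessingWorker; the drop-when-full policy and the None sentinel for shutdown are assumptions, chosen so the audio callback never blocks.

```python
import queue
import threading


class BlockWorker(threading.Thread):
    """Runs `process(block)` off the audio callback and passes each result
    to `on_result` (for example, to send metrics or store stems)."""

    def __init__(self, process, on_result, queue_len=8):
        super().__init__(daemon=True)
        self._process = process
        self._on_result = on_result
        self._queue = queue.Queue(maxsize=queue_len)
        self.dropped = 0

    def submit(self, block):
        """Called from the audio callback: never blocks, drops when full."""
        try:
            self._queue.put_nowait(block)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def run(self):
        while True:
            block = self._queue.get()
            if block is None:  # sentinel: shut down
                break
            self._on_result(self._process(block))

    def stop(self):
        self._queue.put(None)
        self.join()


# Usage with a trivial "processing" step standing in for the real pipeline.
results = []
worker = BlockWorker(process=sum, on_result=results.append, queue_len=4)
worker.start()
worker.submit([1, 2, 3])
worker.stop()
```

The queue_len argument plays the role of --queue-len: a longer queue absorbs bursts of slow processing, while a shorter one bounds how stale a processed block can be.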

CLI Flags and Purposes

Common (stem_server, stem_send, charon_server)

  • --input <substr>: Substring to match input device/identifier.
  • --output <substr>: Substring to match output device/identifier.
  • --block <n>: Audio block size in samples (default 4096, which is 512 ms per block at 8 kHz). Larger blocks reduce CPU load but increase latency.
  • --rate <Hz>: Stream sample rate (default 8000 Hz). Lower rates reduce load but limit the audio bandwidth to half the rate (4 kHz at the default).
  • --queue-len <n>: Maximum number of blocks queued for the worker. Higher values tolerate bursts; lower values reduce latency.
  • --print-osc (stem_server.py) / --print-status (stem_send.py): Verbose logging of outgoing OSC messages or routing status.
  • --list: List devices and exit (sounddevice-based scripts).
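
For orientation, the common flags could be declared with argparse roughly as follows. This is a sketch, not the scripts' actual parse_args(): the help strings are illustrative, and the --queue-len default of 8 is a placeholder because this README does not state it. The last lines show how block size and rate translate into per-block latency.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the shared stem CLI flags")
    parser.add_argument("--input", default=None, help="substring of the input device name")
    parser.add_argument("--output", default=None, help="substring of the output device name")
    parser.add_argument("--block", type=int, default=4096, help="block size in samples")
    parser.add_argument("--rate", type=int, default=8000, help="stream sample rate in Hz")
    # Placeholder default: the real default is not given in this README.
    parser.add_argument("--queue-len", type=int, default=8, help="max queued blocks")
    parser.add_argument("--list", action="store_true", help="list devices and exit")
    return parser


args = build_parser().parse_args(["--input", "BlackHole", "--block", "2048"])
latency_ms = 1000.0 * args.block / args.rate  # duration of one block
print(f"block={args.block} rate={args.rate} -> {latency_ms:.0f} ms per block")
```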

stem_server.py specifics

  • --ip <addr> / --port <p>: OSC target.
  • --telemetry-interval <sec>: Emit CPU/mem telemetry periodically.
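
The per-stem values sent to the OSC target follow the /stems/{label}/rms|peak address pattern described earlier. A minimal sketch of computing them is shown below; stem_metrics is a hypothetical helper, and actually sending the messages needs an OSC client library, which this README does not name.

```python
import numpy as np


def stem_metrics(stems, labels):
    """Map OSC-style addresses to RMS and peak values for each stem."""
    messages = {}
    for label, stem in zip(labels, stems):
        stem = np.asarray(stem, dtype=np.float64)
        rms = float(np.sqrt(np.mean(stem ** 2))) if stem.size else 0.0
        peak = float(np.max(np.abs(stem))) if stem.size else 0.0
        messages[f"/stems/{label}/rms"] = rms
        messages[f"/stems/{label}/peak"] = peak
    return messages


# Usage with two toy stems; each (address, value) pair would be sent to the
# host and port given by --ip and --port.
messages = stem_metrics(
    [np.ones(4), np.array([0.0, -0.5])],
    ["drums", "bass"],
)
```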

stem_send.py specifics

  • Output channel map fixed: ch1 base, ch2 drums, ch3 vocals, ch4 bass, ch5 other, ch6–16 silent.
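
The fixed channel map can be pictured as filling a (frames, 16) buffer, the shape a sounddevice output callback writes into. In this sketch CHANNEL_MAP and route_to_outputs are hypothetical names (the structure of the exported MAP is not shown in this README), and each signal is assumed to hold at least `frames` samples.

```python
import numpy as np

# Zero-based output column for each signal: ch1..ch5 in the text are 0..4.
CHANNEL_MAP = {"base": 0, "drums": 1, "vocals": 2, "bass": 3, "other": 4}
N_OUT = 16


def route_to_outputs(signals, frames):
    """Build a (frames, 16) output buffer; unmapped channels stay silent."""
    out = np.zeros((frames, N_OUT), dtype=np.float32)
    for name, column in CHANNEL_MAP.items():
        if name in signals:
            out[:, column] = signals[name][:frames]
    return out


# Usage: only "base" and "bass" are available, so the rest stay at zero.
out = route_to_outputs({"base": np.ones(4), "bass": np.full(4, 0.5)}, frames=4)
```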

charon_server.py specifics

  • Uses Rust charon_audio binding for streaming (adjust adapter to your API).

Notes on HS-TasNet and Sample Rates

  • The stream rate defaults to 8 kHz for efficiency. ProcessingWorker resamples each block to the model’s native rate for HS-TasNet inference, then resamples the stems back to the stream rate. If HS-TasNet fails to load, the fallback band-split ensures the stems remain distinct.

About

Real Time Stem Separation with OSC compatibility
