This codebase provides two Python entrypoints for real-time stem processing:
stem_server.py (OSC metrics) and stem_send.py (multichannel stem routing to
BlackHole 16ch). It uses a modular package rtss_osc for device discovery,
FFmpeg setup, model loading, processing, and telemetry. A Rust/charon-based
entrypoint (charon_server.py) shares the same processing pipeline.
All defaults are tuned for low-resource operation at 8 kHz stream rate, with optional HS-TasNet model inference (resampled to/from the model’s native rate).
- parse_args(): CLI parser for the OSC server: input/output device substring, OSC target IP/port, block size, sample rate (default 8 kHz), queue length, --print-osc, --telemetry-interval, --list.
- main(): Orchestrates the FFmpeg check, device resolution, model load, creation of ProcessingWorker, and a sounddevice.Stream callback that enqueues audio and passes it through. Sends OSC RMS/peak per stem.
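The per-stem metrics mentioned above can be sketched as follows. This is a minimal illustration, not the project's code: rms_and_peak and osc_messages are hypothetical names, blocks are plain lists of floats, and the messages are only built as (address, value) pairs using the /stems/{label}/rms|peak pattern from this document. Actually sending them would go through whatever OSC client the entrypoint uses.

```python
import math


def rms_and_peak(block):
    """Return (rms, peak) for one mono block of float samples."""
    if not block:
        return 0.0, 0.0
    rms = math.sqrt(sum(s * s for s in block) / len(block))
    peak = max(abs(s) for s in block)
    return rms, peak


def osc_messages(stems):
    """Build (address, value) pairs for a dict of {label: samples}.

    Hypothetical helper: it only formats messages and sends nothing.
    """
    messages = []
    for label, block in stems.items():
        rms, peak = rms_and_peak(block)
        messages.append((f"/stems/{label}/rms", rms))
        messages.append((f"/stems/{label}/peak", peak))
    return messages


if __name__ == "__main__":
    for address, value in osc_messages({"drums": [0.5, -0.5], "bass": [0.1, 0.2]}):
        print(address, round(value, 4))
```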
- parse_args(): CLI for routing stems to a 16ch device: input/output substrings (default output "BlackHole 16ch"), block, rate (8 kHz), queue length, --print-status, --list.
- split_stems_fallback(block, rate): Band-split fallback at low SR (0–200, 200–1000, 1000–3000, 3000+ Hz) yielding four distinct stems when HS-TasNet isn’t available.
- main(): Sets up devices, loads model, spawns async processing thread (HS-TasNet if available; otherwise fallback), and runs a 2-in/16-out stream mapping: ch1 base mono, ch2 drums, ch3 vocals, ch4 bass, ch5 other, 6–16 silent.
- MAP: Exported channel mapping used by tests.
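One way to picture the band-split fallback is the sketch below. It is not the implementation in stem_send.py, which this document does not show. It swaps in a simple technique: three one-pole low-pass filters at the 200, 1000, and 3000 Hz edges, with each band taken as the difference between neighbouring low-pass outputs, so the four bands always sum back to the input. The names one_pole_lowpass and split_bands are hypothetical.

```python
import math

# Crossover points taken from the band edges listed above.
BAND_EDGES_HZ = (200.0, 1000.0, 3000.0)


def one_pole_lowpass(x, cutoff_hz, rate):
    """First-order low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / rate)
    out, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        out.append(state)
    return out


def split_bands(block, rate=8000):
    """Split a mono block into four bands by subtracting low-pass outputs."""
    lows = [one_pole_lowpass(block, fc, rate) for fc in BAND_EDGES_HZ]
    lows.append(list(block))  # the unfiltered signal closes the top band
    bands = [lows[0]]
    for lower, upper in zip(lows, lows[1:]):
        bands.append([u - l for u, l in zip(upper, lower)])
    return bands


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 100 * n / rate) for n in range(2048)]
    for index, band in enumerate(split_bands(tone, rate)):
        rms = math.sqrt(sum(s * s for s in band) / len(band))
        print(f"band {index}: rms={rms:.3f}")
```

First-order filters have gentle slopes, so neighbouring bands overlap noticeably. A steeper filter or an FFT mask would separate them more cleanly at a higher CPU cost.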
- CharonAudioAdapter: Placeholder wrapper around a charon_audio.Stream (Rust binding). Adjust to match your actual charon API.
- parse_args(): CLI similar to stem_server (input/output identifiers for charon, OSC IP/port, block, rate=8 kHz, queue length, telemetry).
- main(): Loads model, starts a ProcessingWorker, and uses the charon adapter to stream audio; sends OSC RMS/peak per stem.
- list_devices(): Print available sounddevice devices and exit.
- find_device_by_name(substring, is_input=True): Resolve device index by substring and channel direction.
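Substring-based device resolution can be sketched as below. The function takes a plain list of dicts shaped like the entries returned by sounddevice.query_devices() (name, max_input_channels, max_output_channels), so it runs without audio hardware. The name find_device_sketch, the case-insensitive match, and the first-match-wins rule are assumptions, not confirmed behaviour of find_device_by_name.

```python
def find_device_sketch(devices, substring, is_input=True):
    """Return the index of the first device whose name contains `substring`
    and that has at least one channel in the requested direction."""
    channel_key = "max_input_channels" if is_input else "max_output_channels"
    needle = substring.lower()
    for index, device in enumerate(devices):
        if needle in device["name"].lower() and device.get(channel_key, 0) > 0:
            return index
    return None  # no match; the caller decides whether that is fatal


if __name__ == "__main__":
    fake_devices = [
        {"name": "MacBook Pro Microphone", "max_input_channels": 1, "max_output_channels": 0},
        {"name": "BlackHole 16ch", "max_input_channels": 16, "max_output_channels": 16},
    ]
    print(find_device_sketch(fake_devices, "blackhole", is_input=False))
```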
- ensure_ffmpeg(fatal=True): Locate FFmpeg (prefers Homebrew paths) and prepend to PATH; fatal or warning mode.
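The FFmpeg lookup could look roughly like this. It is a sketch under stated assumptions: the two Homebrew directories are the usual Apple Silicon and Intel prefixes, the search_dirs parameter is hypothetical and added only to make the function testable, and the real ensure_ffmpeg may order its checks differently.

```python
import os
import shutil
import sys

# Assumed Homebrew locations (Apple Silicon first, then Intel).
HOMEBREW_DIRS = ("/opt/homebrew/bin", "/usr/local/bin")


def ensure_ffmpeg_sketch(fatal=True, search_dirs=HOMEBREW_DIRS):
    """Find an ffmpeg binary, preferring `search_dirs`, and prepend its
    directory to PATH. Returns the binary path, or None in warning mode."""
    for directory in search_dirs:
        candidate = os.path.join(directory, "ffmpeg")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            paths = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
            if directory not in paths:
                os.environ["PATH"] = os.pathsep.join([directory] + paths)
            return candidate
    found = shutil.which("ffmpeg")  # fall back to whatever PATH already offers
    if found:
        return found
    message = "FFmpeg not found; install it (for example with Homebrew)."
    if fatal:
        raise SystemExit(message)
    print(f"warning: {message}", file=sys.stderr)
    return None
```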
- patch_torchcodec_stub(): Injects a minimal torchcodec stub to allow hs_tasnet import if native libs fail.
- maybe_load_model_deps(): Lazy import of torch and hs_tasnet with stub fallback.
- load_model(): Returns (model, device, use_model). Sets torch threads to 1, prefers MPS if available, otherwise CPU.
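The device preference described for load_model() (one torch thread, MPS if available, otherwise CPU) can be illustrated with the sketch below. It deliberately leaves out loading HS-TasNet, since that API is not shown in this document. The name pick_device_sketch is hypothetical, and the function returns (device_name, torch_available) rather than the real (model, device, use_model) tuple.

```python
def pick_device_sketch():
    """Return (device_name, torch_available).

    Imports torch lazily so a missing or broken install degrades to the
    no-model path instead of crashing at import time.
    """
    try:
        import torch
    except Exception:  # ImportError, or native-library load failures
        return None, False

    torch.set_num_threads(1)  # keep CPU usage low, as described above
    mps = getattr(torch.backends, "mps", None)  # absent on older torch builds
    if mps is not None and mps.is_available():
        return "mps", True
    return "cpu", True


if __name__ == "__main__":
    device, available = pick_device_sketch()
    print("device:", device, "torch available:", available)
```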
- _resample_1d(x, orig_sr, target_sr): Lightweight linear resampler for small blocks.
- ProcessingWorker: Worker thread for off-callback processing.
  - Resamples stream-rate audio to model rate (if needed), runs HS-TasNet, and resamples stems back.
  - Computes RMS/peak and sends OSC (stem_server/charon_server) or returns stems to callers that need them.
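A lightweight linear resampler of the kind described for _resample_1d might look like the sketch below. It is written in pure Python for clarity, whereas the real helper most likely works on arrays. The name resample_linear and the endpoint-aligned sampling grid (first and last input samples map to the first and last output samples) are assumptions.

```python
def resample_linear(x, orig_sr, target_sr):
    """Resample a 1-D sequence by linear interpolation between neighbours."""
    if orig_sr == target_sr or len(x) < 2:
        return list(x)
    n_out = max(1, round(len(x) * target_sr / orig_sr))
    # Spread the output positions evenly over the input index range.
    step = (len(x) - 1) / (n_out - 1) if n_out > 1 else 0.0
    out = []
    for i in range(n_out):
        position = i * step
        lower = int(position)
        upper = min(lower + 1, len(x) - 1)
        frac = position - lower
        out.append(x[lower] * (1.0 - frac) + x[upper] * frac)
    return out


if __name__ == "__main__":
    block = [0.0, 1.0, 0.0, -1.0]
    up = resample_linear(block, 8000, 16000)    # stream rate -> higher rate
    back = resample_linear(up, 16000, 8000)     # and back to the stream rate
    print(len(up), len(back))
```

Linear interpolation does no anti-alias filtering, which is the trade-off for being cheap on small blocks.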
- TelemetryMonitor: Periodic CPU/memory logging for debugging.
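Periodic CPU/memory logging can be sketched with the standard library alone. This is not the TelemetryMonitor implementation: the class name TelemetrySketch, the sample_usage helper, and the use of the Unix-only resource module are assumptions. Note that ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.

```python
import resource
import threading
import time


def sample_usage():
    """Snapshot of this process's CPU time and peak resident set size."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "cpu_seconds": usage.ru_utime + usage.ru_stime,
        "max_rss": usage.ru_maxrss,  # kilobytes on Linux, bytes on macOS
    }


class TelemetrySketch(threading.Thread):
    """Calls `emit(sample)` every `interval` seconds until stopped."""

    def __init__(self, interval, emit=print):
        super().__init__(daemon=True)
        self.interval = interval
        self.emit = emit
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait returns False on timeout, True once stop() sets the event.
        while not self._stop_event.wait(self.interval):
            self.emit(sample_usage())

    def stop(self):
        self._stop_event.set()
        self.join()


if __name__ == "__main__":
    monitor = TelemetrySketch(interval=0.5)
    monitor.start()
    time.sleep(1.6)
    monitor.stop()
```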
- test_stem_send_fallback.py: Offline check that fallback stems are distinct (RMS and pairwise L2 diffs) at 8 kHz using synthetic tones.
- Entry scripts (stem_server.py, stem_send.py, charon_server.py) parse CLI flags, resolve devices (sounddevice or charon), ensure FFmpeg, and load the model via rtss_osc.model.
- ProcessingWorker asynchronously processes audio: resample → HS-TasNet (if available) → resample back → emit metrics (server) or stems (send).
- stem_server.py uses ProcessingWorker to compute RMS/peak per stem and send OSC (/stems/{label}/rms|peak).
- stem_send.py uses ProcessingWorker to produce stems and maps them to BlackHole 16ch outputs (ch1 base, ch2 drums, ch3 vocals, ch4 bass, ch5 other, others silent).
- Fallback paths (no model) still generate distinct stems via frequency splits tuned for the 8 kHz stream rate.
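The hand-off between the audio callback and the worker thread in this flow can be sketched as a bounded queue. This is an illustration only. WorkerSketch, submit, and the drop-when-full policy are assumptions about how ProcessingWorker behaves, and the process callable stands in for the resample → model → resample chain.

```python
import queue
import threading


class WorkerSketch(threading.Thread):
    """Runs `process(block)` off the audio callback thread."""

    def __init__(self, process, on_result, queue_len=4):
        super().__init__(daemon=True)
        self._queue = queue.Queue(maxsize=queue_len)
        self._process = process
        self._on_result = on_result
        self.dropped = 0

    def submit(self, block):
        """Called from the audio callback; never blocks."""
        try:
            self._queue.put_nowait(block)
            return True
        except queue.Full:
            self.dropped += 1  # assumed policy: drop the new block when full
            return False

    def run(self):
        while True:
            block = self._queue.get()
            if block is None:  # sentinel pushed by stop()
                break
            self._on_result(self._process(block))

    def stop(self):
        self._queue.put(None)
        self.join()


if __name__ == "__main__":
    results = []
    worker = WorkerSketch(lambda b: [2 * s for s in b], results.append)
    worker.start()
    worker.submit([0.1, 0.2])
    worker.stop()
    print(results, "dropped:", worker.dropped)
```

Keeping submit non-blocking matters because a real-time audio callback that waits on a lock or a full queue risks underruns.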
- --input <substr>: Substring to match input device/identifier.
- --output <substr>: Substring to match output device/identifier.
- --block <n>: Audio block size in samples (default 4096 at 8 kHz); higher reduces CPU load but increases latency.
- --rate <Hz>: Stream sample rate (default 8000 Hz); lower reduces load but cuts bandwidth.
- --queue-len <n>: Max queued blocks for worker; higher tolerates bursts, lower reduces latency.
- --print-osc / --print-status: Verbose logging of outgoing OSC or status.
- --list: List devices and exit (sounddevice paths).
- --ip <addr> / --port <p>: OSC target.
- --telemetry-interval <sec>: Emit CPU/mem telemetry periodically.
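The flags above map naturally onto argparse. The sketch below is not the project's parse_args(). The block and rate defaults come from this document, while the defaults for --queue-len, --ip, --port, and --telemetry-interval are placeholders chosen for illustration. The small block_latency_seconds helper makes the block/rate trade-off concrete: 4096 samples at 8000 Hz is 0.512 s of audio per block.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented OSC-server flags."""
    parser = argparse.ArgumentParser(description="Real-time stem metrics over OSC (sketch)")
    parser.add_argument("--input", default=None, help="substring of the input device name")
    parser.add_argument("--output", default=None, help="substring of the output device name")
    parser.add_argument("--block", type=int, default=4096, help="block size in samples")
    parser.add_argument("--rate", type=int, default=8000, help="stream sample rate in Hz")
    parser.add_argument("--queue-len", type=int, default=4, help="max queued blocks (placeholder default)")
    parser.add_argument("--ip", default="127.0.0.1", help="OSC target address (placeholder default)")
    parser.add_argument("--port", type=int, default=9000, help="OSC target port (placeholder default)")
    parser.add_argument("--telemetry-interval", type=float, default=0.0,
                        help="seconds between telemetry logs (placeholder default)")
    parser.add_argument("--print-osc", action="store_true", help="log outgoing OSC messages")
    parser.add_argument("--list", action="store_true", help="list devices and exit")
    return parser


def block_latency_seconds(block, rate):
    """Duration of one block, a lower bound on the added latency."""
    return block / rate


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"block latency: {block_latency_seconds(args.block, args.rate):.3f} s")
```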
- Output channel map fixed: ch1 base, ch2 drums, ch3 vocals, ch4 bass, ch5 other, ch6–16 silent.
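The fixed channel layout can be expressed as a small lookup table plus a frame builder. This is a sketch: the structure of the exported MAP is not shown in this document, so CHANNEL_MAP_SKETCH and build_output are hypothetical, and frames are plain lists instead of the array buffers an audio stream would use.

```python
NUM_OUTPUT_CHANNELS = 16

# Zero-based output indices for ch1..ch5; ch6-16 stay silent.
CHANNEL_MAP_SKETCH = {"base": 0, "drums": 1, "vocals": 2, "bass": 3, "other": 4}


def build_output(stems, frames):
    """Return `frames` rows of 16 samples with each stem on its channel."""
    out = [[0.0] * NUM_OUTPUT_CHANNELS for _ in range(frames)]
    for label, channel in CHANNEL_MAP_SKETCH.items():
        samples = stems.get(label)
        if samples is None:
            continue  # a missing stem leaves its channel silent
        for i in range(min(frames, len(samples))):
            out[i][channel] = samples[i]
    return out


if __name__ == "__main__":
    frame_rows = build_output({"base": [0.3, 0.4], "drums": [0.1, 0.2]}, frames=2)
    print(frame_rows[0][:6])
```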
- Uses the Rust charon_audio binding for streaming (adjust the adapter to your API).
- Stream rate is 8 kHz for efficiency.
- ProcessingWorker resamples to the model’s native rate for HS-TasNet inference, then resamples stems back to the stream rate. If HS-TasNet fails to load, the fallback band-split ensures stems remain distinct.