
add qwen-asr HTTP inference server #6

Open
gicrisf wants to merge 3 commits into antirez:main from gicrisf:server

Conversation

gicrisf commented Feb 21, 2026

Adds a single-process HTTP inference server for Qwen3-ASR, adapted from the whisper.cpp server example.

Endpoints

  • POST /inference — accepts a multipart WAV upload, returns JSON or plain text
  • POST /load — hot-swap the loaded model at runtime
  • GET /health — readiness probe
  • GET / — serves public/index.html (live microphone UI)
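The GET /health readiness probe is what an orchestrator or deploy script would poll before routing traffic to the server. Below is a minimal client-side sketch of that pattern; the stub server and its JSON body are illustrative assumptions, not the actual wire format of qwen_asr_server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the real qwen_asr_server: answers the /health
# readiness probe. The {"status": "ok"} body is an assumption
# for illustration only.
class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet

# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A deploy script would loop on this until it gets a 200.
url = f"http://127.0.0.1:{server.server_port}/health"
with urllib.request.urlopen(url) as resp:
    ready = resp.status == 200
server.shutdown()
```

The same loop-until-200 shape works with curl in a shell script; only the endpoint path comes from this PR, everything else is generic HTTP.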

Notable differences from the whisper.cpp original

  • Per-request language and prompt overrides (restored to server defaults after each request)
  • Response always includes inference timing and rt_factor (processing time / audio duration)
  • Language dropdown in the browser UI uses Qwen3-ASR language names instead of BCP-47 codes
  • No whisper-specific params (diarize, timestamps, beam search, VAD, etc.)
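The rt_factor mentioned above is a plain ratio, processing time divided by audio duration, so values below 1.0 mean the server transcribes faster than real time. A one-function sketch of the computation (the function name and signature are illustrative, not taken from the PR's code):

```python
def rt_factor(inference_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time / audio duration.
    < 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return inference_seconds / audio_seconds

# e.g. 2.5 s of compute on a 10 s clip:
print(rt_factor(2.5, 10.0))  # 0.25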

Build: make server → ./qwen_asr_server -d <model_dir>
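Once the server is running, a client POSTs a WAV as multipart/form-data to /inference. The sketch below builds such a body with only the standard library; the field names ("file", "language") follow the whisper.cpp server convention and are an assumption here, so check the actual server source before relying on them.

```python
import io
import uuid

def build_multipart_wav(wav_bytes, language=None):
    """Build a multipart/form-data body for POST /inference.
    Field names are assumed from the whisper.cpp convention."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()

    def part(name, payload, filename=None, ctype=None):
        buf.write(f"--{boundary}\r\n".encode())
        disp = f'Content-Disposition: form-data; name="{name}"'
        if filename:
            disp += f'; filename="{filename}"'
        buf.write(disp.encode() + b"\r\n")
        if ctype:
            buf.write(f"Content-Type: {ctype}\r\n".encode())
        buf.write(b"\r\n")
        buf.write(payload if isinstance(payload, bytes) else payload.encode())
        buf.write(b"\r\n")

    part("file", wav_bytes, filename="audio.wav", ctype="audio/wav")
    if language:
        # Per-request language override, restored to the server
        # default after the request (per the PR description).
        part("language", language)
    buf.write(f"--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

body, content_type = build_multipart_wav(b"RIFF....WAVE", language="English")
```

Sending it is then a single urllib.request.Request with the returned Content-Type header, or equivalently `curl -F` from the shell.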
