Updated section of `tutorials/23-whisper-api-transcription.md`:

Create and run a router connected to the Whisper backend:

Save the following script as `run-router.sh`:

```bash
#!/bin/bash
if [[ $# -ne 2 ]]; then
    # (assumed usage message; the body of this check is collapsed in the source)
    echo "Usage: $0 <router-port> <backend-url>"
    exit 1
fi

# --log-level accepts "debug", "info", "warning", "error", or "critical".
# --static-backend-health-checks makes the router periodically send dummy
# requests to each backend to verify that the models are actually serving.
uv run python3 -m vllm_router.app \
    --host 0.0.0.0 --port "$1" \
    --service-discovery static \
    --static-backends "$2" \
    --static-models "openai/whisper-small" \
    --static-model-labels "transcription" \
    --routing-logic roundrobin \
    --log-stats \
    --log-level debug \
    --engine-stats-interval 10 \
    --request-stats-window 10 \
    --static-backend-health-checks
```

Example usage:

```bash
./run-router.sh 8000 http://0.0.0.0:8002
```
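
Before sending any audio, you can sanity-check that the router is up by listing the models it exposes through its OpenAI-compatible API. This is a quick sketch, assuming the router is listening on port 8000 as in the example above:

```bash
curl http://localhost:8000/v1/models
```

The response should include `openai/whisper-small` among the served models.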

## 3. Sending a Transcription Request
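
As a minimal sketch of what this section covers, a transcription request goes to the router's OpenAI-compatible `/v1/audio/transcriptions` endpoint as a multipart form. This assumes the router is listening on port 8000 and that `audio.wav` is a local audio file you want transcribed:

```bash
# Send an audio file to the router; it forwards the request to a Whisper backend.
curl http://localhost:8000/v1/audio/transcriptions \
  -F model="openai/whisper-small" \
  -F file="@audio.wav"
```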