
Conversation

Contributor

@max-wittig commented Sep 4, 2025

Fixes the following crash. The model_name attribute does not exist on EndpointInfo, only model_names does:

[2025-09-04 15:47:55,962] DEBUG: ==== Enter audio_transcriptions ==== (request.py:542:vllm_router.services.request_service.request)
[2025-09-04 15:47:55,962] DEBUG: Received upload: example.m4a (audio/mp4) (request.py:543:vllm_router.services.request_service.request)
[2025-09-04 15:47:55,962] DEBUG: Params: model=whisper-large-v3-turbo prompt=None response_format='json' temperature=None language=en (request.py:544:vllm_router.services.request_service.request)
[2025-09-04 15:47:55,963] DEBUG: ==== Total endpoints ==== (request.py:568:vllm_router.services.request_service.request)
[2025-09-04 15:47:55,963] DEBUG: [EndpointInfo(url='https://example-01.server.com/bge-m3', model_names=['bge-m3'], Id='dae569d7-58b3-458e-b98a-4607dedd83dd', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'bge-m3': ModelInfo(id='bge-m3', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/bge-m3', model_names=['bge-m3'], Id='e4182761-a362-4cc0-b847-6d83b7c24eb3', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'bge-m3': ModelInfo(id='bge-m3', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-01.server.com/bge-reranker-v2-m3', model_names=['bge-reranker-v2-m3'], Id='e6a0a739-8477-4bba-a9e1-dbbf9789b59f', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'bge-reranker-v2-m3': ModelInfo(id='bge-reranker-v2-m3', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/bge-reranker-v2-m3', model_names=['bge-reranker-v2-m3'], Id='9cf450d6-4ed5-4782-9e8f-aae50de5d197', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'bge-reranker-v2-m3': ModelInfo(id='bge-reranker-v2-m3', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/deepseek-r1', model_names=['deepseek-r1-distill-qwen-7b'], Id='9ef9cf7c-572e-4968-88a7-f0c7dad3c266', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'deepseek-r1-distill-qwen-7b': ModelInfo(id='deepseek-r1-distill-qwen-7b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/deepseek-r1', model_names=['deepseek-r1-distill-qwen-7b'], Id='d5fd0bcd-d7ee-4f0b-9baf-6583bdab80db', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'deepseek-r1-distill-qwen-7b': ModelInfo(id='deepseek-r1-distill-qwen-7b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-01.server.com/deepseek-r1-0528-qwen3-8b', model_names=['deepseek-r1-0528-qwen3-8b'], Id='13b385eb-dfda-4a60-a2ea-7fb5cde0faf3', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'deepseek-r1-0528-qwen3-8b': ModelInfo(id='deepseek-r1-0528-qwen3-8b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/deepseek-r1-0528-qwen3-8b', model_names=['deepseek-r1-0528-qwen3-8b'], Id='3b2a5ed7-3ee2-4f93-8f22-12f8f24be51c', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'deepseek-r1-0528-qwen3-8b': ModelInfo(id='deepseek-r1-0528-qwen3-8b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/devstral-small-2505', 
model_names=['devstral-small-2505'], Id='515e98ca-4ffa-41aa-bf4c-deccdebb87d4', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'devstral-small-2505': ModelInfo(id='devstral-small-2505', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/devstral-small-2505', model_names=['devstral-small-2505'], Id='fdee8b1c-1457-4e44-af03-ee32fa69b328', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'devstral-small-2505': ModelInfo(id='devstral-small-2505', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/gpt-oss-120b', model_names=['gpt-oss-120b'], Id='363f9a01-f9e7-42dd-afd2-10b988568f4e', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'gpt-oss-120b': ModelInfo(id='gpt-oss-120b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/gpt-oss-120b', model_names=['gpt-oss-120b'], Id='17b1749b-bb09-4a2d-9280-22f54abfc7d5', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'gpt-oss-120b': ModelInfo(id='gpt-oss-120b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-01.server.com/mistral-7b-instruct', model_names=['mistral-7b-instruct'], Id='e978f9f2-9da7-4971-a20a-0d4d9f833eae', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'mistral-7b-instruct': ModelInfo(id='mistral-7b-instruct', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/mistral-7b-instruct', model_names=['mistral-7b-instruct'], Id='6c990bf2-2a02-49bf-b58c-debd6e56ace3', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'mistral-7b-instruct': ModelInfo(id='mistral-7b-instruct', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/starcoder', model_names=['starcoder2-3b'], Id='06353f55-2fd9-43b3-ac30-8e52437265d8', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'starcoder2-3b': ModelInfo(id='starcoder2-3b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/starcoder', model_names=['starcoder2-3b'], Id='dd00fd64-0579-4f55-b508-5d7847955931', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'starcoder2-3b': ModelInfo(id='starcoder2-3b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/qwen3-30b-a3b', model_names=['qwen3-30b-a3b'], Id='d4fb240e-e067-4751-a718-a8ef466d2609', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, 
model_info={'qwen3-30b-a3b': ModelInfo(id='qwen3-30b-a3b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/qwen3-30b-a3b', model_names=['qwen3-30b-a3b'], Id='84a62692-95f1-4371-8b4f-d1d4e7d621bc', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'qwen3-30b-a3b': ModelInfo(id='qwen3-30b-a3b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/qwen3-coder-30b-a3b-instruct', model_names=['qwen3-coder-30b-a3b-instruct'], Id='09062080-585a-44d3-95c0-48b8998f6b19', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'qwen3-coder-30b-a3b-instruct': ModelInfo(id='qwen3-coder-30b-a3b-instruct', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/qwen3-coder-30b-a3b-instruct', model_names=['qwen3-coder-30b-a3b-instruct'], Id='5d6a1068-04b9-418d-8bf4-8a62ff67b745', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'qwen3-coder-30b-a3b-instruct': ModelInfo(id='qwen3-coder-30b-a3b-instruct', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-01.server.com/poc-1', model_names=['llama-3.1-8b-instruct'], Id='b204bbe0-ef56-4fe8-892a-e8a251e57e25', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'llama-3.1-8b-instruct': ModelInfo(id='llama-3.1-8b-instruct', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/poc-2', model_names=['mistral-nemo-instruct-2407'], Id='22c4c983-3270-4d63-b833-e1b0f6c8d0d1', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'mistral-nemo-instruct-2407': ModelInfo(id='mistral-nemo-instruct-2407', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/poc-1', model_names=['pixtral-12b-2409'], Id='ab944f1d-8929-4570-ac58-10e4a056cf96', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'pixtral-12b-2409': ModelInfo(id='pixtral-12b-2409', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/poc-3', model_names=['qwen2.5-coder-7b'], Id='63dcbaef-6557-40ce-a9fd-398312fdf4d1', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'qwen2.5-coder-7b': ModelInfo(id='qwen2.5-coder-7b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://itips-llm-int.openshift.siemens.com', model_names=['Mistral-Small-24B-Instruct-2501-FP8-dynamic'], Id='5e4c6547-55c6-4982-8fb1-84b1eb675a3d', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'Mistral-Small-24B-Instruct-2501-FP8-dynamic': 
ModelInfo(id='Mistral-Small-24B-Instruct-2501-FP8-dynamic', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-05.server.com/codestral-2508', model_names=['codestral-2508'], Id='1d5b9484-0de3-4aeb-9741-378904f833ec', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'codestral-2508': ModelInfo(id='codestral-2508', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-06.server.com', model_names=['codestral-2508'], Id='469716a6-e123-42ad-89cd-a6d663a9dfa7', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'codestral-2508': ModelInfo(id='codestral-2508', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-06.server.com', model_names=['devstral-medium-2507'], Id='7ae2f424-c8d5-47f6-963d-53592459848f', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'devstral-medium-2507': ModelInfo(id='devstral-medium-2507', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-01.server.com/whisper-large-v3-turbo', model_names=['whisper-large-v3-turbo'], Id='b9b85b73-783e-4d8d-b22f-c56ba006388d', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'whisper-large-v3-turbo': ModelInfo(id='whisper-large-v3-turbo', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)})] (request.py:569:vllm_router.services.request_service.request)
[2025-09-04 15:47:55,963] DEBUG: ==== Total endpoints ==== (request.py:570:vllm_router.services.request_service.request)
INFO:     172.16.0.41:51278 - "POST /v1/audio/transcriptions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/opt/venv/lib/python3.12/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/starlette.py", line 409, in _sentry_patched_asgi_app
    return await middleware(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/asgi.py", line 158, in _run_asgi3
    return await self._run_app(scope, receive, send, asgi_version=3)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/asgi.py", line 260, in _run_app
    raise exc from None
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/asgi.py", line 255, in _run_app
    return await self.app(
           ^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/starlette.py", line 200, in _create_span_call
    return await old_call(app, scope, new_receive, new_send, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/opt/venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/starlette.py", line 298, in _sentry_exceptionmiddleware_call
    await old_call(self, scope, receive, send)
  File "/opt/venv/lib/python3.12/site-packages/vllm_router/services/request_service/request.py", line 576, in route_general_transcriptions
    if model == ep.model_name
                ^^^^^^^^^^^^^
AttributeError: 'EndpointInfo' object has no attribute 'model_name'. Did you mean: 'model_names'?
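
The traceback points at the root cause: EndpointInfo exposes model_names (a list), not model_name, so the equality check in route_general_transcriptions raises an AttributeError. Below is a minimal sketch of the kind of matching this PR switches to; the function and variable names are illustrative, not the actual patch:

```python
def endpoints_serving_model(endpoints, requested_model):
    # Each EndpointInfo carries model_names (a list of served models), so the
    # requested model has to be matched by membership in that list rather than
    # compared against a non-existent model_name attribute.
    return [ep for ep in endpoints if requested_model in ep.model_names]
```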

Tested on our dev system:

(screenshot attached)

  • Make sure the code changes pass the pre-commit checks.
  • Sign-off your commit by using -s when doing git commit
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please classify PRs so the type of change is easy to understand. The PR title should be prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng, or ApostaC.

@max-wittig force-pushed the fix/loop-through-model-names branch 9 times, most recently from 89aa6f9 to 4c9eeda on September 8, 2025 06:27
Comment on lines -568 to -570
logger.debug("==== Total endpoints ====")
logger.debug(endpoints)
logger.debug("==== Total endpoints ====")
Contributor Author


This PR also removes a lot of the excessive logging while being more transparent about errors.
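
As an illustration of the direction (a sketch only; the helper name, status code, and message are assumptions, not the actual diff), the endpoint selection could log one concise line and return a clear client error instead of a 500 when no endpoint serves the requested model:

```python
from fastapi import HTTPException


def resolve_transcription_endpoint(endpoints, requested_model, logger):
    # One concise debug line replaces the repeated "==== Total endpoints ====" dumps.
    logger.debug(
        "Transcription request for model=%s across %d endpoints",
        requested_model,
        len(endpoints),
    )
    matching = [ep for ep in endpoints if requested_model in ep.model_names]
    if not matching:
        # Surface a clear client-facing error instead of crashing with a 500.
        raise HTTPException(
            status_code=400,
            detail=f"Model '{requested_model}' is not served by any registered endpoint",
        )
    return matching[0]
```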

@max-wittig force-pushed the fix/loop-through-model-names branch 2 times, most recently from f3727d2 to 4da5ad3 on September 8, 2025 06:30
@max-wittig changed the title from fix(request-audio): loop through model_names to [Bugfix][Router]: loop through model_names on Sep 8, 2025
@max-wittig force-pushed the fix/loop-through-model-names branch 3 times, most recently from 633bcd0 to 9472fd0 on September 8, 2025 06:41
@max-wittig force-pushed the fix/loop-through-model-names branch from 9472fd0 to 8e02e5c on September 8, 2025 06:45
@max-wittig force-pushed the fix/loop-through-model-names branch from 1cdb877 to 136bd7c on September 8, 2025 07:45
@max-wittig marked this pull request as ready for review on September 8, 2025 09:39
@max-wittig force-pushed the fix/loop-through-model-names branch from 136bd7c to 196ea0f on September 8, 2025 10:48
@max-wittig
Contributor Author

@YuhanLiu11 We have this running in our dev environment. Let me know what you think about this bugfix.

@YuhanLiu11
Collaborator

@max-wittig hey I understand that we need to modify route_general_transcriptions to fix this bug, but why do we need to modify service_discovery.py for this bug fix?

@max-wittig
Contributor Author

@YuhanLiu11 Because we need to use static-model-types; the current functionality uses static-model-labels, which is wrong.

It modifies service_discovery because tests were failing in the k8s routing functionality. Let me know if I should split this up more.

@max-wittig
Contributor Author

Split up into the first part: #694

@max-wittig closed this Sep 10, 2025
@YuhanLiu11
Collaborator

Hey @max-wittig, yes, splitting it into two PRs sounds good! Thanks
