
Conversation

Contributor

@max-wittig commented Sep 4, 2025

Fixes the following crash. The model_name attribute does not exist on EndpointInfo, only model_names does:

[2025-09-04 15:47:55,962] DEBUG: ==== Enter audio_transcriptions ==== (request.py:542:vllm_router.services.request_service.request)
[2025-09-04 15:47:55,962] DEBUG: Received upload: example.m4a (audio/mp4) (request.py:543:vllm_router.services.request_service.request)
[2025-09-04 15:47:55,962] DEBUG: Params: model=whisper-large-v3-turbo prompt=None response_format='json' temperature=None language=en (request.py:544:vllm_router.services.request_service.request)
[2025-09-04 15:47:55,963] DEBUG: ==== Total endpoints ==== (request.py:568:vllm_router.services.request_service.request)
[2025-09-04 15:47:55,963] DEBUG: [EndpointInfo(url='https://example-01.server.com/bge-m3', model_names=['bge-m3'], Id='dae569d7-58b3-458e-b98a-4607dedd83dd', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'bge-m3': ModelInfo(id='bge-m3', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/bge-m3', model_names=['bge-m3'], Id='e4182761-a362-4cc0-b847-6d83b7c24eb3', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'bge-m3': ModelInfo(id='bge-m3', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-01.server.com/bge-reranker-v2-m3', model_names=['bge-reranker-v2-m3'], Id='e6a0a739-8477-4bba-a9e1-dbbf9789b59f', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'bge-reranker-v2-m3': ModelInfo(id='bge-reranker-v2-m3', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/bge-reranker-v2-m3', model_names=['bge-reranker-v2-m3'], Id='9cf450d6-4ed5-4782-9e8f-aae50de5d197', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'bge-reranker-v2-m3': ModelInfo(id='bge-reranker-v2-m3', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/deepseek-r1', model_names=['deepseek-r1-distill-qwen-7b'], Id='9ef9cf7c-572e-4968-88a7-f0c7dad3c266', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'deepseek-r1-distill-qwen-7b': ModelInfo(id='deepseek-r1-distill-qwen-7b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/deepseek-r1', model_names=['deepseek-r1-distill-qwen-7b'], Id='d5fd0bcd-d7ee-4f0b-9baf-6583bdab80db', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'deepseek-r1-distill-qwen-7b': ModelInfo(id='deepseek-r1-distill-qwen-7b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-01.server.com/deepseek-r1-0528-qwen3-8b', model_names=['deepseek-r1-0528-qwen3-8b'], Id='13b385eb-dfda-4a60-a2ea-7fb5cde0faf3', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'deepseek-r1-0528-qwen3-8b': ModelInfo(id='deepseek-r1-0528-qwen3-8b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/deepseek-r1-0528-qwen3-8b', model_names=['deepseek-r1-0528-qwen3-8b'], Id='3b2a5ed7-3ee2-4f93-8f22-12f8f24be51c', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'deepseek-r1-0528-qwen3-8b': ModelInfo(id='deepseek-r1-0528-qwen3-8b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/devstral-small-2505', 
model_names=['devstral-small-2505'], Id='515e98ca-4ffa-41aa-bf4c-deccdebb87d4', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'devstral-small-2505': ModelInfo(id='devstral-small-2505', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/devstral-small-2505', model_names=['devstral-small-2505'], Id='fdee8b1c-1457-4e44-af03-ee32fa69b328', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'devstral-small-2505': ModelInfo(id='devstral-small-2505', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/gpt-oss-120b', model_names=['gpt-oss-120b'], Id='363f9a01-f9e7-42dd-afd2-10b988568f4e', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'gpt-oss-120b': ModelInfo(id='gpt-oss-120b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/gpt-oss-120b', model_names=['gpt-oss-120b'], Id='17b1749b-bb09-4a2d-9280-22f54abfc7d5', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'gpt-oss-120b': ModelInfo(id='gpt-oss-120b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-01.server.com/mistral-7b-instruct', model_names=['mistral-7b-instruct'], Id='e978f9f2-9da7-4971-a20a-0d4d9f833eae', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'mistral-7b-instruct': ModelInfo(id='mistral-7b-instruct', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/mistral-7b-instruct', model_names=['mistral-7b-instruct'], Id='6c990bf2-2a02-49bf-b58c-debd6e56ace3', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'mistral-7b-instruct': ModelInfo(id='mistral-7b-instruct', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/starcoder', model_names=['starcoder2-3b'], Id='06353f55-2fd9-43b3-ac30-8e52437265d8', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'starcoder2-3b': ModelInfo(id='starcoder2-3b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/starcoder', model_names=['starcoder2-3b'], Id='dd00fd64-0579-4f55-b508-5d7847955931', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'starcoder2-3b': ModelInfo(id='starcoder2-3b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/qwen3-30b-a3b', model_names=['qwen3-30b-a3b'], Id='d4fb240e-e067-4751-a718-a8ef466d2609', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, 
model_info={'qwen3-30b-a3b': ModelInfo(id='qwen3-30b-a3b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/qwen3-30b-a3b', model_names=['qwen3-30b-a3b'], Id='84a62692-95f1-4371-8b4f-d1d4e7d621bc', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'qwen3-30b-a3b': ModelInfo(id='qwen3-30b-a3b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-03.server.com/qwen3-coder-30b-a3b-instruct', model_names=['qwen3-coder-30b-a3b-instruct'], Id='09062080-585a-44d3-95c0-48b8998f6b19', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'qwen3-coder-30b-a3b-instruct': ModelInfo(id='qwen3-coder-30b-a3b-instruct', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-04.server.com/qwen3-coder-30b-a3b-instruct', model_names=['qwen3-coder-30b-a3b-instruct'], Id='5d6a1068-04b9-418d-8bf4-8a62ff67b745', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'qwen3-coder-30b-a3b-instruct': ModelInfo(id='qwen3-coder-30b-a3b-instruct', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-01.server.com/poc-1', model_names=['llama-3.1-8b-instruct'], Id='b204bbe0-ef56-4fe8-892a-e8a251e57e25', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'llama-3.1-8b-instruct': ModelInfo(id='llama-3.1-8b-instruct', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/poc-2', model_names=['mistral-nemo-instruct-2407'], Id='22c4c983-3270-4d63-b833-e1b0f6c8d0d1', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'mistral-nemo-instruct-2407': ModelInfo(id='mistral-nemo-instruct-2407', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/poc-1', model_names=['pixtral-12b-2409'], Id='ab944f1d-8929-4570-ac58-10e4a056cf96', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'pixtral-12b-2409': ModelInfo(id='pixtral-12b-2409', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-02.server.com/poc-3', model_names=['qwen2.5-coder-7b'], Id='63dcbaef-6557-40ce-a9fd-398312fdf4d1', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'qwen2.5-coder-7b': ModelInfo(id='qwen2.5-coder-7b', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://itips-llm-int.openshift.siemens.com', model_names=['Mistral-Small-24B-Instruct-2501-FP8-dynamic'], Id='5e4c6547-55c6-4982-8fb1-84b1eb675a3d', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'Mistral-Small-24B-Instruct-2501-FP8-dynamic': 
ModelInfo(id='Mistral-Small-24B-Instruct-2501-FP8-dynamic', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-05.server.com/codestral-2508', model_names=['codestral-2508'], Id='1d5b9484-0de3-4aeb-9741-378904f833ec', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'codestral-2508': ModelInfo(id='codestral-2508', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-06.server.com', model_names=['codestral-2508'], Id='469716a6-e123-42ad-89cd-a6d663a9dfa7', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'codestral-2508': ModelInfo(id='codestral-2508', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-06.server.com', model_names=['devstral-medium-2507'], Id='7ae2f424-c8d5-47f6-963d-53592459848f', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'devstral-medium-2507': ModelInfo(id='devstral-medium-2507', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)}), EndpointInfo(url='https://example-01.server.com/whisper-large-v3-turbo', model_names=['whisper-large-v3-turbo'], Id='b9b85b73-783e-4d8d-b22f-c56ba006388d', added_timestamp=1756913005, model_label='default', sleep=False, pod_name=None, service_name=None, namespace=None, model_info={'whisper-large-v3-turbo': ModelInfo(id='whisper-large-v3-turbo', object='model', created=1757000875, owned_by='vllm', root=None, parent=None, is_adapter=False)})] (request.py:569:vllm_router.services.request_service.request)
[2025-09-04 15:47:55,963] DEBUG: ==== Total endpoints ==== (request.py:570:vllm_router.services.request_service.request)
INFO:     172.16.0.41:51278 - "POST /v1/audio/transcriptions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/opt/venv/lib/python3.12/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/starlette.py", line 409, in _sentry_patched_asgi_app
    return await middleware(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/asgi.py", line 158, in _run_asgi3
    return await self._run_app(scope, receive, send, asgi_version=3)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/asgi.py", line 260, in _run_app
    raise exc from None
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/asgi.py", line 255, in _run_app
    return await self.app(
           ^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/starlette.py", line 200, in _create_span_call
    return await old_call(app, scope, new_receive, new_send, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/opt/venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/opt/venv/lib/python3.12/site-packages/sentry_sdk/integrations/starlette.py", line 298, in _sentry_exceptionmiddleware_call
    await old_call(self, scope, receive, send)
  File "/opt/venv/lib/python3.12/site-packages/vllm_router/services/request_service/request.py", line 576, in route_general_transcriptions
    if model == ep.model_name
                ^^^^^^^^^^^^^
AttributeError: 'EndpointInfo' object has no attribute 'model_name'. Did you mean: 'model_names'?
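
The traceback points at the root cause: EndpointInfo exposes model_names (a list), not model_name, so the equality check in route_general_transcriptions raises an AttributeError. Below is a minimal sketch of the kind of matching this PR switches to; the function and variable names are illustrative, not the actual patch:

```python
def endpoints_serving_model(endpoints, requested_model):
    # Each EndpointInfo carries model_names (a list of served models), so the
    # requested model has to be matched by membership in that list rather than
    # compared against a non-existent model_name attribute.
    return [ep for ep in endpoints if requested_model in ep.model_names]
```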

Tested on our dev system:

(screenshot attached)

  • Make sure the code changes pass the pre-commit checks.
  • Sign-off your commit by using -s when doing git commit
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please classify PRs so the type of change is easy to understand. The PR title should be prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng, or ApostaC.

@max-wittig force-pushed the fix/loop-through-model-names branch 9 times, most recently from 89aa6f9 to 4c9eeda on September 8, 2025 06:27
Comment on lines -568 to -570
logger.debug("==== Total endpoints ====")
logger.debug(endpoints)
logger.debug("==== Total endpoints ====")
Contributor Author


This PR also removes a lot of the excessive logging while being more transparent about errors.
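
As an illustration of the direction (a sketch only; the helper name, status code, and message are assumptions, not the actual diff), the endpoint selection could log one concise line and return a clear client error instead of a 500 when no endpoint serves the requested model:

```python
from fastapi import HTTPException


def resolve_transcription_endpoint(endpoints, requested_model, logger):
    # One concise debug line replaces the repeated "==== Total endpoints ====" dumps.
    logger.debug(
        "Transcription request for model=%s across %d endpoints",
        requested_model,
        len(endpoints),
    )
    matching = [ep for ep in endpoints if requested_model in ep.model_names]
    if not matching:
        # Surface a clear client-facing error instead of crashing with a 500.
        raise HTTPException(
            status_code=400,
            detail=f"Model '{requested_model}' is not served by any registered endpoint",
        )
    return matching[0]
```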

@max-wittig force-pushed the fix/loop-through-model-names branch 2 times, most recently from f3727d2 to 4da5ad3 on September 8, 2025 06:30
@max-wittig changed the title from fix(request-audio): loop through model_names to [Bugfix][Router]: loop through model_names on Sep 8, 2025
@max-wittig force-pushed the fix/loop-through-model-names branch 3 times, most recently from 633bcd0 to 9472fd0 on September 8, 2025 06:41
@max-wittig force-pushed the fix/loop-through-model-names branch from 9472fd0 to 8e02e5c on September 8, 2025 06:45
@max-wittig force-pushed the fix/loop-through-model-names branch from 1cdb877 to 136bd7c on September 8, 2025 07:45
@max-wittig marked this pull request as ready for review on September 8, 2025 09:39
@max-wittig force-pushed the fix/loop-through-model-names branch from 136bd7c to 196ea0f on September 8, 2025 10:48
@max-wittig
Contributor Author

@YuhanLiu11 We have this running in our dev environment. Let me know what you think about this bugfix.

@YuhanLiu11
Collaborator

@max-wittig hey I understand that we need to modify route_general_transcriptions to fix this bug, but why do we need to modify service_discovery.py for this bug fix?

@max-wittig
Contributor Author

@YuhanLiu11 Because we need to use static-model-types; the current functionality uses static-model-labels, which is wrong.

It modifies service_discovery because tests were failing in the k8s routing functionality. Let me know if I should split this up more.

@max-wittig
Contributor Author

Split up into the first part: #694

@max-wittig closed this Sep 10, 2025
@YuhanLiu11
Collaborator

Hey @max-wittig, yes, splitting it into two PRs sounds good! Thanks
