use guidellm to test vllm benchmark (with litellm proxy), failed #312
-
Trying to use guidellm to benchmark vLLM through a LiteLLM proxy; it fails.

1. Set up the vLLM service.
2. Set up the LiteLLM proxy with `litellm_config.yaml` (a sample config sketch is shown below) and start it: `litellm --config config.yaml --port 4000`
3. Use curl to access the vLLM service and the LiteLLM proxy directly; both work fine.
4. Run guidellm; it fails: `guidellm benchmark --target "http://localhost:4000" --rate-type sweep --max-seconds 30 --data "prompt_tokens=5,output_tokens=2"`
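For reference, a minimal sketch of what the `litellm_config.yaml` might look like; the vLLM base URL and the `hosted_vllm/` provider prefix are assumptions, and the model group name is taken from the curl tests later in this thread:

```yaml
# Hypothetical litellm_config.yaml - adjust model name and api_base to match your vLLM server
model_list:
  - model_name: vllm-model-group-1           # name clients send in the "model" field
    litellm_params:
      model: hosted_vllm/my-served-model     # assumed provider prefix for a self-hosted vLLM endpoint
      api_base: http://localhost:8000/v1     # assumed vLLM OpenAI-compatible URL
      api_key: "dummy"                       # vLLM typically ignores the key unless started with --api-key
```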
-
Tried asking RunLLM; the answer was not helpful.
-
Hi
-
Get detailed debug logs from LiteLLM.
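If it helps, one way to turn on verbose proxy logging (a sketch, assuming the LiteLLM CLI's `--detailed_debug` flag; the lighter `--debug` flag is another option):

```bash
# Restart the proxy with detailed debug output so each incoming/outgoing request is logged
litellm --config config.yaml --port 4000 --detailed_debug
```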
-
Not sure where `max_completion_tokens` comes from? Nothing in the request setup specifies `max_completion_tokens`.
-
LiteLLM supports `max_completion_tokens` - try adding `litellm.drop_params=True`. Ref: https://docs.litellm.ai/docs/completion/input
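`litellm.drop_params=True` is the Python SDK form; when running LiteLLM as a proxy, the equivalent lives in the config file. A sketch, assuming the documented `litellm_settings.drop_params` option:

```yaml
# In config.yaml: ask LiteLLM to silently drop params the backend doesn't accept
litellm_settings:
  drop_params: true
```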
-
Yeah, due to this mess we have to emit both (`max_tokens` and `max_completion_tokens`). See also #210 (comment)
-
Tried curl against LiteLLM with different parameters and get an error:

`curl http://localhost:4000/v1/completions -H "Content-Type: application/json" -d '{"model": "vllm-model-group-1", "prompt": "Test connection", "max_tokens": 2}'`

`curl http://localhost:4000/v1/completions -H "Content-Type: application/json" -d '{"model": "vllm-model-group-1", "prompt": "Test connection", "max_completion_tokens": 2}'`
-
Referred to this doc to set up config.yaml to ignore the parameters; still get the same error.