-
watch this: #8515
-
@mmoody-vv I'm having a hard time getting it running with quantization. Can you share the full parameter list you are using?
-
When passing in the template, I get a different error:
-
You need --tokenizer-mode mistral
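For context, here's a minimal sketch of how that flag fits into a full launch (untested as written; vllm serve and the tool-calling flags are borrowed from elsewhere in this thread):

# Sketch only: serve the AWQ model with the Mistral tokenizer mode,
# which (as I understand it) makes vLLM tokenize via mistral-common
# instead of the Hugging Face tokenizer shipped in the repo.
vllm serve casperhansen/mistral-small-24b-instruct-2501-awq \
  --tokenizer-mode mistral \
  --enable-auto-tool-choice \
  --tool-call-parser mistral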
-
@mmoody-vv got it working with casperhansen/mistral-small-24b-instruct-2501-awq:

docker run \
  --runtime nvidia \
  -e VLLM_USE_V1=1 \
  --ipc=host \
  -p "${MODEL_PORT}:8000" \
  --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
  --env "HF_HUB_OFFLINE=1" \
  -v "${HF_HOME}:/root/.cache/huggingface" \
  vllm/vllm-openai:latest \
  --model casperhansen/mistral-small-24b-instruct-2501-awq \
  --enforce-eager \
  --tool-call-parser mistral \
  --enable-auto-tool-choice \
  --tokenizer-mode mistral
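To sanity-check that tool calls actually get parsed, a request along these lines should do (a sketch; the get_weather schema is just an illustrative example, not something from this setup):

curl "http://localhost:${MODEL_PORT}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "casperhansen/mistral-small-24b-instruct-2501-awq",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'

If parsing works, the response should carry a populated tool_calls array (with finish_reason "tool_calls") rather than the JSON landing in the message content.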
-
I think the problem is that the AWQ version doesn't come with the tokenizer.
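If that's it, one workaround might be to load the weights from the AWQ repo but take the tokenizer from the original repo, which does ship it (a sketch; this assumes vLLM's --tokenizer flag is honored alongside --tokenizer-mode mistral, and that mistralai/Mistral-Small-24B-Instruct-2501 is the right base repo; I haven't tested this):

# Untested: weights from the AWQ repo, tokenizer from the base Mistral repo
vllm serve casperhansen/mistral-small-24b-instruct-2501-awq \
  --tokenizer mistralai/Mistral-Small-24B-Instruct-2501 \
  --tokenizer-mode mistral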
-
I'm trying to use tool calling with stelterlab/Mistral-Small-24B-Instruct-2501-AWQ. I've started the model with:

--enable-auto-tool-choice --tool-call-parser mistral --chat-template /opt/vllm/app/templates/tool_chat_template_mistral_parallel.jinja

As a test, I tried running the example code here: https://docs.mistral.ai/capabilities/function_calling/

I'm getting this as a result:
ChatCompletion(id='chatcmpl', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='[{"name": "retrieve_payment_status", "arguments": {"transaction_id": "T1001"}}]', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None)], created=1738701730, model='stelterlab/Mistral-Small-24B-Instruct-2501-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=27, prompt_tokens=266, total_tokens=293, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)
The tool call is ending up in the message content, not in tool_calls. I looked through the vLLM code, and I think what's happening is that the [TOOL_CALLS] tag isn't being emitted for some reason, so the parser never fires.
I'm looking for help here. If I'm right, is there an easy fix? If I'm wrong, any ideas on what is happening? Thanks in advance for any help you can offer.
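Based on the earlier replies, I suppose the next thing to try is dropping the Jinja template and using --tokenizer-mode mistral, matching the working command above (a sketch adapted to the stelterlab repo; I haven't confirmed it behaves the same as the casperhansen one):

vllm serve stelterlab/Mistral-Small-24B-Instruct-2501-AWQ \
  --tokenizer-mode mistral \
  --enable-auto-tool-choice \
  --tool-call-parser mistral

As far as I can tell, with --tokenizer-mode mistral the --chat-template file shouldn't be needed at all.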