Prefilling assistant message in OpenAI compatible API #13174
Conversation
Just a heads-up that this is potentially a very breaking change, especially because this is an OpenAI-compatible API but this is not OpenAI's behavior. The main situation I can think of is when someone wants to generate a new assistant message after the last one; i.e., for ChatML they want a fresh `<|im_start|>assistant` turn opened rather than the previous one continued (see the sketch below). I'd suggest we add this to #9291 at a minimum.
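For illustration (assuming a ChatML template): with OpenAI's behavior, a trailing assistant message stays a completed turn and a new one is opened after it:

```
<|im_start|>assistant
The answer is 42.<|im_end|>
<|im_start|>assistant
```

whereas with this change the last turn is left open so the model continues it:

```
<|im_start|>assistant
The answer is 42.
```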
A better alternative would be to use an additional request field so the behavior is opted into explicitly. There is also an existing issue about a prefix API. I think there is also a problem with token healing: the prefilled text ends at an arbitrary token boundary, so the continuation may differ from what the model would have produced had it generated the prefix itself.
The feature is aligned with the Claude API and the open-webui client. Using an additional field would require changes in those clients.
That is because the Claude API is strictly worse than the Mistral API here: you can't even tell whether the Claude API is broken without inspecting the output, and you can't turn the behavior off if you don't want it.
I believe those clients would still allow adding custom fields to the request body.
I am not aware of clients that support sending such custom fields. An alternative implementation is an explicit opt-in field in the request. For reference, here is example code that shows how to use both options:
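A minimal sketch of both request shapes, assuming a llama-server at `http://localhost:8080`; the `prefix` field follows Mistral's convention and is illustrative, not an implemented parameter:

```sh
# Option 1 (this PR): a trailing assistant message is continued
# instead of being followed by a new turn.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Name a planet."},
          {"role": "assistant", "content": "The planet I pick is"}
        ]
      }'

# Option 2 (alternative): explicit opt-in via a per-message field,
# in the style of Mistral'"'"'s "prefix" flag (illustrative only).
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Name a planet."},
          {"role": "assistant", "content": "The planet I pick is", "prefix": true}
        ]
      }'
```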
That sounds good, I'd very much vote for this being changed to a field in the body rather than the default behavior. 😄
This adds support for prefilling the assistant response (or its thought process) using the OpenAI-compatible API.
The feature is used, for example, by the Claude API.
It can be tested using open-webui or with the following curl command:
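A minimal example, assuming a llama-server instance at `http://localhost:8080`:

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "What is the capital of France?"},
          {"role": "assistant", "content": "The capital of France is"}
        ]
      }'
```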
Example advanced scenario: a time limit for the thinking process. When the limit is reached, the client appends `</think>` to its partial response and resubmits it as a prefilled assistant message, so the model finishes the answer without further reasoning (see the sketch below).
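A sketch of that flow, assuming a llama-server at `http://localhost:8080`, a 10-second budget, reasoning streamed inline in `content`, and `jq` installed; the prompt and limit are illustrative:

```sh
#!/bin/sh
PROMPT="How many primes are there below 100?"

# 1) Stream the response, but stop after 10 seconds and keep the
#    delta content received so far from the SSE chunks.
PARTIAL=$(timeout 10 curl -sN http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg p "$PROMPT" \
          '{stream: true, messages: [{role: "user", content: $p}]}')" \
  | sed -n 's/^data: //p' | grep -v '^\[DONE\]$' \
  | jq -rj '.choices[0].delta.content // empty')

# 2) Close the thinking block and resubmit the partial text as a
#    prefilled assistant message; the model continues from there.
jq -n --arg p "$PROMPT" --arg a "${PARTIAL}</think>" \
    '{messages: [{role: "user", content: $p}, {role: "assistant", content: $a}]}' \
  | curl -s http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" -d @-
```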