Conversation

yf-yang commented Sep 6, 2025

Closes #2778

yf-yang (Author) commented Sep 6, 2025

@DouweM

DouweM self-assigned this Sep 8, 2025

DouweM (Collaborator) left a comment

Thank you!

yf-yang (Author) commented Sep 9, 2025

Sure, I'll get back to you in one or two days.

yf-yang (Author) commented Sep 11, 2025

Hmmm, is this test failure relevant?
Also, I'm not sure whether the coverage is enough, as I don't have a DeepSeek/OpenRouter key.

yf-yang requested a review from DouweM on September 11, 2025, 17:03
DouweM (Collaborator) commented Sep 11, 2025

@yf-yang The test failure indicates that 'response_prefix': None should be added to the expected output that's compared in that test.

> Hmmm, is this test failure relevant? Also, I'm not sure whether the coverage is enough, as I don't have a DeepSeek/OpenRouter key.

I have both keys so I can look at that today or tomorrow. Thanks for all your work here!

DouweM (Collaborator) commented Sep 15, 2025

@yf-yang I'll have a look here tomorrow, sorry for the delay!

(Merge commit; conflicts in pydantic_ai_slim/pydantic_ai/_agent_graph.py and tests/test_agent.py)
DouweM (Collaborator) commented Sep 16, 2025

@yf-yang I played around with this for a bit and made some tweaks (and introduced a new issue 🙈), and while it works well for Anthropic, I have a few doubts about the design.

We could make this work for DeepSeek, OpenRouter, and Mistral as well, but out of the major providers only Anthropic supports it (and they may drop it at some point, as it makes it much easier to jailbreak models, which is presumably why OpenAI and Google don't offer it), so I don't love having response_prefix as a top-level field on run() etc. for such a niche feature.

Ideally, the field could live on AnthropicModelSettings as anthropic_assistant_prefill, but those settings are passed to every model request in an agent run, while we really only want it on the initial request. But there's currently no way for the model class methods to determine which step of the agent run they're on. It could make sense to add request_index to ModelRequestParameters (which could also help with #1820), which I'd like better than adding the Anthropic-only response_prefix there.

Would you mind refactoring in that direction? We can then drop the openai.py and profile stuff, and make it clear that this is an Anthropic-only feature. If other major models support it at some point, we can then consider introducing a top-level field on ModelSettings.
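
For concreteness, a rough, self-contained sketch of that proposed shape; `anthropic_assistant_prefill` and `request_index` are only suggested names from this thread, not existing pydantic-ai APIs, and the real classes would differ:

```python
# Rough sketch only: `anthropic_assistant_prefill` and `request_index` are
# proposed names from this discussion, not part of pydantic-ai today.
from dataclasses import dataclass
from typing import TypedDict


class AnthropicModelSettings(TypedDict, total=False):
    # Proposed Anthropic-only setting: text the next assistant turn must start with.
    anthropic_assistant_prefill: str


@dataclass
class ModelRequestParameters:
    # Proposed field: which model request of the agent run this is (0-based).
    request_index: int = 0


def build_anthropic_messages(
    history: list[dict],
    settings: AnthropicModelSettings,
    params: ModelRequestParameters,
) -> list[dict]:
    """Append the prefill as a trailing assistant message, but only on the run's first request."""
    prefill = settings.get('anthropic_assistant_prefill')
    if prefill and params.request_index == 0:
        # Anthropic continues generation from a trailing assistant message.
        return [*history, {'role': 'assistant', 'content': prefill}]
    return history


# First request gets the prefill; later requests (e.g. after tool calls) don't.
user_turn = [{'role': 'user', 'content': 'Return the result as JSON.'}]
print(build_anthropic_messages(user_turn, {'anthropic_assistant_prefill': '{'}, ModelRequestParameters(0)))
print(build_anthropic_messages(user_turn, {'anthropic_assistant_prefill': '{'}, ModelRequestParameters(1)))
```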

yf-yang (Author) commented Sep 19, 2025

@DouweM What do you mean by "we really only want it on the initial request"? Is the initial request the first call of a multi-step tool iteration, or the first call when no history exists?

Also, I'd say sometimes I'd like to dynamically control the prefill during an agent run. For example, it is a useful strategy for supporting multiple tool calls, similar to PromptOutput, by:

assistant: LLM generates tool call name
user: generates a tool schema
assistant (better with prefill): <json>args</json>

So I don't think it's a good idea to make it a model-level concept. The lifetime of the prefill should be similar to the user prompt's.
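
As a concrete illustration of that flow, hand-written Anthropic-style messages (the exact wire format pydantic-ai would produce differs):

```python
# Hand-written illustration of the flow described above, using Anthropic-style
# role/content messages; names and the tool schema text are made up.
messages = [
    {'role': 'user', 'content': 'Extract the user info as JSON.'},
    # 1. The model names the tool it wants to call.
    {'role': 'assistant', 'content': 'I will call the `extract_user` tool.'},
    # 2. We reply with the tool's schema so the model knows the argument shape.
    {'role': 'user', 'content': 'Tool schema: {"name": "extract_user", '
                                '"parameters": {"name": "string", "age": "integer"}}'},
    # 3. Prefill the next assistant turn so the model continues straight into the args.
    {'role': 'assistant', 'content': '<json>'},
]
print(messages)
```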

DouweM (Collaborator) commented Sep 19, 2025

> What do you mean by "we really only want it on the initial request"? Is the initial request the first call of a multi-step tool iteration, or the first call when no history exists?

@yf-yang I was referring to the first request made to the model in an agent run (which could include message history from a previous run), assuming that agent.run(..., response_prefix='...') means the next thing the model generates has to start with that prefix. But if the model then calls some tools and we send the results back in a follow-up request, the response to that request shouldn't necessarily start with the same prefix.

> Also, I'd say sometimes I'd like to dynamically control the prefill during an agent run. For example, it is a useful strategy for supporting multiple tool calls, similar to PromptOutput, by:
>
> assistant: LLM generates tool call name
> user: generates a tool schema
> assistant (better with prefill): `<json>args</json>`

Yeah, that's reasonable, and similar to #1820, where we're discussing letting model settings be changed per request in an agent run via a new prepare_request hook. That would work well with this being a model setting, which you could then set as desired on just the first request or on a later one. That solves my problem, but it means this feature doesn't make sense without that prepare_request hook.

> So I don't think it's a good idea to make it a model-level concept. The lifetime of the prefill should be similar to the user prompt's.

But the user prompt is only used on the first request, generated in UserPromptNode, right? So I think that's not how you intend prefill to work.
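
For illustration, a tiny sketch of what per-request control could look like with such a hook; `prepare_request` and `anthropic_assistant_prefill` are hypothetical names here, not existing pydantic-ai APIs:

```python
# Hypothetical sketch: a per-request hook like the `prepare_request` idea in #1820
# does not exist in pydantic-ai yet; the name, signature, and setting are illustrative only.
def prepare_request(request_index: int, settings: dict) -> dict:
    """Pick model settings per request within one agent run."""
    if request_index == 0:
        # First request: no prefill, let the model decide which tool to call.
        return settings
    # Later request (after the tool schema was sent back): force the args format.
    return {**settings, 'anthropic_assistant_prefill': '<json>'}


print(prepare_request(0, {}))   # {}
print(prepare_request(1, {}))   # {'anthropic_assistant_prefill': '<json>'}
```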

Successfully merging this pull request may close these issues:

Support prefill by ending history with ModelResponse