
Thinking/Reasoning Support #76
Open · wants to merge 4 commits into base: main
Conversation

@tghamm (Owner) commented Feb 25, 2025

No description provided.

@tghamm (Owner, Author) commented Feb 25, 2025

Hi @stephentoub, I've hit a wall I can't seem to get past in MEAI while attempting to enable support for Anthropic's 3.7 Sonnet reasoning model. TL;DR: everything seems to work in the updated IChatClient implementation EXCEPT streaming function calling (with thinking). Even though I'm adding a derived AIContent type I created, called ThinkingContent, to the first turn of the function-invocation flow, it's being stripped away by the time it's round-tripped. Either that or I'm missing something totally obvious and just can't see it :) But the result is a failed auto function call with the code in this branch, in a test method called TestStreamingThinkingFunctionCalls(). I dug into MEAI, and I THINK I can see why: even though I'm adding my own AIContent, only the FunctionCallContent instances are being added to the Assistant message:
https://github.com/dotnet/extensions/blob/576fa221f57f80423a4852e446cee2aa2e937aab/src/Libraries/Microsoft.Extensions.AI/ChatCompletion/FunctionInvokingChatClient.cs#L414

I cry uncle on this one, my rubber ducky has failed me. Any hints? :)

@stephentoub (Contributor) commented
I dug into MEAI, and I THINK I can see why; even though I'm adding my own AIContent, only the FunctionCallContents are being added to the Assistant message

In general for streaming, the intent is that the caller adds the yielded updates to the history, e.g.

List<ChatResponseUpdate> updates = [];
await foreach (var update in client.GetStreamingResponseAsync(messages, options))
{
    Console.Write(update);
    updates.Add(update);
}
messages.Add(updates.ToChatResponse().Message);

so if your ThinkingContent is being yielded back, it would be added to the history along with the rest of the content.

The FunctionInvokingChatClient special-cases FunctionCallContent, though, for the reason outlined here:
https://github.com/dotnet/extensions/blob/e8ee4d1e2607fd8361b9cde1823cc459dbc20334/src/Libraries/Microsoft.Extensions.AI/ChatCompletion/FunctionInvokingChatClient.cs#L344-L351
It also needs to send the FunctionCallContent and corresponding FunctionResultContent back to the inner client, so it does add those into the history rather than yielding them.

This does mean that other content that may have come down as part of the same turn that produced the FunctionCallContent is not sent back to the inner client along with the function results. Is it necessary that this thinking content be sent back in order to make function calling work? Addressing that is covered by dotnet/extensions#5675; we just haven't prioritized it because it's not been clear how impactful it actually is.

dotnet/extensions#5909 also covers some other possible tweaks we could make to how this works.

@tghamm (Owner, Author) commented Feb 26, 2025

Thank you for the wealth of information, @stephentoub. It seems this is something you've already considered with dotnet/extensions#5675. Apologies, I should have done a deeper search of the issues.

To answer your question: yes, when 3.7 Sonnet is in "Extended Thinking" mode, all Assistant turns are required to retain the ThinkingContent or RedactedThinkingContent blocks returned from Anthropic. It's a hard-and-fast requirement for Assistant turns that use Tools.
Here's a pinpoint reference to the documentation from Anthropic if helpful:
https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#preserving-thinking-blocks
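
For context, the wire shape the docs above describe is roughly the following (a hedged sketch; the signature value and tool_use id are placeholders): the assistant turn that requested the tool call must be replayed with its thinking block intact.

```json
{
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me look up Alice's age with the tool...",
      "signature": "EqQBCgIYA..."
    },
    {
      "type": "tool_use",
      "id": "toolu_01A...",
      "name": "GetPersonAge",
      "input": { "personName": "Alice" }
    }
  ]
}
```

Per the docs above, if the thinking block is stripped before this turn is sent back alongside the tool_result, the tool-use turn fails, which is exactly what happens when MEAI drops the ThinkingContent.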

My initial thought was that this would primarily affect SK consumers: despite being a reasoning model, 3.7 Sonnet is a hybrid reasoning model, where you can set the amount of reasoning very low and still get VERY fast responses, so it may still be well suited to chat/streaming scenarios where tools are involved. Then again, it being a hybrid model means you can simply turn off reasoning if you want to use 3.7 Sonnet and have the failing test below succeed, so I can see why MSFT would want to weigh the priority of any changes.

Essentially, streaming conversations work with Extended Thinking (using the helper code you supplied); e.g., the following test passes:

[TestMethod]
public async Task TestThinkingStreamingConversation()
{
    IChatClient client = new AnthropicClient().Messages;

    List<ChatMessage> messages = new()
    {
        new ChatMessage(ChatRole.User, "How many r's are in the word strawberry?")
    };

    ChatOptions options = new()
    {
        ModelId = AnthropicModels.Claude37Sonnet,
        MaxOutputTokens = 20000,
        Temperature = 1.0f,
        AdditionalProperties = new()
        {
            {nameof(MessageParameters.Thinking), new ThinkingParameters()
            {
                BudgetTokens = 16000
            }}
        }
    };

    List<ChatResponseUpdate> updates = new();
    StringBuilder sb = new();
    await foreach (var res in client.GetStreamingResponseAsync(messages, options))
    {
        updates.Add(res);
        sb.Append(res);
    }

    Assert.IsTrue(sb.ToString().Contains("3"), sb.ToString());

    messages.Add(updates.ToChatResponse().Message);

    Assert.IsTrue(messages.Last().Contents.OfType<Extensions.MEAI.ThinkingContent>().Any());

    messages.Add(new ChatMessage(ChatRole.User, "and how many letters total?"));

    updates.Clear();
    await foreach (var res in client.GetStreamingResponseAsync(messages, options))
    {
        updates.Add(res);
    }
    var text = string.Join("",
        updates.SelectMany(p => p.Contents.OfType<TextContent>()).Select(p => p.Text));

    Assert.IsTrue(text.Contains("10"), text);
}

And the following test fails:

[TestMethod]
public async Task TestStreamingThinkingFunctionCalls()
{
    IChatClient client = new AnthropicClient().Messages
        .AsBuilder()
        .UseFunctionInvocation()
        .Build();

    ChatOptions options = new()
    {
        ModelId = AnthropicModels.Claude37Sonnet,
        MaxOutputTokens = 20000,
        Tools = [AIFunctionFactory.Create((string personName) => personName switch {
            "Alice" => "25",
            _ => "40"
        }, "GetPersonAge", "Gets the age of the person whose name is specified.")],
        AdditionalProperties = new()
        {
            {nameof(MessageParameters.Thinking), new ThinkingParameters()
            {
                BudgetTokens = 16000
            }}
        }
    };

    StringBuilder sb = new();
    await foreach (var update in client.GetStreamingResponseAsync("How old is Alice?", options))
    {
        sb.Append(update);
    }

    Assert.IsTrue(
        sb.ToString().Contains("25"),
        sb.ToString());
}

I'll plan to move ahead without support for this scenario for the time being, so I can get something out there that supports the vast majority of cases, but I'd welcome its inclusion in the future. If there's anything I can do to help facilitate that, please let me know!
