
Thinking/Reasoning Support #76
Open · wants to merge 4 commits into base: main
Conversation

@tghamm (Owner) commented Feb 25, 2025

No description provided.

@tghamm (Owner, Author) commented Feb 25, 2025

Hi @stephentoub, I've hit a wall I can't seem to get past in MEAI while attempting to enable support for Anthropic's 3.7 Sonnet reasoning model. TL;DR: everything seems to work in the updated IChatClient implementation EXCEPT streaming function calling (with thinking). Even though I'm adding a derived AIContent type I created, called ThinkingContent, to the first turn of the function-invocation flow, it's being stripped away by the time it's round-tripped. Either that or I'm missing something totally obvious and just can't see it :) But the result is a failed auto function call with the code in this branch, in a test method called TestStreamingThinkingFunctionCalls(). I dug into MEAI, and I THINK I can see why: even though I'm adding my own AIContent, only the FunctionCallContent instances are being added to the Assistant message:
https://github.com/dotnet/extensions/blob/576fa221f57f80423a4852e446cee2aa2e937aab/src/Libraries/Microsoft.Extensions.AI/ChatCompletion/FunctionInvokingChatClient.cs#L414

I cry uncle on this one, my rubber ducky has failed me. Any hints? :)

@stephentoub (Contributor) commented
I dug into MEAI, and I THINK I can see why; even though I'm adding my own AIContent, only the FunctionCallContents are being added to the Assistant message

In general for streaming, the intent is that the caller adds the yielded updates to the history, e.g.

List<ChatResponseUpdate> updates = [];
await foreach (var update in client.GetStreamingResponseAsync(messages, options))
{
    Console.Write(update);
    updates.Add(update);
}
messages.Add(updates.ToChatResponse().Message);

so if your ThinkingContent is being yielded back, it would be added to the history along with the rest of the content.

The FunctionInvokingChatClient special-cases FunctionCallContent, though, for the reason outlined here:
https://github.com/dotnet/extensions/blob/e8ee4d1e2607fd8361b9cde1823cc459dbc20334/src/Libraries/Microsoft.Extensions.AI/ChatCompletion/FunctionInvokingChatClient.cs#L344-L351
It also needs to send the FunctionCallContent and corresponding FunctionResultContent back to the inner client, so it does add those into the history rather than yielding them.

This does mean that other content that may have come down as part of the same turn that produced the FunctionCallContent is not sent back to the inner client along with the function results. Is it necessary that this thinking content be sent back in order to make function calling work? Addressing that is covered by dotnet/extensions#5675; we just haven't prioritized it because it's not been clear how impactful it actually is.

dotnet/extensions#5909 also covers some other possible tweaks we could make to how this works.

@tghamm (Owner, Author) commented Feb 26, 2025

Thank you for the wealth of information, @stephentoub. It seems this is something you've already considered with dotnet/extensions#5675. Apologies, I should have done a deeper search of the issues.

To answer your question: yes, when 3.7 Sonnet is in "Extended Thinking" mode, all Assistant turns are required to retain the ThinkingContent or RedactedThinkingContent blocks returned from Anthropic. It's a hard-and-fast requirement for Assistant turns that use Tools.
Here's a pinpoint reference to the documentation from Anthropic if helpful:
https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#preserving-thinking-blocks
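
For context, the wire shape the docs above describe is roughly the following (a hedged sketch; the signature value and tool_use id are placeholders): the assistant turn that requested the tool call must be replayed with its thinking block intact.

```json
{
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me look up Alice's age with the tool...",
      "signature": "EqQBCgIYA..."
    },
    {
      "type": "tool_use",
      "id": "toolu_01A...",
      "name": "GetPersonAge",
      "input": { "personName": "Alice" }
    }
  ]
}
```

Per the docs above, if the thinking block is stripped before this turn is sent back alongside the tool_result, the tool-use turn fails, which is exactly what happens when MEAI drops the ThinkingContent.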

My initial thought was that this would primarily affect SK consumers: despite being a reasoning model, 3.7 Sonnet is a hybrid reasoning model, where you can set the amount of reasoning very low and still get VERY fast responses, so it may still be well suited to chat/streaming scenarios where tools are involved. Then again, it being a hybrid model means you can simply turn off reasoning if you want to use 3.7 Sonnet and have the failing test below succeed, so I can see why MSFT would want to weigh the priority of any changes.

Essentially, streaming conversations work with Extended Thinking (using the helper code you supplied); e.g., the following test passes:

[TestMethod]
public async Task TestThinkingStreamingConversation()
{
    IChatClient client = new AnthropicClient().Messages;

    List<ChatMessage> messages = new()
    {
        new ChatMessage(ChatRole.User, "How many r's are in the word strawberry?")
    };

    ChatOptions options = new()
    {
        ModelId = AnthropicModels.Claude37Sonnet,
        MaxOutputTokens = 20000,
        Temperature = 1.0f,
        AdditionalProperties = new()
        {
            {nameof(MessageParameters.Thinking), new ThinkingParameters()
            {
                BudgetTokens = 16000
            }}
        }
    };

    List<ChatResponseUpdate> updates = new();
    StringBuilder sb = new();
    await foreach (var res in client.GetStreamingResponseAsync(messages, options))
    {
        updates.Add(res);
        sb.Append(res);
    }

    Assert.IsTrue(sb.ToString().Contains("3"), sb.ToString());

    messages.Add(updates.ToChatResponse().Message);

    Assert.IsTrue(messages.Last().Contents.OfType<Extensions.MEAI.ThinkingContent>().Any());

    messages.Add(new ChatMessage(ChatRole.User, "and how many letters total?"));

    updates.Clear();
    await foreach (var res in client.GetStreamingResponseAsync(messages, options))
    {
        updates.Add(res);
    }
    var text = string.Join("",
        updates.SelectMany(p => p.Contents.OfType<TextContent>()).Select(p => p.Text));

    Assert.IsTrue(text.Contains("10"), text);
}

And the following test fails:

[TestMethod]
public async Task TestStreamingThinkingFunctionCalls()
{
    IChatClient client = new AnthropicClient().Messages
        .AsBuilder()
        .UseFunctionInvocation()
        .Build();

    ChatOptions options = new()
    {
        ModelId = AnthropicModels.Claude37Sonnet,
        MaxOutputTokens = 20000,
        Tools = [AIFunctionFactory.Create((string personName) => personName switch {
            "Alice" => "25",
            _ => "40"
        }, "GetPersonAge", "Gets the age of the person whose name is specified.")],
        AdditionalProperties = new()
        {
            {nameof(MessageParameters.Thinking), new ThinkingParameters()
            {
                BudgetTokens = 16000
            }}
        }
    };

    StringBuilder sb = new();
    await foreach (var update in client.GetStreamingResponseAsync("How old is Alice?", options))
    {
        sb.Append(update);
    }

    Assert.IsTrue(
        sb.ToString().Contains("25"),
        sb.ToString());
}

I'll plan to move ahead without support for this scenario for the time being, so I can get something out there that supports the vast majority of cases, but I'd welcome its inclusion in the future. If there's anything I can do to help facilitate that, please let me know!
