Conversation

@willgdjones commented Sep 15, 2025

Fixes #1524

⚠️ (Generated by Cursor - in the process of being edited and refined)

Pydantic AI Stream Cancellation

This PR adds stream cancellation, allowing users to stop a streaming response when a client disconnects or when cancellation is explicitly requested.

🎯 Problem Solved

Previously, when a user broke out of a streaming loop early, Pydantic AI would continue consuming the entire response in the background to ensure proper usage tracking. This led to:

  • Wasted Resources: Unnecessary compute and network usage
  • Poor User Experience: No way to stop long-running streams
  • Memory Issues: Streams continuing even after clients disconnect

✨ Features

  • Explicit Cancellation API: Call await stream.cancel() to stop streaming
  • Automatic HTTP Disconnect Handling: Streams cancel when web clients disconnect (see the sketch after the Basic Usage example)
  • Partial Usage Tracking: Accurate token counts for cancelled streams
  • Exception Safety: Multiple cancel calls are safe and idempotent
  • OpenAI Support: Initial implementation supports OpenAI models

Basic Usage

import asyncio
from pydantic_ai import Agent
from pydantic_ai.exceptions import StreamCancelled

async def basic_cancellation():
    agent = Agent("openai:gpt-4o-mini")
    
    try:
        async with agent.run_stream("Tell me a long story") as result:
            chunk_count = 0
            async for content in result.stream_text(delta=True):
                print(content)
                chunk_count += 1
                
                # Cancel after 3 chunks
                if chunk_count >= 3:
                    await result.cancel()
                    
    except StreamCancelled as e:
        print(f"Stream cancelled: {e}")
        print(f"Partial usage: {result.usage()}")

asyncio.run(basic_cancellation())
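
Handling Client Disconnects

The automatic HTTP disconnect handling listed under Features can be wired up in a web framework. Below is a minimal sketch assuming a FastAPI/Starlette endpoint; request.is_disconnected() is standard Starlette, while result.cancel() and StreamCancelled are the APIs proposed in this PR.

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

from pydantic_ai import Agent
from pydantic_ai.exceptions import StreamCancelled

app = FastAPI()
agent = Agent("openai:gpt-4o-mini")

@app.get("/stream")
async def stream_endpoint(request: Request) -> StreamingResponse:
    async def generate():
        async with agent.run_stream("Tell me a long story") as result:
            try:
                async for chunk in result.stream_text(delta=True):
                    # Cancel the model stream as soon as the HTTP client goes away.
                    if await request.is_disconnected():
                        await result.cancel()
                    yield chunk
            except StreamCancelled:
                return  # the client is gone; nothing left to send

    return StreamingResponse(generate(), media_type="text/plain")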

📚 API Reference

AgentStream.cancel()

async def cancel(self) -> None:
    """Cancel the streaming response.
    
    This will close the underlying network connection and cause any active iteration
    over the stream to raise a StreamCancelled exception.
    
    Subsequent calls to cancel() are safe and will not raise additional exceptions.
    """

StreamCancelled Exception

class StreamCancelled(Exception):
    """Exception raised when a streaming response is cancelled."""
    
    def __init__(self, message: str = "Stream was cancelled"):
        self.message = message
        super().__init__(message)

🏗️ Implementation Details

Architecture

The implementation consists of several components:

  1. StreamCancelled Exception (exceptions.py): new exception type for cancelled streams
  2. AgentStream.cancel() (result.py): public API for cancelling streams; sets an internal cancellation flag and delegates to the underlying StreamedResponse
  3. StreamedResponse.cancel() (models/__init__.py): abstract base method for model-specific cancellation, with a default no-op implementation
  4. OpenAIStreamedResponse.cancel() (models/openai.py): OpenAI-specific implementation; marks the stream as cancelled, causing the iterator to raise StreamCancelled
  5. Cancellation-Aware Iterator (result.py): checks the cancellation flag before yielding events and raises StreamCancelled when cancelled (sketched below)
  6. Agent Graph Updates (_agent_graph.py): handles StreamCancelled in the automatic consumption logic
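
A minimal sketch of the cancellation-aware iterator from item 5; the helper name and the callable flag are illustrative rather than the exact code in this PR, and StreamCancelled is the exception this PR adds:

from collections.abc import AsyncIterator, Callable

from pydantic_ai.exceptions import StreamCancelled
from pydantic_ai.messages import ModelResponseStreamEvent

async def _cancellation_aware_iterator(
    events: AsyncIterator[ModelResponseStreamEvent],
    is_cancelled: Callable[[], bool],
) -> AsyncIterator[ModelResponseStreamEvent]:
    """Yield events until the cancellation flag is set, then raise StreamCancelled."""
    async for event in events:
        # Check the flag before handing each event to the caller.
        if is_cancelled():
            raise StreamCancelled()
        yield event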

Usage Tracking

  • Partial Usage: Cancelled streams report accurate token usage up to cancellation point
  • No Double Counting: Usage is accumulated as chunks are processed
  • Metadata: Usage objects can indicate partial/cancelled state

Error Handling

  • Idempotent Cancellation: Multiple cancel() calls are safe
  • Exception Propagation: StreamCancelled bubbles up through iteration
  • Resource Cleanup: Network connections are properly closed
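
A minimal sketch of the idempotent cancel() described above; the attribute names and the delegated cancel() call illustrate the approach, not the exact code:

class AgentStream:
    def __init__(self, raw_stream_response):
        self._raw_stream_response = raw_stream_response
        self._cancelled = False

    async def cancel(self) -> None:
        if self._cancelled:
            return  # already cancelled; repeated calls are safe no-ops
        self._cancelled = True
        # Delegate to the model-specific stream so it can close its connection.
        await self._raw_stream_response.cancel()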

@willgdjones force-pushed the feature/halt-streaming branch 3 times, most recently from e7dab8d to 8fda0ab on September 15, 2025 at 08:58
@DouweM (Collaborator) left a comment

@willgdjones Thanks Will. I get where Cursor is going with this, but I'm not sure if it's doing too little (shouldn't we be calling openai.AgentStream.close() at some point?) and/or too much (do we need the multiple canceled booleans and exception, if the wrapped stream could just stop yielding events). Would be good to get your (human, not AI!) take :)

"""Exception raised when a streaming response is cancelled."""

def __init__(self, message: str = 'Stream was cancelled'):
self.message = message
@DouweM:

No need to have a message as it's not used anywhere
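
i.e. the exception could shrink to just (sketch):

class StreamCancelled(Exception):
    """Exception raised when a streaming response is cancelled."""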

        This should close the underlying network connection and cause any active iteration
        to raise a StreamCancelled exception. The default implementation is a no-op.
        """
        pass
@DouweM:

I think this should raise NotImplementedError to not silently keep the stream going when the user thought they canceled it
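
That suggestion would look roughly like this (a sketch; the message text is illustrative):

    async def cancel(self) -> None:
        """Cancel the streaming response."""
        raise NotImplementedError('Stream cancellation is not supported by this model.')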


    async def _get_event_iterator(self) -> AsyncIterator[ModelResponseStreamEvent]:
        async for chunk in self._response:
            # Check for cancellation before processing each chunk
            if self._cancelled:
@DouweM:

Shouldn't we do this after recording the usage?
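
The suggested ordering, sketched; this assumes a _map_usage helper like the one models/openai.py uses to extract usage from a chunk (an assumption about the surrounding code):

    async def _get_event_iterator(self) -> AsyncIterator[ModelResponseStreamEvent]:
        async for chunk in self._response:
            # Record usage first, so a cancelled stream still counts the
            # tokens consumed by this final chunk ...
            self._usage += _map_usage(chunk)
            # ... and only then honour the cancellation flag.
            if self._cancelled:
                raise StreamCancelled('OpenAI stream was cancelled')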

@@ -1418,6 +1423,14 @@ def timestamp(self) -> datetime:
        """Get the timestamp of the response."""
        return self._timestamp

    async def cancel(self) -> None:
@DouweM:

This doesn't need to be async if the recommended behavior is to always just set a flag and then cancel on the next iteration
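
i.e. roughly (a sketch of the suggested synchronous variant):

    def cancel(self) -> None:
        """Flag the stream as cancelled; the iterator acts on it at the next chunk."""
        self._cancelled = True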


    async def _get_event_iterator(self) -> AsyncIterator[ModelResponseStreamEvent]:
        async for chunk in self._response:
            # Check for cancellation before processing each chunk
            if self._cancelled:
                raise StreamCancelled('OpenAI stream was cancelled')
@DouweM:

Will this actually cause OpenAI to cleanly close the stream? Shouldn't we call await AsyncStream.close() or something?

They also have their own # Ensure the entire stream is consumed:

https://github.com/openai/openai-python/blob/4756247cee3d9548397b26a29109e76cc9522379/src/openai/_streaming.py#L216-L222

Note that right now, OpenAIStreamedResponse only has access to AsyncIterable[ChatCompletionChunk], but that's derived from the AsyncStream[ChatCompletionChunk]:

    async def _process_streamed_response(
        self, response: AsyncStream[ChatCompletionChunk], model_request_parameters: ModelRequestParameters
    ) -> OpenAIStreamedResponse:
        """Process a streamed response, and prepare a streaming response to return."""
        peekable_response = _utils.PeekableAsyncStream(response)

So in order to access that cancel method, we may need to put it on _utils.PeekableAsyncStream as well, and then forward it to the underlying stream.
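
A sketch of that forwarding, using a simplified stand-in for _utils.PeekableAsyncStream; openai's AsyncStream.close() is a real method, everything else here is illustrative:

from collections.abc import AsyncIterable
from typing import Generic, TypeVar

from openai import AsyncStream

T = TypeVar('T')

class PeekableAsyncStream(Generic[T]):
    """Simplified stand-in for pydantic_ai._utils.PeekableAsyncStream."""

    def __init__(self, stream: AsyncIterable[T]):
        self._stream = stream

    async def close(self) -> None:
        # Forward to the wrapped openai AsyncStream so the HTTP response
        # is closed cleanly instead of being consumed to the end.
        if isinstance(self._stream, AsyncStream):
            await self._stream.close()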

    async for item in stream_response:
        # Check for cancellation first
        if is_cancelled():
            raise exceptions.StreamCancelled()
@DouweM:

Why do we have to raise it here and inside the StreamedResponse?

Wouldn't the stream_response just stop yielding once it's cancelled and there are no more messages, meaning we shouldn't need to do anything special here?

-        self._agent_stream_iterator = _get_usage_checking_stream_response(
-            self._raw_stream_response, self._usage_limits, self.usage
+        self._agent_stream_iterator = _get_cancellation_aware_stream_response(
+            self._raw_stream_response, self._usage_limits, self.usage, lambda: self._cancelled
@DouweM:

Related to what I wrote below, I don't understand why we need this lambda, instead of just pushing the cancellation down to the wrapped StreamedResponse, and then relying on that to stop yielding events

    try:
        async for _ in agent_stream:
            pass
    except exceptions.StreamCancelled:
@DouweM:

Do we need this exception at all?

Like I wrote below: Wouldn't the StreamedResponse just stop yielding once it's been cancelled and there are no more messages, meaning we shouldn't need to do anything special here?

    m = OpenAIChatModel('gpt-4o-mini', provider=OpenAIProvider(openai_client=mock_client))
    agent = Agent(m)

    async with agent.run_stream('Hello world') as result:
@DouweM:

Can we add a test that cancels streaming in the middle of an unfinished tool call, like in the example at https://ai.pydantic.dev/agents/#streaming-events-and-final-output after a ToolCallPartDelta, and then see what the final ModelResponse in result.all_messages() looks like? I imagine it would have an incomplete ToolCallPart. I wonder if we should indicate on the ModelResponse somehow that it's incomplete because it's been canceled, and cannot be used as message_history, for example.

@@ -641,6 +641,14 @@ def timestamp(self) -> datetime:
        """Get the timestamp of the response."""
        raise NotImplementedError()

    async def cancel(self) -> None:
@DouweM:

We should document this feature in the Streaming docs

@willgdjones (Author):

Thank you for this detailed critique! I will address these points when I get the chance.
