Skip to content

Conversation

davidhewitt
Copy link
Contributor

Change Summary

Adds iterable_schema, which is intended to solve my proposal in pydantic/pydantic#9541 (comment)

pydantic should update all existing uses of generator_schema to iterable_schema, which allows for lazy = False as a field-level setting. We should probably also have a config setting called lazy_iterables or similar, (TODO).

If we want to allow support for Iterator and Generator types in pydantic, those can use generator_schema.

Related issue number

pydantic/pydantic#9541

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

@codspeed-hq
Copy link

codspeed-hq bot commented Sep 23, 2025

CodSpeed Performance Report

Merging #1792 will not alter performance

Comparing dh/iterable-schema (c45e43c) with main (0cd11fe)

Summary

✅ 163 untouched

@davidhewitt davidhewitt requested a review from Viicos October 1, 2025 11:28
Copy link
Member

@Viicos Viicos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python, an iterable is an object that has an __iter__() method and presumably can produce iterators multiple times from it. What if users want the iterable to just be validated that we can grab an iterator from it (that is what input.validate_iter() is doing), and leave it as is? I guess the issue is that we can't validate the type of the values this way?

@Viicos
Copy link
Member

Viicos commented Oct 6, 2025

Or maybe my point is that as it stands now, we support ABCs to try to preserve the input type to our best knowledge. If someone uses Sequence[int] for example, (1, 2, 3) needs to be preserved as (1, 2, 3), and [1, 2, 3] needs to be preserved as [1, 2, 3].

With Iterable[int] in non lazy/eager mode, some_int_iterable is collected as a list, which might be surprising. Then what would be the benefit of using Iterable[int] over list[int], which also allows iterables to be validated.

@davidhewitt
Copy link
Contributor Author

I guess the issue is that we can't validate the type of the values this way?

I guess exactly this, yes. Validation may in general lead to coercions (e.g. string '1' to integer 1) so Iterable[int] might need some work done.

Then what would be the benefit of using Iterable[int] over list[int], which also allows iterables to be validated.

A great question. Perhaps you're right, and the answer is that Iterable[int] we should treat exactly like Sequence[int], including the way we attempt to reconstruct the original type.

For Iterator and Generator, we could change the behaviour to be more like Callable where we can't ever validate the actual contents. And we could expose ValidatorIterator and/or ValidatorGenerator which users could use to opt-in to the lazy one-use behaviour.

So... seems like this needs more design?

@Viicos
Copy link
Member

Viicos commented Oct 6, 2025

It feels to me that several users reported issues with Iterable (or upvoted such issues) because of three reasons:

  • users using external types (not meant for Pydantic in the first place), that use Iterable as an annotation. This is what happens in Accessing a TypedDict field has ValidationIterator instead of the original value pydantic#9467 (7 👍): users want to validate some type from the OpenAI SDK, naturally use a list for this type and are surprised to see that a ValidatorIterator is actually used:

    from openai.types.chat import ChatCompletionAssistantMessageParam
    from pydantic import BaseModel
    
    
    class MyModel(BaseModel):
        history: list[ChatCompletionAssistantMessageParam]
    
    
    history = [
        {
            "content": None,
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "id",
                    "function": {
                        "arguments": '{"location":"Tokyo, Japan"}',
                        "name": "GetCurrentWeather",
                    },
                    "type": "function",
                }
            ],
        },
    ]
    my_model = MyModel(history=history)
    print(my_model.history)
    """
    [{'role': 'assistant', 'content': None, 'tool_calls': ValidatorIterator(index=0, schema=...)}]
    """

    While confusing, there isn't much we can do. ChatCompletionAssistantMessageParam uses Iterable because it is a type that isn't related to any Pydantic validation process, and as such they probably want to be as loose as possible for static type checkers.

    Even if we introduced a config setting/annotation to eagerly evaluate the iterable, the type isn't "owned" by end users and so they can't apply such config/annotation on it (and unfortunately a lot of OpenAI types are using Iterable).

  • Users that mistakenly think that they should use the most broad type/protocol to match as many types as possible, as reported in Attributes declared as iterables are replaced in the instances by pydantic-core ValidatorIterator instance pydantic#9541 (12 👍) (also Validation of Iterable[T] might want revisiting in V3 pydantic#9266 (comment)). I think we should at least recommend on both these issues that they should just use concrete types. Yes, this breaks static type checking, but this is a general Pydantic issues with type coercion.

  • Users that purposely use Iterable to provide types that implement __iter__(). It is unclear to me still if they expect to be able to fetch iterators from them multiple times (by repeated iter() calls), in which case we should just try to validate that the type is indeed iterable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants