- 
                Notifications
    You must be signed in to change notification settings 
- Fork 311
          Add allow_partial
          #1512
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
  
    Add allow_partial
  
  #1512
              
            Conversation
5af9b88    to
    554ec40      
    Compare
  
    | CodSpeed Performance ReportMerging #1512 will degrade performances by 16.62%Comparing  Summary
 Benchmarks breakdown
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprisingly minimal changes here given the new functionality added - cool!
I'm a bit confused about the naming around enumerate_last_partial - perhaps we could bikeshed just a bit here to make the purpose of this function more clear?
| from pydantic_core import SchemaValidator, ValidationError, core_schema | ||
|  | ||
|  | ||
| def test_list(): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just parametrize here and have a test for validation fails and validation successes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same questions below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, tests lgtm
| Curious to hear what @davidhewitt thinks about the way this is implemented - overall, looks good to me, I think having things centralized around the  It's interesting to me the ways in which  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall functionally looks like it's heading in the right direction, but I think there can be some simplifications. I also have a bunch of questions which hint at edge cases.
        
          
                src/input/input_python.rs
              
                Outdated
          
        
      | Self::Dict(dict) => dict.keys().iter().last(), | ||
| Self::Mapping(mapping) => mapping.keys().ok()?.iter().ok()?.last()?.ok(), | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB both of these are potentially inefficient; .keys() creates a new PyList of all the keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, I don't love it, but at least it's only called when allow_partial=true.
Is there a more efficient way? (especially for dicts?)
Long term, if we move to RustModel and iterating over dicts when building typeddicts, we should be able to minimise use of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.call_method0("keys").iter().last() might be better (Python keys iterator rather than list), but I think it still sucks. Can't get better until we iterate, as you say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested by adding:
#[pyfunction]
pub fn mapping_last_key_a<'a>(mapping: &'a Bound<'a, PyMapping>) -> Option<Bound<'a, PyAny>> {
    mapping.keys().ok()?.iter().ok()?.last()?.ok()
}
#[pyfunction]
pub fn mapping_last_key_b<'a>(mapping: &'a Bound<'a, PyMapping>) -> Option<Bound<'a, PyAny>> {
    mapping.call_method0(intern!(mapping.py(), "keys")).ok()?.iter().ok()?.last()?.ok()
}and tested with
from typing import Mapping
import timeit
from pydantic_core import _pydantic_core
class MyMapping(Mapping):
    def __init__(self, d):
        self._d = d
    def __getitem__(self, key):
        return self._d[key]
    def __iter__(self):
        return iter(self._d)
    def __len__(self):
        return len(self._d)
mapping = MyMapping({str(i): i for i in range(100)})
v = _pydantic_core.mapping_last_key_a(mapping)
assert v == '99', v
v = _pydantic_core.mapping_last_key_b(mapping)
assert v == '99', v
def run_bench(func):
    timer = timeit.Timer(
        "func(mapping)", setup="", globals={"func": func, "mapping": mapping}
    )
    n, t = timer.autorange()
    iter_time = t / n
    # print(f'{func.__module__}.{func.__name__}', iter_time)
    return int(iter_time * 1_000_000_000)
print(f'mapping_last_key_a: {run_bench(_pydantic_core.mapping_last_key_a)}ns')
print(f'mapping_last_key_b: {run_bench(_pydantic_core.mapping_last_key_b)}ns')Output:
mapping_last_key_a: 2487ns
mapping_last_key_b: 1918ns
Outcome: they're both pretty slow, but your suggestion is slightly faster.
        
          
                src/validators/validation_state.rs
              
                Outdated
          
        
      | pub fn enumerate_last_partial<'i, I>( | ||
| &self, | ||
| iter: impl Iterator<Item = I> + 'i, | ||
| ) -> Box<dyn Iterator<Item = (usize, bool, I)> + 'i> { | ||
| if self.allow_partial { | ||
| Box::new(EnumerateLastPartial::new(iter)) | ||
| } else { | ||
| Box::new(iter.enumerate().map(|(i, x)| (i, false, x))) | ||
| } | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than forcing dynamic dispatch here, it might be better to use .peekable() at the callsites to be able to check if the current iteration was the last as part of the error pathway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peekable made things worse, but I got rid of the Box dyn in 47f1c15
and it helped quite a lot locally - comparing main to 47f1c15:
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃  Group      ┃  Benchmark                          ┃  Before (µs/iter)  ┃  After (µs/iter)  ┃  Change  ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│  List       │  list_of_ints_core_py               │             29.17  │            30.09  │  +3.17%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  List JSON  │  list_of_ints_core_json             │             42.49  │            43.57  │  +2.55%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  Set        │  set_of_ints_core                   │             52.26  │            52.81  │  +1.06%  │
│             │  set_of_ints_core_duplicates        │             35.59  │            35.51  │  -0.22%  │
│             │  set_of_ints_core_length            │             53.59  │            55.16  │  +2.94%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  Set JSON   │  set_of_ints_core_json              │             56.31  │            58.40  │  +3.70%  │
│             │  set_of_ints_core_json_duplicates   │             44.14  │            45.07  │  +2.12%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  FrozenSet  │  frozenset_of_ints_core             │             16.60  │            17.26  │  +3.99%  │
│             │  frozenset_of_ints_duplicates_core  │             13.78  │            14.78  │  +7.20%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  Dict       │  dict_of_ints_core                  │             85.74  │            86.36  │  +0.73%  │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│  Dict JSON  │  dict_of_ints_core_json             │             127.4  │            126.5  │  -0.72%  │
└─────────────┴─────────────────────────────────────┴────────────────────┴───────────────────┴──────────┘
|  | ||
|  | ||
| def test_tuple_list(): | ||
| """Tuples don't support partial, so behaviour should be disabled.""" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? At least for variadic tuples this seems potentially acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely want to support it in future, I was just implementing it for the minimum set of validators to prove the idea initially.
        
          
                src/input/return_enums.rs
              
                Outdated
          
        
      | for (index, item_result) in iter.enumerate() { | ||
|  | ||
| for (index, is_last_partial, item_result) in state.enumerate_last_partial(iter) { | ||
| state.allow_partial = is_last_partial && validator.supports_partial(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the .supports_partial() guard necessary (here and in general)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because if you enabled allow_partial for a validator which doesn't have proper support for the feature, you would end up skipping errors in the wrong places.
E.g. if you had list[tuple[list[int], list[str]]:
- the outer list validator supports allow_partial, so it'll ignore errors in the last entry in the input list
- but if it passed allow_partial=truedown to the tuple validator (which doesn't current support) allow_partial
- it would pass statethrough to both inner list validators unchanged
- then the list validator associated with the 0th entry in the tuple would get allow_partial=true
- and therefore errors in the last entry of the 0th member of the tuple in the last entry of the outer list would be ignored when they shouldn't be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels to me like the "doesn't support partial" is the exception rather than the rule, and probably only for non-variadic tuples? Otherwise, I would have thought that:
- collections would always want to respect allow_partialwhen passed in (forwarding it to their last element)
- everything else e.g. with_defaultetc should not even care aboutallow_partialand just forward it naively
| Err(ValError::LineErrors(line_errors)) => { | ||
| for err in line_errors { | ||
| errors.push(err.with_outer_location(key.clone())); | ||
| if !is_last_partial { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'm feeling very baited to refactor these error combiners now 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #1517 as a spike to start down that road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely want to support it in future, I was just implementing it for the minimum set of validators to prove the idea initially.
I think we need it to be approximately right for most validators to actually be worth inviting users to try, and I'm not sure it is at the moment.
        
          
                src/input/input_python.rs
              
                Outdated
          
        
      | Self::Dict(dict) => dict.keys().iter().last(), | ||
| Self::Mapping(mapping) => mapping.keys().ok()?.iter().ok()?.last()?.ok(), | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.call_method0("keys").iter().last() might be better (Python keys iterator rather than list), but I think it still sucks. Can't get better until we iterate, as you say.
        
          
                src/input/return_enums.rs
              
                Outdated
          
        
      | for (index, item_result) in iter.enumerate() { | ||
|  | ||
| for (index, is_last_partial, item_result) in state.enumerate_last_partial(iter) { | ||
| state.allow_partial = is_last_partial && validator.supports_partial(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels to me like the "doesn't support partial" is the exception rather than the rule, and probably only for non-variadic tuples? Otherwise, I would have thought that:
- collections would always want to respect allow_partialwhen passed in (forwarding it to their last element)
- everything else e.g. with_defaultetc should not even care aboutallow_partialand just forward it naively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like there's very common use cases missing from here:
- unions
- nullable
- defaults
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think validators fit into three cases:
- already done
- doesn't apply - things like IntValidatorwhereallow_partialdoesn't mean anything
- TODO simple cases - things like nullable where AFAIK it's just a matter of passing the allow_partialinstruction to its nested validator - that's my understanding of these but maybe it's more complicated
- TODO - collections or multiple nested validators where we need custom logic
I think validators break down like this:
| Validator | Status | 
|---|---|
| TypedDictValidator | DONE | 
| UnionValidator | TODO simple case? | 
| TaggedUnionValidator | TODO simple case | 
| NullableValidator | TODO simple case | 
| ModelValidator | doesn't apply | 
| ModelFieldsValidator | TODO | 
| DataclassArgsValidator | TODO | 
| DataclassValidator | doesn't apply | 
| StrValidator | doesn't apply | 
| StrConstrainedValidator | doesn't apply | 
| IntValidator | doesn't apply | 
| ConstrainedIntValidator | doesn't apply | 
| BoolValidator | doesn't apply | 
| FloatValidator | doesn't apply | 
| ConstrainedFloatValidator | doesn't apply | 
| DecimalValidator | doesn't apply | 
| ListValidator | DONE | 
| SetValidator | DONE | 
| TupleValidator | TODO | 
| DictValidator | DONE | 
| NoneValidator | doesn't apply | 
| FunctionBeforeValidator | TODO simple case? | 
| FunctionAfterValidator | TODO simple case? | 
| FunctionPlainValidator | TODO simple case? | 
| FunctionWrapValidator | TODO simple case? | 
| CallValidator | TODO simple case? | 
| LiteralValidator | doesn't apply | 
| IntEnumValidator | doesn't apply | 
| StrEnumValidator | doesn't apply | 
| FloatEnumValidator | doesn't apply | 
| PlainEnumValidator | doesn't apply | 
| AnyValidator | doesn't apply | 
| BytesValidator | doesn't apply | 
| BytesConstrainedValidator | doesn't apply | 
| DateValidator | doesn't apply | 
| TimeValidator | doesn't apply | 
| DateTimeValidator | doesn't apply | 
| FrozenSetValidator | DONE | 
| TimeDeltaValidator | doesn't apply | 
| IsInstanceValidator | doesn't apply | 
| IsSubclassValidator | doesn't apply | 
| CallableValidator | doesn't apply | 
| ArgumentsValidator | TODO | 
| WithDefaultValidator | TODO simple case | 
| ChainValidator | TODO simple case? | 
| LaxOrStrictValidator | TODO simple case? | 
| GeneratorValidator | TODO | 
| CustomErrorValidator | TODO simple case? | 
| JsonValidator | TODO | 
| UrlValidator | doesn't apply | 
| MultiHostUrlValidator | doesn't apply | 
| UuidValidator | doesn't apply | 
| DefinitionRefValidator | TODO | 
| JsonOrPython | TODO simple case? | 
| ComplexValidator | doesn't apply | 
If that table is correct, we should set the default for supports_partial to true, then set it to false specifically for fields which don't support it. But I'm still not that confident I'm write about all those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got rid of supports_partial completely, and instead set state.allow_partial = false on the few validators which don't support it yet.
| please review | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me now, I think this is the right default and makes it explicit where we have decided not to support yet 👍
Co-authored-by: David Hewitt <[email protected]>
Co-authored-by: David Hewitt <[email protected]> Original-commit-hash: a1fa596
Co-authored-by: David Hewitt <[email protected]> Original-commit-link: pydantic/pydantic-core@a1fa596
See pydantic/pydantic#10748 for details.
Selected Reviewer: @sydney-runkle