feat: pydantic v1 serialization and validation #399
Conversation
@ap-- would it be possible to approve the workflow here? Was going to mark this as ready once all the linting / tests pass correctly.
Hi @rx-dwoodward, could you comment why this couldn't just implement __get_validators__? What's the limitation in v1 here?
```python
def to_dict(self) -> SerializedUPath:
    return {
        "path": self.path,
        "protocol": self.protocol,
        "storage_options": dict(self.storage_options),
    }
```
Why is to_dict() needed? It does not seem to be a standard v1 BaseModel method. Let's not extend the public UPath class API for pydantic v1 support.
Yeah that's fair enough. I wanted to include a convenience method for enabling JSON serialization in pydantic v1 models. At present, there would not be a way to serialize this class that a v1 model would utilize - for example doing:
```python
from pydantic.v1 import BaseModel

class Model(BaseModel):
    path: UPath

model = Model(path="s3://bucket/path/to/key/")
model.json()  # hits TypeError: Object of type 'S3Path' is not JSON serializable
```
whereas if I do:
```python
class Model(BaseModel):
    path: UPath

    class Config:
        json_encoders = {UPath: UPath.to_dict}

model = Model(path="s3://bucket/path/to/key/")
model.json()  # == '{"path": {"path": "bucket/path/to/key", "protocol": "s3", "storage_options": {}}}'
```
So it's mainly a way to expose an easier serialization method that can be used with v1 models - unfortunately the serialization format for v2 is not used with v1 models (and you can't define a hook method on the custom class to provide a standardized serialization method 😭 - it's solely the responsibility of the v1 model that uses the custom class), so I thought it might be useful to include a serialization method on the class that gives an easier target for the json encoder hook. Totally okay with removing it though if you think this is too much overreach for this PR.
Hmmm...
We could introduce a general serializer helper function in universal-pathlib that could then be used.
In principle this would be:
```python
serialize_upath = upath.UPath.__get_pydantic_core_schema__(None, None)['serialization']['function']
```
Thinking about this, that might be the best idea. Something similar will land with chaining support: https://github.com/fsspec/universal_pathlib/pull/346/files#diff-79e86edc0bf259fa57cede9d23833d37576186abbdfe51591786da661c2620dfR28-R31
But wouldn't this enforce that the serialization method requires installation of pydantic >= 2 to execute? Since the .__get_pydantic_core_schema__ method imports from a pydantic v2 import path to generate the schema first.
I agree I would like to_dict to be executed akin to TypeAdapter(UPath).dump_python(upath, mode="json"), but didn't want to enforce installation of pydantic v2 just for dictifying a UPath (especially since in the tests there are scenarios where JSON serialization would fail anyway if some storage_options are not serializable, so I figured it probably didn't make sense / matter to exactly match the TypeAdapter pattern - it would just be softly asking for an additional dependency to be added [softly because the pydantic v2 import would live within the serialization function, or in __get_pydantic_core_schema__ etc]).
But perhaps that's not a big deal - if you want to serialize a UPath, you need pydantic>=2 🤷
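For reference, serializing without a pydantic v2 dependency would only need to build the same dict that to_dict() returns in this diff. A minimal sketch (the SerializedUPath TypedDict and the helper name here are just illustrative):

```python
from typing import Any, Dict, TypedDict

from upath import UPath


class SerializedUPath(TypedDict):
    path: str
    protocol: str
    storage_options: Dict[str, Any]


def serialize_upath(pth: UPath) -> SerializedUPath:
    # mirrors the to_dict() body from this diff, without importing pydantic
    return {
        "path": pth.path,
        "protocol": pth.protocol,
        "storage_options": dict(pth.storage_options),
    }
```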
```python
@classmethod
def __get_validators__(cls) -> Iterator[Callable]:
    yield cls._validate
```
Could support for v1 be added by only introducing __get_validators__?
Mentioned in a comment below but linked again for good measure - ultimately __get_validators__ requires yielding callable validation methods / functions that are used during validation, so we need to yield a callable still.
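For context, the v1 contract looks roughly like this for a toy custom type (illustrative only, not this PR's code; assumes pydantic v2 is installed so the v1 shim is importable):

```python
from pydantic.v1 import BaseModel


class Lowercased(str):
    # pydantic v1 collects validators by iterating this classmethod;
    # each yielded item must be a callable applied to the raw value
    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        if not isinstance(v, str):
            raise TypeError("string required")
        return cls(v.lower())


class Model(BaseModel):
    name: Lowercased


assert Model(name="HELLO").name == "hello"
```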
```python
@classmethod
def _validate(cls, v: Any) -> UPath:
    if not isinstance(v, UPath):
        v = cls._to_serialized_format(v)

        return cls(
            v["path"],
            protocol=v.get("protocol"),
            **v.get("storage_options", {}),  # type: ignore[arg-type]
        )
    return v
```
Why is this classmethod needed specifically?
This is what is actually executed during v1 validation - I added this as a method such that both v1 and v2 now run it when executing validation. I believe the v2 validation used lambdas beforehand depending on the input type within __get_pydantic_core_schema__, whereas with this PR this is a single source of truth for both v1 and v2 validation, which imo is cleaner than implementing the validation procedure twice for the different versions.
LMK what you think though - I can change it back to having two separate procedures for v1 and v2 if need be.
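A simplified toy version of that "single source of truth" wiring (not the actual diff; assumes pydantic v2 / pydantic_core is installed):

```python
from typing import Any, Callable, Iterator

from pydantic_core import core_schema


class PathLike:
    def __init__(self, path: str) -> None:
        self.path = path

    @classmethod
    def _validate(cls, v: Any) -> "PathLike":
        # the one coercion routine both pydantic versions end up calling
        if isinstance(v, cls):
            return v
        return cls(str(v))

    # pydantic v1 hook: yields the shared validator
    @classmethod
    def __get_validators__(cls) -> Iterator[Callable]:
        yield cls._validate

    # pydantic v2 hook: wraps the same validator into a core schema
    @classmethod
    def __get_pydantic_core_schema__(cls, source_type: Any, handler: Any):
        return core_schema.no_info_plain_validator_function(cls._validate)
```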
Let me rephrase: why does this have to be attached to the class interface? I.e. this could be a separate helper function for v1 support living outside of the UPath class.

> I can change it back to having two separate procedures for v1 and v2 if need be.

I'd prefer that. It'll make maintenance easier down the line, when v1 support is dropped eventually.
Yeah I guess my point is that encapsulating this logic within one method that is used by both v1 (__get_validators__) and v2 (__get_pydantic_core_schema__) means that this logic is not necessarily just 'v1' logic. Ultimately, the only 'v1' logic attached to the class is the __get_validators__ method. I believe having this validation entirely separate will make for more of a maintenance burden / require more code to be removed if v1 support is dropped.
But okay, so you would prefer having a separate function, not a classmethod, that undertakes this validation and is kept separate from the v2 validation?
pyproject.toml (outdated)
"pylint >=2.17.4", | ||
"mypy >=1.10.0", | ||
"pydantic >=2", | ||
"pydantic >=2,<3", # <3 required for testing pydantic v1 support, not for actual use |
Let's not predict the future. It's better if these tests break once pydantic 3 actually breaks the API we would rely on here.
Yeah this is fair - I can remove the pin, but might keep the comment so that when tests inevitably fail once pydantic v3 is released, there is something noting why this might be the case? Does that sound suitable?
FYI here is the version policy indicating the intention to remove pydantic.v1 from v3 onwards.
> FYI here is the version policy indicating the intention to remove pydantic.v1 from v3 onwards.

I am not sure I follow. Where does it say in there that pydantic.v1 won't be available in v3?
Ah sorry, good point - I might have misread there - but here is the comment from Sam Colvin about the intended removal of the V1 shim.
Well their example yields a classmethod, but I don't see a requirement for this to be a classmethod. The requirement is that you yield a callable that adheres to the Validator interface. There seems to be a note that these are classmethods (https://docs.pydantic.dev/1.10/usage/validators/), but it's unclear to me if this is actually a requirement. You could still just bind a helper function to the class if needed before yielding it.
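For example, a plain function (no classmethod) can be yielded directly; a rough sketch assuming the dict format used elsewhere in this PR:

```python
from typing import Any

from upath import UPath


def validate_upath_v1(v: Any) -> UPath:
    # plain callable that pydantic v1 will invoke with the raw field value
    if isinstance(v, UPath):
        return v
    if isinstance(v, dict):
        return UPath(
            v["path"],
            protocol=v.get("protocol"),
            **v.get("storage_options", {}),
        )
    return UPath(v)


# hypothetical wiring on UPath itself:
# @classmethod
# def __get_validators__(cls):
#     yield validate_upath_v1
```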
Yeah you are right, it doesn't have to be a classmethod - I just figured it's nicer to bind this validation as a method on the object rather than a standalone function outside of it. Typically it's a classmethod so that the class itself is available inside the validator.
So yes, it does not have to be a classmethod if you'd prefer - I just figured it was cleaner to have this attached to the class (as is normally the case) compared to a separate function outside of the class.
So it sounds to me as if:

```python
def somehow_serialize_upath_to_dict(pth: UPath) -> UPathTypedDict: ...

class Model(BaseModel):
    path: UPath

    class Config:
        json_encoders = {UPath: somehow_serialize_upath_to_dict}
```

would be a recipe we can't get around for pydantic v1 support. That is because, unlike with v2, there is no way to extend a class that does not inherit from BaseModel so that it provides its (de-)serialization methods to pydantic.
Is there a v1 way to make UPath the root model? So that there would be a UPathV1Model class that has the necessary pydantic methods?
Yeah that's exactly right - it's a particular annoyance of pydantic v1 to be honest. I mean you could do something like:
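Roughly (a sketch only - the UPathV1Model name is illustrative, and this relies on the __get_validators__ / to_dict hooks from this PR):

```python
from pydantic.v1 import BaseModel

from upath import UPath


class UPathV1Model(BaseModel):
    # custom root type: the model *is* a UPath for validation purposes
    __root__: UPath

    class Config:
        json_encoders = {UPath: UPath.to_dict}


m = UPathV1Model(__root__="s3://bucket/path/to/key/")
m.__root__  # the wrapped UPath instance
m.json()    # serialized via the json_encoders hook above
```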
Now this handles the jsonifying - because it jsonifies the model rather than the UPath - but it means you have to access the actual path through the wrapper model. We could change that, but imo each solution is inelegant, and directly using a json_encoders hook (pointing at something like to_dict) seemed the most straightforward. LMK what you think though (and apologies for having to drop these annoying nuances of pydantic v1 on you 😓)
Yeah this is exactly the case. Hence I thought it would be nicer to have the serialization as a method on the UPath than to import it from some other location, even if the function name is clear. But lmk if you would prefer a different solution!
xref: pydantic v2/v1 download statistics: pydantic/pydantic#11613 (comment)
I spent a bit of time thinking about this, and I'd be happy to integrate basic functionality for serialization into universal-pathlib, and provide a recipe in the README how to use it to set up a minimal pydantic v1 model. If you're still interested, could you change this PR and add a serialization helper along those lines? The PR could then add a small section to the readme for pydantic support, and provide a recipe for how to use it with v1.
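Such a recipe might end up looking roughly like this (the serialize_upath helper and its import location are hypothetical, depending on how this PR lands):

```python
from pydantic.v1 import BaseModel

from upath import UPath
from upath import serialize_upath  # hypothetical helper name / location


class Model(BaseModel):
    path: UPath

    class Config:
        json_encoders = {UPath: serialize_upath}


Model(path="s3://bucket/path/to/key/").json()
```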
Adds pydantic v1 support through validators like __get_validators__ and a custom .to_dict() method expected to be used alongside Config.json_encoders in pydantic v1 style BaseModels.

I altered the testing structure for pydantic somewhat to ensure that both v1 / v2 'style' validation is tested using a parametrisation framework, and capped the pydantic version (for tests only) to <3 such that the pydantic v1 style validation is tested against as well. The main downside to capping the pydantic version for tests is that if pydantic makes API changes in subsequent versions that do not accept the v1 shim, this testing framework would have to be split out to ensure coverage (one environment with >3 and another with <3 tested), which would be a pain, but this should have no effect on actual usage of the package since no pydantic code is explicitly used in UPath itself.

Issue link where this is discussed: #397
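A sketch of what that parametrisation can look like (hypothetical test, not the actual file in this PR):

```python
import pytest

from upath import UPath

pydantic = pytest.importorskip("pydantic")
pydantic_v1 = pytest.importorskip("pydantic.v1")


@pytest.mark.parametrize(
    "base_model",
    [pydantic.BaseModel, pydantic_v1.BaseModel],
    ids=["v2", "v1"],
)
def test_upath_field_validation(base_model):
    # the same model definition is exercised against both pydantic styles
    class Model(base_model):
        path: UPath

    m = Model(path="memory://bucket/path/to/key")
    assert isinstance(m.path, UPath)
```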
Let me know if there is anything else you'd like me to add to this - thanks!