Bugfix/1677 Fix Pandera DataFrame - Pydantic compatibility #1904
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1904      +/-   ##
==========================================
- Coverage   94.28%   93.38%   -0.90%
==========================================
  Files          91      121      +30
  Lines        7013     9304    +2291
==========================================
+ Hits         6612     8689    +2077
- Misses        401      615     +214
View full report in Codecov by Sentry.
Signed-off-by: Jarek-Rolski <[email protected]>
pandera/typing/pandas.py
Outdated
with config_context(validation_enabled=False):
    schema_model = _source_type().__orig_class__.__args__[0]
    schema = schema_model.to_schema()
Hey @Jarek-Rolski, this PR looks almost ready to merge!
Quick question: why is this `config_context` block needed here? I don't think the nested code does any validation.
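pandera runs validation while `schema_model` is extracted, and that validation fails: calling `_source_type()` assigns `__orig_class__`, which triggers `schema_model.validate` on a dataframe that has no columns yet. Without the `config_context` block, two tests raised an error, e.g.: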
tests\fastapi\test_app.py:14: in <module>
from tests.fastapi.models import Transactions, TransactionsOut
tests\fastapi\models.py:46: in <module>
class ResponseModel(BaseModel):
venv\Lib\site-packages\pydantic\_internal\_model_construction.py:205: in __new__
complete_model_class(
venv\Lib\site-packages\pydantic\_internal\_model_construction.py:534: in complete_model_class
schema = cls.__get_pydantic_core_schema__(cls, handler)
venv\Lib\site-packages\pydantic\main.py:643: in __get_pydantic_core_schema__
return handler(source)
venv\Lib\site-packages\pydantic\_internal\_schema_generation_shared.py:83: in __call__
schema = self._handler(source_type)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:512: in generate_schema
schema = self._generate_schema_inner(obj)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:784: in _generate_schema_inner
return self._model_schema(obj)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:591: in _model_schema
{k: self._generate_md_field_schema(k, v, decorators) for k, v in fields.items()},
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:947: in _generate_md_field_schema
common_field = self._common_field_schema(name, field_info, decorators)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:1134: in _common_field_schema
schema = self._apply_annotations(
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:1890: in _apply_annotations
schema = get_inner_schema(source_type)
venv\Lib\site-packages\pydantic\_internal\_schema_generation_shared.py:83: in __call__
schema = self._handler(source_type)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:1871: in inner_handler
schema = self._generate_schema_inner(obj)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:789: in _generate_schema_inner
return self.match_type(obj)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:871: in match_type
return self._match_generic_type(obj, origin)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:890: in _match_generic_type
from_property = self._generate_schema_from_property(origin, obj)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:679: in _generate_schema_from_property
schema = get_schema(
pandera\typing\pandas.py:189: in __get_pydantic_core_schema__
schema_model = _source_type().__orig_class__.__args__[0]
pandera\typing\common.py:129: in __patched_generic_alias_call
result.__orig_class__ = self
pandera\typing\common.py:181: in __setattr__
self.__dict__ = schema_model.validate(self).__dict__
pandera\api\dataframe\model.py:289: in validate
cls.to_schema().validate(
pandera\api\pandas\container.py:126: in validate
return self._validate(
pandera\api\pandas\container.py:147: in _validate
return self.get_backend(check_obj).validate(
pandera\backends\pandas\container.py:104: in validate
error_handler = self.run_checks_and_handle_errors(
pandera\backends\pandas\container.py:182: in run_checks_and_handle_errors
error_handler.collect_error(
pandera\api\base\error_handler.py:54: in collect_error
raise schema_error from original_exc
E pandera.errors.SchemaError: column 'id' not in dataframe. Columns in dataframe: []
okay cool I can take a look at this in a separate PR
pandera/typing/pandas.py
Outdated
type_map = {
    "str": core_schema.str_schema(),
    "int64": core_schema.int_schema(),
    "float64": core_schema.float_schema(),
    "bool": core_schema.bool_schema(),
    "datetime64[ns]": core_schema.datetime_schema(),
}
This would be limited to just the numpy datatypes, right?
Will we need a follow-up PR to support the pyarrow datatypes?
I made some changes to enable pyarrow. I used pandera's `to_json_schema()` function to get general type names. I tested it with various numpy/pandas/pyarrow types and it seems to work. The only problem was with more exotic types such as `pyarrow.large_string`, which `to_json_schema()` labels as "any".
Are you happy with this change, or should I revert it and add the pyarrow types instead?
cool, this looks good to me for now, we can make further investments in a future PR
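The approach discussed in this thread (resolving general type names rather than concrete dtype strings, with a catch-all for types reported as "any") can be sketched roughly as follows. This is a stdlib-only illustration under assumptions, not the code in the PR: the generic names in `GENERIC_TYPE_MAP` and the helpers `resolve_python_type` and `describe_columns` are hypothetical, and the real change maps to pydantic-core schema objects rather than Python types.

```python
# Hypothetical mapping from general type names to Python types.
# The names on the left are assumed for illustration only.
GENERIC_TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
}


def resolve_python_type(generic_name):
    """Return the mapped type, falling back to object for unknown names such as "any"."""
    return GENERIC_TYPE_MAP.get(generic_name, object)


def describe_columns(columns):
    """Resolve every column's general type name to a Python type."""
    return {name: resolve_python_type(kind) for name, kind in columns.items()}


resolved = describe_columns({"id": "integer", "name": "string", "blob": "any"})
```

Keying on general names means numpy, pandas, and pyarrow variants of the same logical type share one entry, at the cost of treating anything reported as "any" as unconstrained.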
Signed-off-by: Jarek-Rolski <[email protected]>
The update to pydantic-core requires an additional parameter, `json_schema_input_schema`, in the `core_schema.no_info_plain_validator_function` function.
I did some checking, and it seems that it doesn't matter what is passed for that parameter as long as it is a valid `core_schema` schema type. The generated schema also doesn't contain field checks. However, pydantic model validation includes the pandera submodel with all checks.
I'm not sure if this is correct. I made the changes based on #1677 and #1704.
I had to modify one test, because the current code change discovers the schema issue earlier than before.