Bugfix/1677 Fix Pandera DataFrame - Pydantic compatibility #1904

Jarek-Rolski · 2025-02-02T23:58:20Z

An update to pydantic-core requires an additional parameter, json_schema_input_schema, in the call to core_schema.no_info_plain_validator_function.

I did some checking, and it seems the value passed for this parameter doesn't matter as long as it is a valid core_schema schema type. The generated schema also doesn't contain the field checks. However, pydantic model validation still runs the pandera submodel with all of its checks.

I'm not sure if this is correct. I made the changes based on #1677 and #1704.

I had to modify one test because the new code detects the schema issue earlier than before.

codecov · 2025-02-03T00:01:04Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.38%. Comparing base (812b2a8) to head (0747fc9).
Report is 188 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1904      +/-   ##
==========================================
- Coverage   94.28%   93.38%   -0.90%     
==========================================
  Files          91      121      +30     
  Lines        7013     9304    +2291     
==========================================
+ Hits         6612     8689    +2077     
- Misses        401      615     +214


Signed-off-by: Jarek-Rolski <[email protected]>

cosmicBboy · 2025-02-04T04:06:10Z

pandera/typing/pandas.py

+            with config_context(validation_enabled=False):
+                schema_model = _source_type().__orig_class__.__args__[0]
+                schema = schema_model.to_schema()


hey @Jarek-Rolski this PR looks almost ready to merge!

quick question: why is this config_context block needed here? I don't think the nested code does any validation

pandera runs validation while the schema model is being extracted, and that validation fails. Without the config_context block, two tests raised an error, e.g.:

tests\fastapi\test_app.py:14: in <module>
    from tests.fastapi.models import Transactions, TransactionsOut
tests\fastapi\models.py:46: in <module>
    class ResponseModel(BaseModel):
venv\Lib\site-packages\pydantic\_internal\_model_construction.py:205: in __new__
    complete_model_class(
venv\Lib\site-packages\pydantic\_internal\_model_construction.py:534: in complete_model_class
    schema = cls.__get_pydantic_core_schema__(cls, handler)
venv\Lib\site-packages\pydantic\main.py:643: in __get_pydantic_core_schema__
    return handler(source)
venv\Lib\site-packages\pydantic\_internal\_schema_generation_shared.py:83: in __call__
    schema = self._handler(source_type)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:512: in generate_schema
    schema = self._generate_schema_inner(obj)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:784: in _generate_schema_inner
    return self._model_schema(obj)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:591: in _model_schema
    {k: self._generate_md_field_schema(k, v, decorators) for k, v in fields.items()},
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:947: in _generate_md_field_schema
    common_field = self._common_field_schema(name, field_info, decorators)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:1134: in _common_field_schema
    schema = self._apply_annotations(
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:1890: in _apply_annotations
    schema = get_inner_schema(source_type)
venv\Lib\site-packages\pydantic\_internal\_schema_generation_shared.py:83: in __call__
    schema = self._handler(source_type)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:1871: in inner_handler
    schema = self._generate_schema_inner(obj)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:789: in _generate_schema_inner
    return self.match_type(obj)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:871: in match_type
    return self._match_generic_type(obj, origin)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:890: in _match_generic_type
    from_property = self._generate_schema_from_property(origin, obj)
venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:679: in _generate_schema_from_property
    schema = get_schema(
pandera\typing\pandas.py:189: in __get_pydantic_core_schema__
    schema_model = _source_type().__orig_class__.__args__[0]
pandera\typing\common.py:129: in __patched_generic_alias_call
    result.__orig_class__ = self
pandera\typing\common.py:181: in __setattr__
    self.__dict__ = schema_model.validate(self).__dict__
pandera\api\dataframe\model.py:289: in validate
    cls.to_schema().validate(
pandera\api\pandas\container.py:126: in validate
    return self._validate(
pandera\api\pandas\container.py:147: in _validate
    return self.get_backend(check_obj).validate(
pandera\backends\pandas\container.py:104: in validate
    error_handler = self.run_checks_and_handle_errors(
pandera\backends\pandas\container.py:182: in run_checks_and_handle_errors
    error_handler.collect_error(
pandera\api\base\error_handler.py:54: in collect_error
    raise schema_error from original_exc
E   pandera.errors.SchemaError: column 'id' not in dataframe. Columns in dataframe: []

okay cool I can take a look at this in a separate PR

cosmicBboy · 2025-02-04T04:37:41Z

pandera/typing/pandas.py

+            type_map = {
+                "str": core_schema.str_schema(),
+                "int64": core_schema.int_schema(),
+                "float64": core_schema.float_schema(),
+                "bool": core_schema.bool_schema(),
+                "datetime64[ns]": core_schema.datetime_schema(),
+            }


this would be limited to just the numpy datatypes right?

will we need to create a follow-up PR to support the pyarrow datatypes?

I made some changes to enable pyarrow. I used pandera's to_json_schema() function to get general type names. I tested it with various numpy/pandas/pyarrow types and it seems to work. The only problem was with more exotic types such as pyarrow.large_string, which to_json_schema() labels as "any".
Are you happy with this change, or should I revert it and add pyarrow types instead?

cool, this looks good to me for now, we can make further investments in a future PR
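
The approach discussed in this thread, looking up a general type name and falling back to "any" for unrecognised types, could be sketched roughly as follows. The keys in TYPE_MAP are assumed labels chosen for illustration, and to_core_schema is a hypothetical helper; the names that to_json_schema() actually emits may differ.

```python
from pydantic_core import core_schema

# Assumed general type labels mapped to pydantic-core schema builders.
TYPE_MAP = {
    "string": core_schema.str_schema,
    "integer": core_schema.int_schema,
    "number": core_schema.float_schema,
    "boolean": core_schema.bool_schema,
    "datetime": core_schema.datetime_schema,
}


def to_core_schema(type_name: str):
    """Return a core schema for a type label, or any_schema() if unknown."""
    builder = TYPE_MAP.get(type_name, core_schema.any_schema)
    return builder()


int_column = to_core_schema("integer")
exotic_column = to_core_schema("large_string")  # not in the map
```

Keying the map on general type names rather than on dtype strings such as "int64" means numpy, pandas and pyarrow variants of the same kind of column can share one entry, at the cost of losing detail for types that only resolve to "any".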

Signed-off-by: Jarek-Rolski <[email protected]>

fix DataFrame Pydantic compatibility

29e9609

Jarek-Rolski added 3 commits February 3, 2025 00:11

format python file

f862355

update test for new code

71f38a5

prevents Linters from raising an error

25919e1

Signed-off-by: Jarek-Rolski <[email protected]>

cosmicBboy reviewed Feb 4, 2025


enable pyarrow and other types in pydantic models

0747fc9

Signed-off-by: Jarek-Rolski <[email protected]>
