
Enhancement: Add support for timezone-flexible DateTime (#1352) #1902

Open: wants to merge 1 commit into base: main


@max-raphael commented Jan 25, 2025

Enhancement: Add support for timezone-flexible DateTime (#1352)

This pull request introduces a new feature to the DateTime class that allows more flexible handling of timezones during datetime coercion and validation. Previously, if a field had type DateTime and was expected to receive timezone-aware data:

  • The timezone needed to be defined at the time of class definition
  • The data could only have one timezone associated with it

This feature lets users define a DateTime field without specifying the timezone up front, by setting timezone_flexible=True. However, if coerce=True, a timezone (tz) must also be specified, so that coercion always has a single, unambiguous target timezone.

Example pandera DataFrameModel definition:

```python
from zoneinfo import ZoneInfo

from pandera import DataFrameModel, Field
from pandera.engines.pandas_engine import DateTime


class MyModel(DataFrameModel):
    # accepts tz-aware timestamps in any timezone(s); no coercion
    flexible_datetime_1: DateTime(timezone_flexible=True)
    # coerces all timestamps to UTC
    flexible_datetime_2: DateTime(timezone_flexible=True, tz=ZoneInfo("UTC")) = Field(coerce=True)
    # coerces all timestamps to Chicago time
    flexible_datetime_3: DateTime(timezone_flexible=True, tz=ZoneInfo("America/Chicago")) = Field(coerce=True)
```

Functionality Specifics:

If coerce is False:

  • Accepts datetimes that all share the same timezone, and infers the dtype to be datetime64 with that timezone
  • Accepts datetimes with different timezones, and infers the dtype to be 'object' (matching pandas' own behavior)
  • Raises an exception if any datetime is timezone-naive
  • Ignores any tz argument, since specifying a timezone up front without coercion defeats the purpose of this option. To validate that the data is in a timezone known when the class is defined, do not set timezone_flexible=True.
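To make the coerce=False rules concrete, here is a minimal sketch in plain pandas. `check_tz_agnostic` and `is_aware` are hypothetical helpers, not pandera's implementation: the sketch rejects any timezone-naive value and otherwise lets the pandas Series constructor infer the dtype, which yields a timezone-aware datetime64 dtype when every value shares one timezone and `object` when the timezones differ.

```python
import datetime as dt
from typing import Sequence
from zoneinfo import ZoneInfo

import pandas as pd


def is_aware(value: dt.datetime) -> bool:
    # Python's definition: aware iff tzinfo is set and yields a UTC offset
    return value.tzinfo is not None and value.utcoffset() is not None


def check_tz_agnostic(values: Sequence[dt.datetime]) -> pd.Series:
    """Hypothetical helper mirroring the coerce=False rules (not pandera's code)."""
    naive = [v for v in values if not is_aware(v)]
    if naive:
        raise ValueError(f"{len(naive)} timezone-naive datetime(s) found")
    # One shared timezone -> tz-aware datetime64 dtype; mixed timezones -> object
    return pd.Series(list(values))


ny = ZoneInfo("America/New_York")
la = ZoneInfo("America/Los_Angeles")

same = check_tz_agnostic(
    [dt.datetime(2023, 3, 1, 5, tzinfo=ny), dt.datetime(2023, 3, 2, 5, tzinfo=ny)]
)
mixed = check_tz_agnostic(
    [dt.datetime(2023, 3, 1, 5, tzinfo=ny), dt.datetime(2023, 3, 1, 5, tzinfo=la)]
)
print(isinstance(same.dtype, pd.DatetimeTZDtype), same.dt.tz)  # True America/New_York
print(mixed.dtype)  # object
```

Under this sketch, the input of Example 1 below raises `ValueError`, and the input of Example 2 comes back as an object-dtype Series.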

Examples:

Example 1
Input Data:

        [
            dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo('America/New_York')),
            dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo('America/Los_Angeles')),
            dt.datetime(2023, 3, 1, 5)  # timezone naive
        ]

raises an exception, because the third datetime is timezone-naive.

Example 2
Input Data:

        [
            dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo('America/New_York')),
            dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo('America/Los_Angeles'))
        ]

passes validation and is returned unchanged (the inferred dtype is 'object', since the timezones differ).

Note
Setting DateTime(timezone_flexible=True, tz="America/Chicago") has no effect on the timezone here, because coerce=False. Validating against a single timezone known up front is already handled by timezone_flexible=False.

If coerce is True:

  • Accepts datetimes that share the same timezone, have different timezones, or are timezone-naive
  • Coerces all datetimes to the same timezone
  • Converts all timezone-aware datetimes to the specified tz, and localizes all timezone-naive datetimes to that tz
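The coerce=True rules can be sketched the same way. `coerce_tz_agnostic` is again a hypothetical helper, not pandera's code: it refuses to run without a target tz (the requirement stated at the top of this PR), converts aware values with `datetime.astimezone`, and attaches the target zone to naive values with `replace(tzinfo=...)`, which treats the naive wall-clock time as local time in that zone.

```python
import datetime as dt
from typing import Optional, Sequence
from zoneinfo import ZoneInfo

import pandas as pd


def coerce_tz_agnostic(values: Sequence[dt.datetime], tz: Optional[str]) -> pd.Series:
    """Hypothetical helper mirroring the coerce=True rules (not pandera's code)."""
    if tz is None:
        raise ValueError("tz must be specified when coerce=True")
    target = ZoneInfo(tz)
    coerced = []
    for v in values:
        if v.tzinfo is not None and v.utcoffset() is not None:
            # aware: convert the same instant into the target zone
            coerced.append(v.astimezone(target))
        else:
            # naive: keep the wall-clock time and attach the target zone
            coerced.append(v.replace(tzinfo=target))
    # All values now share one timezone, so pandas infers a tz-aware dtype
    return pd.Series(coerced)
```

Using `replace` rather than `astimezone` for naive values matters: calling `astimezone` on a naive datetime would assume the machine's local timezone. pandas' `Series.dt.tz_localize` is a vectorized alternative, although by default it raises on ambiguous or nonexistent DST wall times, whereas `replace` silently resolves them using `fold=0`.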

Examples:

Example 1
Input data:

        [
            dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo('America/New_York')),
            dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo('America/Los_Angeles')),
            dt.datetime(2023, 3, 1, 5)  # timezone naive
        ]

Setting DateTime(timezone_flexible=True, tz="UTC") with coerce=True results in

        [
            dt.datetime(2023, 3, 1, 10, tzinfo=ZoneInfo('UTC')),
            dt.datetime(2023, 3, 1, 13, tzinfo=ZoneInfo('UTC')),
            dt.datetime(2023, 3, 1, 5,  tzinfo=ZoneInfo('UTC'))
        ]
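The UTC hours follow from the standard-time offsets in effect on 2023-03-01, because US daylight saving time only began on March 12, 2023: New York is UTC-5 (EST) and Los Angeles is UTC-8 (PST). A quick standard-library check of the conversion:

```python
import datetime as dt
from zoneinfo import ZoneInfo

utc = ZoneInfo("UTC")
ny = dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo("America/New_York"))
la = dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo("America/Los_Angeles"))
naive = dt.datetime(2023, 3, 1, 5)

print(ny.utcoffset() / dt.timedelta(hours=1))  # -5.0 (EST, not EDT)
print(la.utcoffset() / dt.timedelta(hours=1))  # -8.0 (PST)

# Aware values are converted; the naive value is localized to UTC as-is
converted = [ny.astimezone(utc), la.astimezone(utc), naive.replace(tzinfo=utc)]
print([d.hour for d in converted])  # [10, 13, 5]
```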

Example 2
Input data:

        [
            dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo('America/New_York')),
            dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo('America/Los_Angeles')),
            dt.datetime(2023, 3, 1, 5)  # timezone naive
        ]

Setting DateTime(timezone_flexible=True, tz="America/Chicago") with coerce=True results in

        [
            dt.datetime(2023, 3, 1, 4, tzinfo=ZoneInfo('America/Chicago')),
            dt.datetime(2023, 3, 1, 7, tzinfo=ZoneInfo('America/Chicago')),
            dt.datetime(2023, 3, 1, 5,  tzinfo=ZoneInfo('America/Chicago'))
        ]
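The same result can be reproduced with vectorized pandas operations. This is a sketch of one possible approach, an assumption rather than necessarily what this PR does: aware values are parsed with `pd.to_datetime(..., utc=True)`, which accepts mixed offsets, and then moved to Chicago with `tz_convert`; naive values are localized directly with `tz_localize`. On 2023-03-01 Chicago is UTC-6, so 05:00 EST (10:00 UTC) becomes 04:00 CST and 05:00 PST (13:00 UTC) becomes 07:00 CST.

```python
import datetime as dt
from zoneinfo import ZoneInfo

import pandas as pd

values = pd.Series(
    [
        dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo("America/New_York")),
        dt.datetime(2023, 3, 1, 5, tzinfo=ZoneInfo("America/Los_Angeles")),
        dt.datetime(2023, 3, 1, 5),  # timezone naive
    ],
    dtype=object,
)
target = "America/Chicago"

is_naive = values.map(lambda v: v.tzinfo is None).astype(bool)
# Aware values: normalize mixed offsets to UTC, then convert to the target zone
aware = pd.to_datetime(values[~is_naive], utc=True).dt.tz_convert(target)
# Naive values: interpret the wall-clock time as target-zone local time
naive = pd.to_datetime(values[is_naive]).dt.tz_localize(target)

# Recombine and restore the original row order
result = pd.concat([aware, naive]).sort_index()
print(result.dt.hour.tolist())  # [4, 7, 5]
```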

Implementation details

Because this feature is intended for cases where the timezone may not be known at class definition time (or where there are multiple timezones), self.type and self.tz must be set during validation, and also during coercion if coerce is True. Therefore, both DateTime.coerce() and DateTime.check() now contain conditional logic that infers and sets the field's type and tz from the data they receive.
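The deferred resolution described above can be illustrated with a toy class. `TzAgnosticDateTime` is hypothetical and much simpler than pandera's `DateTime` (whose real `check` works on pandera dtypes and data containers); it only shows the pattern of leaving `type` and `tz` unset at definition time and filling them in from the data when `check` runs.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Optional, Union
from zoneinfo import ZoneInfo

import pandas as pd


@dataclass
class TzAgnosticDateTime:
    """Hypothetical sketch; not pandera's DateTime class."""

    tz: Optional[str] = None  # unknown when the schema is defined
    type: Union[pd.DatetimeTZDtype, str, None] = None

    def check(self, data: pd.Series) -> bool:
        if isinstance(data.dtype, pd.DatetimeTZDtype):
            # A single shared timezone: resolve the concrete dtype and tz from the data
            self.type, self.tz = data.dtype, str(data.dtype.tz)
            return True
        if data.dtype == object:
            # Possibly mixed timezones: every value must still be tz-aware
            aware = data.map(lambda v: getattr(v, "tzinfo", None) is not None)
            if bool(aware.all()):
                self.type, self.tz = "object", None
                return True
        # Naive datetime64 data, naive values, or non-datetime values
        return False


ny = ZoneInfo("America/New_York")
field = TzAgnosticDateTime()
print(field.check(pd.Series([dt.datetime(2023, 3, 1, 5, tzinfo=ny)])), field.tz)
# True America/New_York
```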

@max-raphael force-pushed the feature/1352 branch 3 times, most recently from 529178c to cd998c8, January 25, 2025 08:45
@max-raphael reopened this Jan 25, 2025
@max-raphael force-pushed the feature/1352 branch 2 times, most recently from 09dfd96 to ca6046a, January 25, 2025 18:58
@cosmicBboy (Collaborator) left a comment

Thanks for the contribution @max-raphael !

can we rename this to time_zone_agnostic to match the polars dtype engine implementation? #1589

I believe it's the same behavior

codecov bot commented Jan 25, 2025

Codecov Report

Attention: Patch coverage is 97.14286% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.28%. Comparing base (812b2a8) to head (ca6046a).
Report is 188 commits behind head on main.

| File with missing lines | Patch % | Lines |
|---|---|---|
| pandera/engines/pandas_engine.py | 97.14% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1902      +/-   ##
==========================================
- Coverage   94.28%   93.28%   -1.00%     
==========================================
  Files          91      121      +30     
  Lines        7013     9338    +2325     
==========================================
+ Hits         6612     8711    +2099     
- Misses        401      627     +226     
```


…1352)

Signed-off-by: Max Raphael <[email protected]>

Enhancement: Add support for timezone-flexible DateTime (unionai-oss#1352)

Signed-off-by: Max Raphael <[email protected]>
@max-raphael (Author) commented:


No problem @cosmicBboy! I've just pushed a change that uses time_zone_agnostic as the flag name, and makes a small fix to use pytz instead of ZoneInfo in the tests, since the project still builds for Python 3.8 (the zoneinfo module was added in Python 3.9).

@max-raphael (Author) commented:

Just checking back in here @cosmicBboy, is there anything else I need to do in order for the rest of the workflow checks to run?
