-
-
Notifications
You must be signed in to change notification settings - Fork 31
ENH: improve support for datetime columns #486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ENH: improve support for datetime columns #486
Conversation
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for diving into this and improving the test coverage!
…ith-naive-datetimes-with-arrow
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@theroggy thanks for further looking into this!
I do have some doubts about how much effort we should do to cover corner cases and what the desired default behaviour should be, see my comments below.
…ith-naive-datetimes-with-arrow
…ith-naive-datetimes-with-arrow
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some doc comments (I am happy to push some rephrased suggestions) and noticed one more issue in the datetime conversion logic
| assert is_datetime64_dtype(df.datetime_col) | ||
| assert df.datetime_col.iloc[0] == pd.Timestamp(1670, 1, 1, 9, 0, 0) | ||
| assert df.datetime_col.iloc[0].unit == "ms" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you removed the assert_series_equal(df.datetime_col, exp_dates) from a previous iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure, but I suppose I didn't find an easy solution to create the exp_dates series when pandas<3.0 and use_arrow=True, as pd.to_datetime overflows on pandas < 3.0. And as an alternative I replaced it with just checking if the value and unit is correct instead.
|
I will take a look at the failing tests (I assume some issue with older pandas versions with my latest suggestion ..) |
…ith-naive-datetimes-with-arrow
|
All green! I want to do some more clarifications in the tests, but that can wait for later ;) I also made some small changes to the docstrings. The diff makes it look like it all changes because of reflowing the text, but it were only minor edits I think |
Great!
OK! Minor edits indeed. As you changed "timezone" to "time zone" in some places, I suppose you have a preference for that... so I made the change consistent through the code base. |
This PR improves support for datetime columns, mainly in
read_dataframeandwrite_dataframe.In general the PR tries to accomplish that:
read_dataframe:pandas.to_datetimefunction in pandas < 3.use_arrow=Truethere are several situations where GDAL 3.11 is needed to get correct results.More specifically:
use_arrow, naive datetimes (no timezone) were interpreted as being UTC. So a naive time of 05:00 h was interpreted as 05:00 UTC.use_arrow, for datetime columns with a timezone the timezone was dropped, so 05:00+5:00 was read as 05:00.use_arrow, for datetime columns with any timezone but UTC, the timezone was dropped, so 05:00+5:00 was written as 05:00 (a naive datetime), for all filetypes.use_arrow, don't convert/represent them as being in UTC time if they have another timezone offset in the dataset.use_arrow, the fixes typically require GDAL >= 3.11 (OGRLayer::GetArrowStream(): add a DATETIME_AS_STRING=YES/NO option OSGeo/gdal#11213).Resolves #487
Resolves #123
Resolves #553