GH-49875: [Python] Fix timezone dropped when converting tz-aware Categorical to Arrow array #49878

Merged
AlenkaF merged 4 commits into apache:main from AnkitAhlawat7742:fix/timezone_dropped
May 6, 2026

Conversation

@AnkitAhlawat7742
Contributor

@AnkitAhlawat7742 AnkitAhlawat7742 commented Apr 28, 2026

Rationale for this change

When converting a pandas.Categorical with tz-aware datetime categories to a PyArrow array, the timezone information was silently dropped from the dictionary array's value type. This is a silent data loss bug — no warning or error is raised, but the timezone metadata is lost.

What changes are included in this PR?

In python/pyarrow/array.pxi, the Categorical conversion was passing values.categories.values (a raw NumPy array), which strips timezone metadata because NumPy does not support tz-aware datetimes. This PR passes values.categories (a pandas Index) instead and adds from_pandas=True, so PyArrow uses the pandas conversion path, which preserves timezone metadata.

Are these changes tested?

Yes. Verified manually

Are there any user-facing changes?

Yes — this is a bug fix. Users did #49875

This PR contains a "Critical Fix" — timezone information was lost silently during conversion without any warning or error.

@github-actions

⚠️ GitHub issue #49875 has been automatically assigned in GitHub to PR creator.

@AnkitAhlawat7742
Contributor Author

Hi @AlenkaF
Please review these changes

Member

@AlenkaF AlenkaF left a comment

I think this change makes sense but the PR is lacking a test. The reproducible example from the issue can be reused and added to https://github.com/apache/arrow/blob/main/python/pyarrow/tests/test_pandas.py mimicking existing categorical tests.

cc @jorisvandenbossche in case of any opinions.

@AnkitAhlawat7742
Contributor Author

> I think this change makes sense but the PR is lacking a test. The reproducible example from the issue can be reused and added to https://github.com/apache/arrow/blob/main/python/pyarrow/tests/test_pandas.py mimicking existing categorical tests.
>
> cc @jorisvandenbossche in case of any opinions.

Thank you for the review!
I have added a regression test to python/pyarrow/tests/test_pandas.py based on the reproducible example from the issue.
Please have a look.

@AlenkaF
Member

AlenkaF commented May 4, 2026

CI failures are related, could you have a look?

@AnkitAhlawat7742
Contributor Author

> CI failures are related, could you have a look?

Fixed:

  1. Resolved the formatting issue.
  2. CI failure was caused by pandas version compatibility in the test assertion:
    str(cat.dtype.categories.dtype) == "datetime64[us, US/Eastern]"

Please retrigger the CI
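The assertion mentioned above compares the full dtype string, including the timestamp resolution, which can differ between pandas versions. One hedged way to make such a check less version-sensitive, not necessarily what this PR ended up doing, is to compare the timezone and leave the unit unchecked. The sample data below is illustrative.

```python
import pandas as pd

# Illustrative data: tz-aware datetimes used as categories.
stamps = pd.to_datetime(["2024-01-01", "2024-06-01"]).tz_localize("US/Eastern")
cat = pd.Categorical(stamps)

dtype = cat.categories.dtype

# Check the dtype family and the timezone rather than the full string,
# since the resolution part (for example "ns" vs "us") may vary by pandas version.
assert isinstance(dtype, pd.DatetimeTZDtype)
assert str(dtype.tz) == "US/Eastern"
```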

Member

@jorisvandenbossche jorisvandenbossche left a comment

Looks good!

jorisvandenbossche changed the title from "GH-49875:[Python] Fix timezone dropped when converting tz-aware Categorical to Arrow array" to "GH-49875: [Python] Fix timezone dropped when converting tz-aware Categorical to Arrow array" on May 4, 2026
@github-actions

github-actions Bot commented May 4, 2026

⚠️ GitHub issue #49875 has been automatically assigned in GitHub to PR creator.

github-actions Bot added the "awaiting committer review" label and removed the "awaiting review" label on May 4, 2026
@AlenkaF
Member

AlenkaF commented May 5, 2026

@github-actions crossbow submit pandas

@github-actions

github-actions Bot commented May 5, 2026

Revision: 0f1f595

Submitted crossbow builds: ursacomputing/crossbow @ actions-3f725ba4fc

Task Status
test-conda-python-3.10-pandas-1.3.4-numpy-1.21.2 GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.13-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.13-pandas-upstream_devel-numpy-nightly GitHub Actions

Member

@AlenkaF AlenkaF left a comment

Thanks! The failing extended builds are unrelated to this change. I will check whether an issue has already been opened to track them.

@AlenkaF
Member

AlenkaF commented May 5, 2026

Opened an issue for the failing builds here: #49920

@AlenkaF
Member

AlenkaF commented May 6, 2026

@jorisvandenbossche mind having one more look before I merge?

Member

@raulcd raulcd left a comment

This looks good to me

github-actions Bot added the "awaiting merge" label and removed the "awaiting committer review" label on May 6, 2026
AlenkaF merged commit ea8cef5 into apache:main on May 6, 2026
22 checks passed
AlenkaF removed the "awaiting merge" label on May 6, 2026
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit ea8cef5.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.
