
GH-45644: [Doc][Python] Document timezone loss when converting timestamp arrays to NumPy#49843

Merged

AlenkaF merged 5 commits into apache:main from alex-anast:alex-anast/gh-45644-doc-python-numpy-tz on May 5, 2026

@alex-anast
Contributor

@alex-anast alex-anast commented Apr 22, 2026

Rationale for this change

NumPy's datetime64 type does not support timezones. When converting a timezone-aware Arrow timestamp array to NumPy via to_numpy(), the timezone information is silently dropped. This behaviour is expected but undocumented, which can surprise users (see #45644).

What changes are included in this PR?

Adds a "Timezone-aware Timestamps" subsection to docs/source/python/numpy.rst that:

  • Explains the timezone loss when calling to_numpy() on tz-aware timestamp arrays
  • Shows a code example demonstrating the behavior
  • Documents two alternatives: to_pandas() for tz-aware Series, and to_pylist() for Python datetime objects with tzinfo

Are these changes tested?

Documentation-only change. All code examples were verified against pyarrow 24.0.0, and sphinx-lint passes cleanly.

Are there any user-facing changes?

No behaviour changes. This adds documentation for existing behaviour.

AI-generated code disclosure

This PR was developed with assistance from an AI coding tool (Claude, Anthropic). All changes have been reviewed, understood, and verified.

@github-actions github-actions Bot added the awaiting review label Apr 22, 2026
@github-actions

⚠️ GitHub issue #45644 has been automatically assigned in GitHub to PR creator.

Member

@AlenkaF AlenkaF left a comment

Thank you for the PR; the change looks good to me!
One request: please include the caveat for to_pandas() with nested types described in #41162 (it works for structs and maps but not for lists; unions and list views would need to be checked).

@alex-anast alex-anast force-pushed the alex-anast/gh-45644-doc-python-numpy-tz branch from 956b6a6 to 34e5fad on April 23, 2026 15:24
@alex-anast
Contributor Author

Thanks for the review, @AlenkaF! I've added a .. note:: block under the to_pandas() alternative documenting the nested-types caveat.

Unrelated, but the most recent commit also fixed the Sphinx doctest failures: the >>> examples were being picked up by pytest --doctest-glob and failing because of differences in NumPy's output line wrapping, so I added # doctest: +SKIP to those examples. Please let me know if there's a better way.

@AlenkaF
Member

AlenkaF commented Apr 23, 2026

I would use +SKIP as little as possible and change only the lines that are failing. I also like to use ELLIPSIS (...) where possible, so you can still check the whole first line before the line break. Does that make sense?
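The +SKIP and ELLIPSIS directives under discussion come from Python's standard doctest module, which pytest's doctest collection builds on. A minimal sketch with a made-up snippet (not the actual numpy.rst examples; run_snippet is a hypothetical helper): ELLIPSIS lets the expected output keep its first part and elide the rest, while SKIP disables the check for an example entirely.

```python
import doctest

# A made-up doctest snippet. With ELLIPSIS, "..." in the expected output
# matches any text, so only the start and end of a long repr are compared.
# With SKIP, the example is not run at all.
SNIPPET = """
>>> list(range(20))  # doctest: +ELLIPSIS
[0, 1, 2, ...]
>>> print("never executed")  # doctest: +SKIP
output that is never compared
"""


def run_snippet(text):
    """Hypothetical helper: parse and run a doctest string, return results."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {}, "snippet", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


results = run_snippet(SNIPPET)
print(results.failed, results.attempted)  # 0 1
```

The ELLIPSIS example still verifies the opening of the output, whereas the SKIP example verifies nothing, which is why ELLIPSIS is the narrower tool.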

@alex-anast
Contributor Author

@AlenkaF In the most recent iteration, I kept only what's necessary :) The final +SKIP is there for readability; otherwise I would have to replace tzinfo=zoneinfo.ZoneInfo(key='UTC') with tzinfo=...

Member

@AlenkaF AlenkaF left a comment

Looking good, thank you for the updates!

One last thing I would like to bring up before we merge is an example of converting to NumPy through pandas (to_pandas(timestamp_as_object=True).to_numpy()). I am undecided whether it is needed here, but I am leaning towards yes. What do you think, @alex-anast?

@AlenkaF
Member

AlenkaF commented Apr 24, 2026

@github-actions crossbow submit preview-docs

@github-actions

Revision: 7dcd9df

Submitted crossbow builds: ursacomputing/crossbow @ actions-40c521d3da

Task Status
preview-docs GitHub Actions

@alex-anast
Contributor Author

Thanks for the suggestion, @AlenkaF!
The PR is specifically about to_numpy() losing timezone information, so it makes sense to me to include an example that still ends with a NumPy array while preserving the timezone info. It neither bloats the doc nor adds confusion, so I have now included it in the implementation. That's my thinking.

@AlenkaF
Member

AlenkaF commented May 4, 2026

@github-actions crossbow submit preview-docs

@github-actions

github-actions Bot commented May 4, 2026

Revision: cf094e6

Submitted crossbow builds: ursacomputing/crossbow @ actions-29ed4f5cc3

Task Status
preview-docs GitHub Actions

Member

@AlenkaF AlenkaF left a comment

Review comment thread on docs/source/python/numpy.rst (outdated)
@github-actions github-actions Bot added the awaiting committer review label and removed the awaiting review label May 5, 2026
@AlenkaF AlenkaF merged commit 1fc0e19 into apache:main May 5, 2026
19 checks passed
@AlenkaF AlenkaF removed the awaiting committer review label May 5, 2026
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit 1fc0e19.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

Development

Successfully merging this pull request may close these issues.

[Doc][Python] Timestamp with tz loses its time zone after to_numpy

2 participants