GH-49002: [Python] Fix array.to_pandas string type conversion for arrays with None #49247
Merged: AlenkaF merged 4 commits into apache:main from AlenkaF:gh-49002-pandas-string-to_pandas-empty on Apr 1, 2026
I am not sure I understand why this test needs this guard now. Isn't it supposed to work with pandas > 3.0.0? I suppose this is because we are testing object types specifically. Was this test failing on CI? I haven't seen the failure.
This should be connected to the change I made in this PR, as strings are no longer converted to pandas object dtype. But looking at the test, it might be a leftover from my previous, wrong approach. Thanks for the comment, I need to check this!
OK, got it. This test checks that strings cannot be zero-copied to pandas. That was true in the past, as the C++ machinery constructed an object type from the PyArrow string type. Now, with pandas 3.0.0, we can go through `__from_arrow__`, where no copies are needed. Running this test locally with pandas 3.0.0 gives the following error:
Yes, from what I can see this is an expected change, since string conversion will now actually be zero copy
(although, strictly speaking, it is not actually zero-copy entirely, because the test here is using string, and pandas will convert that to large_string. But I suppose that happens outside the view of pyarrow)
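To illustrate the parenthetical above: Arrow's `string` type stores 32-bit offsets while `large_string` stores 64-bit offsets, so going from one to the other needs a new offsets buffer even if the character data is left alone. The following is a minimal, stdlib-only sketch of that layout difference. It is not pyarrow's or pandas' actual code, and the helper names `build_string_buffers` and `widen_offsets` are hypothetical.

```python
from array import array


def build_string_buffers(values):
    """Model an Arrow `string` array: int32 offsets plus one UTF-8 data buffer."""
    data = bytearray()
    offsets = array("i", [0])  # 32-bit signed offsets, as in Arrow `string`
    for value in values:
        data.extend(value.encode("utf-8"))
        offsets.append(len(data))
    return offsets, bytes(data)


def widen_offsets(offsets32):
    """Model the cast to `large_string`: offsets are rewritten as 64-bit values."""
    # A new buffer is allocated here, which is the copy mentioned above.
    return array("q", offsets32.tolist())


offsets32, data = build_string_buffers(["ab", "", "cde"])
offsets64 = widen_offsets(offsets32)
```

In this model the `data` bytes could be shared between both representations; only the offsets have to be rebuilt.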
Essentially, the `zero_copy_only` keyword is ignored whenever the conversion goes through `dtype.__from_arrow__` (same for other options), so it is not even about whether a copy is made in pandas 3.0, just about using an ExtensionDtype.
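A rough model of the dispatch described in this comment, for readers following along. This is a sketch under assumptions, not pyarrow's implementation: `to_pandas_sketch` and `HypotheticalStringDtype` are made-up names, plain Python lists stand in for Arrow arrays, and `ValueError` stands in for whatever error the real default path raises.

```python
class HypotheticalStringDtype:
    """Stand-in for a pandas ExtensionDtype that implements __from_arrow__."""

    def __from_arrow__(self, arrow_values):
        # The hook receives only the Arrow data; it has no parameter through
        # which zero_copy_only (or any other option) could be passed.
        return ("extension-array", list(arrow_values))


def to_pandas_sketch(arrow_values, dtype=None, zero_copy_only=False):
    """Hypothetical model of the conversion dispatch, not pyarrow's code."""
    if dtype is not None and hasattr(dtype, "__from_arrow__"):
        # Extension path: conversion is delegated and the keywords are dropped.
        return dtype.__from_arrow__(arrow_values)
    # Default path in this model: strings become Python objects, which always
    # involves a copy, so a zero-copy request is rejected.
    if zero_copy_only:
        raise ValueError("zero-copy conversion is not possible for strings")
    return ("object-array", list(arrow_values))


values = ["a", None, "b"]
# The keyword is accepted but has no effect on the extension path.
ext_result = to_pandas_sketch(values, HypotheticalStringDtype(), zero_copy_only=True)
```

In this model, the same call without a dtype raises, which mirrors why a test asserting that strings cannot be zero-copied stops failing once an ExtensionDtype is involved.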
Oh yes, I see. Should this be changed when dealing with Extension types? I know we have a list of things to work on when it comes to this topic and we can open up an umbrella issue with all possible improvements.
I am not sure how to easily improve this (since we defer to pandas for the conversion, and the method we call does not have those keywords).
(Long term, I would like to see this logic moved entirely to pandas.)