BUG: Fix Series.str.contains with compiled regex on Arrow string dtype (#61942) #61946

Open: wants to merge 10 commits into base: main

Conversation

Aniketsy

#61942

This PR fixes an issue in Series.str.contains() where passing a compiled regex object failed when the underlying string data is backed by PyArrow.

Please let me know if my approach is not correct. I would love to improve it and contribute to this.

@Aniketsy force-pushed the fix-arrow-contains-regex-v2 branch from 0b16375 to 838b1c5 on July 25, 2025 13:15
@Aniketsy
Author

Hi @mroeschke
I've worked on the issue
BUG: Fix Series.str.contains with compiled regex on Arrow string dtype (#61942)
and have opened a pull request for it.

I'd appreciate it if you could take a look and share your feedback.
Please let me know if anything needs to be improved or clarified.

Thanks!

Member

@mroeschke mroeschke left a comment

This fix should go into _str_contains of ArrowExtensionArray

@Aniketsy
Author

Thank you for the feedback!
I will update that.

@jorisvandenbossche
Member

Additionally, if this is something that is not implemented by pyarrow, we should not raise a NotImplementedError, but fall back to the Python object implementation (you can see a similar pattern in some other str methods, like ArrowStringArray._str_replace).
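
For readers following the thread, the fallback idea can be illustrated with a minimal, standard-library-only sketch. It is not the pandas implementation: `str_contains`, `contains_fast_path`, and `contains_object_fallback` are hypothetical names, and the "fast path" is only a stand-in for a backend kernel that accepts a pattern string but not a compiled regex.

```python
import re
from typing import List, Optional, Sequence, Union

PatternLike = Union[str, "re.Pattern[str]"]
MaybeBool = Optional[bool]


def contains_object_fallback(
    values: Sequence[Optional[str]], pat: PatternLike
) -> List[MaybeBool]:
    """Element-wise Python path; handles both strings and compiled patterns."""
    # re.compile returns an already-compiled pattern unchanged (no flags passed).
    compiled = re.compile(pat)
    return [None if v is None else compiled.search(v) is not None for v in values]


def contains_fast_path(values: Sequence[Optional[str]], pat: str) -> List[MaybeBool]:
    """Hypothetical stand-in for an accelerated kernel that needs a pattern string."""
    if not isinstance(pat, str):
        raise TypeError("fast path only accepts a pattern string")
    compiled = re.compile(pat)
    return [None if v is None else compiled.search(v) is not None for v in values]


def str_contains(values: Sequence[Optional[str]], pat: PatternLike) -> List[MaybeBool]:
    """Dispatch: use the fast path when possible, otherwise fall back."""
    if isinstance(pat, re.Pattern):
        # Unsupported input for the fast path: fall back instead of raising
        # NotImplementedError, as suggested in the review comment.
        return contains_object_fallback(values, pat)
    return contains_fast_path(values, pat)


data = ["apple", "Banana", None]
plain = str_contains(data, "an")
compiled = str_contains(data, re.compile("AN", re.IGNORECASE))
```

The design choice here is to check the input type up front and route unsupported cases to the slower but more general path, so callers get a result for compiled patterns rather than an error.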

@jorisvandenbossche added the Bug, Strings (String extension data type and string data), and Arrow (pyarrow functionality) labels Jul 26, 2025
@jorisvandenbossche added this to the 2.3.2 milestone Jul 26, 2025
@Aniketsy
Author

@jorisvandenbossche Thank you for the feedback! I will update the PR accordingly.

Would you mind letting me know the reason behind the one failing check (pre-commit.ci)?
Thanks again!

@jorisvandenbossche
Member

> Would you mind letting me know the reason behind the one failing check (pre-commit.ci)?

The ruff check, which is used for auto-formatting, is failing. I would recommend installing pre-commit locally to avoid having this fail on CI: https://pandas.pydata.org/docs/dev/development/contributing_codebase.html#pre-commit
