[PYTHON][TESTS] Skip string-to-decimal assertions in test_type_coercion_string_to_numeric on Pandas 3 #55701

Draft
zhengruifeng wants to merge 1 commit into apache:master from zhengruifeng:SPARK-fix-arrow-udf-string-to-decimal-pandas3

zhengruifeng (Contributor) commented May 6, 2026

What changes were proposed in this pull request?

Skip the two assertRaises(PythonException) blocks for string -> decimal casts in ArrowPythonUDFTestsMixin.test_type_coercion_string_to_numeric when the active pandas defaults to a non-object (Arrow-backed) string dtype. Detect via pd.Series(["x"]).dtype == object so the assertions still run on Pandas 2 with default settings.
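
A minimal sketch of the guard described above, assuming only that pandas is installed. The names `strings_default_to_object_dtype` and `assertions_to_run` are hypothetical stand-ins; the real test wraps UDF calls in `assertRaises(PythonException)` rather than returning labels.

```python
import pandas as pd


def strings_default_to_object_dtype() -> bool:
    """Return True when pandas infers object dtype for a plain list of str."""
    # Same probe as in the description: Pandas 2 with default settings
    # infers object, while Pandas 3 infers a dedicated string dtype.
    return bool(pd.Series(["x"]).dtype == object)


def assertions_to_run() -> list:
    """Hypothetical helper: list which coercion checks the test would execute."""
    labels = ["string '1.1' -> int"]  # unaffected, always checked
    if strings_default_to_object_dtype():
        # Only meaningful when the conversion is expected to raise.
        labels.append("string -> decimal")
    return labels


if __name__ == "__main__":
    print(pd.__version__, assertions_to_run())
```

Guarding individual blocks rather than skipping the whole test keeps the `string '1.1' -> int` check active on every pandas version.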

Why are the changes needed?

In Pandas 3, pd.Series(['1', '2']) is backed by ArrowStringArrayNumpySemantics. pa.Array.from_pandas(series, type=pa.decimal128(...)) then silently casts the strings to decimal instead of raising ArrowTypeError. The legacy SQL_ARROW_BATCHED_UDF path goes through PandasToArrowConversion.convert(...) and depends on that exception to surface a PythonException, so the existing assertions stop holding under Pandas 3. The string '1.1' -> int assertion is unaffected because the cast fallback also fails.

This was originally observed in the Pandas-3 build for master at SHA ca4d88d (https://github.com/apache/spark/actions/runs/25402959034/job/74508177559): ArrowPythonUDFLegacyTests::test_type_coercion_string_to_numeric failed with AssertionError: PythonException not raised.

Does this PR introduce any user-facing change?

No. Test-only change.

How was this patch tested?

Tested locally in a Pandas 3 environment (pandas==3.0.2, pyarrow==23.0.1, Python 3.13.12, future.infer_string=True by default — matching the failing CI image):

  • ArrowPythonUDFTests/LegacyTests/NonLegacyTests::test_type_coercion_string_to_numeric — 3 passed.
  • ArrowPythonUDFParityTests/ParityLegacyTests/ParityNonLegacyTests::test_type_coercion_string_to_numeric (connect) — 3 passed.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-7)

zhengruifeng force-pushed the SPARK-fix-arrow-udf-string-to-decimal-pandas3 branch from 849114a to 953b92c on May 6, 2026 10:10
zhengruifeng changed the title from "[PYTHON] Restore string-to-decimal type mismatch error in Arrow Python UDF on Pandas 3" to "[PYTHON][TESTS] Skip string-to-decimal assertions in test_type_coercion_string_to_numeric on Pandas 3" on May 6, 2026
zhengruifeng force-pushed the SPARK-fix-arrow-udf-string-to-decimal-pandas3 branch from 953b92c to 356b888 on May 6, 2026 10:21
zhengruifeng force-pushed the SPARK-fix-arrow-udf-string-to-decimal-pandas3 branch from 356b888 to 6b93dea on May 6, 2026 11:31