Commit 3c91c58
[SPARK-56742][PYTHON][TESTS] Skip string-to-decimal failure assertion on pandas 3 in test_type_coercion_string_to_numeric
### What changes were proposed in this pull request?
Gate one `assertRaises(PythonException)` block in `ArrowPythonUDFTestsMixin.test_type_coercion_string_to_numeric` on `LooseVersion(pd.__version__) < "3.0.0"`. Specifically, the `string("1","2") -> decimal` failure assertion is skipped on pandas 3+. The other failure assertions (`"1.1" -> int`, `"1.1" -> decimal`) and all success cases are unchanged.
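The gating pattern can be sketched in isolation as follows. This is a minimal illustration using only the standard library, not the actual test code: `version_tuple`, `legacy_string_to_decimal`, `GatedAssertionExample`, and the `PANDAS_VERSION` attribute are hypothetical stand-ins, and `TypeError` substitutes for `PythonException`. The real test compares `LooseVersion(pd.__version__)` against `"3.0.0"`.

```python
import re
import unittest
from decimal import Decimal


def version_tuple(version):
    """Hypothetical helper: turn '3.0.2' into (3, 0, 2) for comparison."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '3' to (3, 0, 0)
        parts.append(0)
    return tuple(parts)


def legacy_string_to_decimal(value, pandas_version):
    """Stand-in for the legacy conversion path; this is not Spark code."""
    if version_tuple(pandas_version) < (3, 0, 0):
        # Behaviour this commit describes for pandas 2: the str is rejected.
        raise TypeError("int or Decimal object expected, got str")
    # Behaviour this commit describes for pandas 3: the cast succeeds.
    return Decimal(value)


class GatedAssertionExample(unittest.TestCase):
    # Assumed value for illustration; the real test reads pd.__version__.
    PANDAS_VERSION = "3.0.2"

    def test_string_to_decimal_failure(self):
        # Expect the failure only on versions where it can still occur.
        if version_tuple(self.PANDAS_VERSION) < (3, 0, 0):
            with self.assertRaises(TypeError):
                legacy_string_to_decimal("1", self.PANDAS_VERSION)
```

With the gate in place the test passes on either side of the version boundary: the assertion runs below 3.0.0 and is skipped at or above it.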
### Why are the changes needed?
`ArrowPythonUDFLegacyTests.test_type_coercion_string_to_numeric` is failing on the scheduled `Build / Python-only (master, Python 3.12, Pandas 3)` job, e.g. https://github.com/apache/spark/actions/runs/25402959034/job/74508177526.
Root cause: pandas 3's `StringDtype` implements `__arrow_array__`. In `PandasToArrowConversion.convert` (`python/pyspark/sql/conversion.py`), the path is
```python
mask = None if hasattr(series.array, "__arrow_array__") else series.isnull()
...
pa.Array.from_pandas(series, mask=mask, type=arrow_type, safe=safecheck)
```
On pandas 2, the result series of strings has object dtype and no `__arrow_array__`, and `from_pandas` with `type=decimal128(...)` raises `ArrowTypeError` ("int or Decimal object expected, got str"), which surfaces as `PythonException`. On pandas 3, the series has `StringDtype`, `mask` is `None`, and the `__arrow_array__` protocol cleanly casts `"1"` to `Decimal("1")`. The conversion succeeds silently, so `assertRaises(PythonException)` fails.
The non-legacy `ArrowPythonUDF` path is unaffected because it converts a Python list directly via `pa.array(list, type=...)`, where pyarrow's per-element type check still rejects `str` for `Decimal`.
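The `hasattr` branch in the legacy path can be illustrated without pandas or pyarrow. The sketch below only mirrors the mask selection quoted above; `ObjectArray`, `ArrowAwareStringArray`, and `choose_mask` are hypothetical names introduced for illustration, and the stand-in `__arrow_array__` does not perform a real conversion.

```python
class ObjectArray:
    """Stand-in for a pandas 2 object-dtype array: no __arrow_array__."""

    def __init__(self, values):
        self.values = list(values)


class ArrowAwareStringArray(ObjectArray):
    """Stand-in for a pandas 3 StringDtype array: exposes __arrow_array__."""

    def __arrow_array__(self, type=None):
        # A real implementation returns a pyarrow array; this only marks the hook.
        raise NotImplementedError("illustrative stand-in")


def choose_mask(array):
    """Mirror the quoted branch: no null mask when the array offers __arrow_array__."""
    if hasattr(array, "__arrow_array__"):
        return None
    return [value is None for value in array.values]


# Object-dtype stand-in: an explicit null mask is computed.
object_mask = choose_mask(ObjectArray(["1", None]))
# StringDtype stand-in: the mask is None and conversion is left to the protocol.
string_mask = choose_mask(ArrowAwareStringArray(["1", "2"]))
```

Here `object_mask` is a list of null flags, while `string_mask` is `None`, which is the condition under which the commit message reports the string-to-decimal cast succeeding.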
### Does this PR introduce _any_ user-facing change?
No. Test-only.
### How was this patch tested?
Verified locally in a Python 3.13 + pandas 3.0.2 + pyarrow 23.0.1 conda env. All three suites pass:
```
$ python/run-tests --testnames \
"pyspark.sql.tests.arrow.test_arrow_python_udf ArrowPythonUDFLegacyTests.test_type_coercion_string_to_numeric, \
pyspark.sql.tests.arrow.test_arrow_python_udf ArrowPythonUDFTests.test_type_coercion_string_to_numeric, \
pyspark.sql.tests.arrow.test_arrow_python_udf ArrowPythonUDFNonLegacyTests.test_type_coercion_string_to_numeric"
...
Tests passed in 11 seconds
```
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)
Closes #55698 from zhengruifeng/fix-arrow-legacy-type-coercion-test.
Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(cherry picked from commit c23e166)
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
1 parent c90b96f commit 3c91c58
1 file changed
Lines changed: 10 additions & 2 deletions
(Diff content not preserved; only line numbers survive.)
Hunk 1: 1 line added at new line 22.
Hunk 2: 3 lines added at new lines 47-49.
Hunk 3: original lines 193-194 replaced by new lines 197-202.