
[GLUTEN-12280][VL] Fix Spark 4 Arrow Python UDF stream writer #12345

Open: ReemaAlzaid wants to merge 2 commits into apache:main from ReemaAlzaid:fix-pyarrow


@ReemaAlzaid (Contributor)

What changes are proposed in this pull request?

Fixes #12280.

Fixes Arrow Python UDF execution on Spark 4 with the Velox backend. The Arrow stream writer is now kept open across input batches instead of reopening the IPC stream for every batch.
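The sketch below is a toy model of the failure mode, not Gluten's code and not the real Arrow wire format. ToyStreamWriter, read_stream, write_reopening and write_single_writer are hypothetical names chosen for illustration. The model assumes a stream is one schema message, then any number of batches, then an end-of-stream marker, read by a consumer that expects a single stream. Under that assumption, reopening the writer per batch injects a second schema message mid-stream, while one long-lived writer produces a well-formed stream.

```python
# Toy model only: message kinds loosely mirror an Arrow IPC stream
# (schema first, then record batches, then an end-of-stream marker),
# but this is NOT the real Arrow wire format.
SCHEMA, BATCH, EOS = "SCHEMA", "BATCH", "EOS"


class ToyStreamWriter:
    """Hypothetical stand-in for an Arrow IPC stream writer."""

    def __init__(self, sink, schema):
        self.sink = sink
        # Opening a stream always emits a schema message first.
        self.sink.append((SCHEMA, schema))

    def write_batch(self, rows):
        self.sink.append((BATCH, rows))

    def close(self):
        # Closing emits the end-of-stream marker.
        self.sink.append((EOS, None))


def read_stream(messages):
    """Read exactly one stream: one schema, any number of batches, then EOS."""
    it = iter(messages)
    first = next(it, None)
    if first is None or first[0] != SCHEMA:
        raise ValueError("Invalid IPC stream: expected schema")
    schema, batches = first[1], []
    for kind, payload in it:
        if kind == BATCH:
            batches.append(payload)
        elif kind == EOS:
            return schema, batches
        else:
            # A second schema mid-stream means the writer was reopened.
            raise ValueError("Invalid IPC stream: unexpected " + kind)
    raise ValueError("Invalid IPC stream: missing end-of-stream marker")


def write_reopening(schema, batches):
    """Buggy pattern: a new writer (and a new schema message) per batch."""
    sink, writer = [], None
    for rows in batches:
        writer = ToyStreamWriter(sink, schema)
        writer.write_batch(rows)
    if writer is not None:
        writer.close()
    return sink


def write_single_writer(schema, batches):
    """Fixed pattern: one writer kept open across all input batches."""
    sink = []
    writer = ToyStreamWriter(sink, schema)
    for rows in batches:
        writer.write_batch(rows)
    writer.close()
    return sink
```

With two input batches, read_stream(write_single_writer("ship_len: int", [[1, 2], [3]])) returns the schema and both batches. The same call on write_reopening(...) raises at the second schema message. In PyArrow terms, the fixed pattern corresponds to opening one writer with pyarrow.ipc.new_stream and calling write_batch once per batch.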

Also adds a regression test for an Arrow Python UDF over a Parquet scan.

How was this patch tested?

Added ArrowEvalPythonExecSuite coverage.

Verified locally on Spark 4.0.2, Scala 2.13, Linux aarch64. The repro runs through ColumnarArrowPythonRunner, returns max(ship_len) = 7, and no longer fails with "Invalid IPC stream".

github-actions bot added the VELOX label on Jun 23, 2026.


Successfully merging this pull request may close these issues:

[VL] pyarrow UDF is broken
