[SPARK-56661] Implementing UDFWorkerManager for new UDF worker sessions #55712
sven-weber-db wants to merge 1 commit into apache:master
Conversation
    import org.apache.spark.udf.worker.UDFWorkerSpecification
    import org.apache.spark.udf.worker.core.{UDFWorkerManager, WorkerDispatcher, WorkerLogger}

    class DirectUDFWorkerManager(
The manager may not need to care about the backend of a dispatcher. I suggest:
- Call it `DispatcherManager`, as it is a central place to hold the dispatchers.
- Instead of creating the dispatcher here, register a created dispatcher from the call site, so that this manager does not have to care about the creation logic or the backend of the dispatcher.
This seems to be a dead class not used anywhere, let's remove it
This is the class that would be consumed in SparkEnv as the current implementation of the DispatcherManager spawning direct workers. I agree it is not yet complete and its implementation will need to change once the gRPC protocol lands. However, it will have to exist. Therefore, I would propose to keep it with the current TODO and replace the implementation once your changes land.
> Call it DispatcherManager, as it is a central place to hold the dispatchers
Ok, happy to rename it.
> Instead of creating the dispatcher here, register a created dispatcher from callsite
I don't think this works, as dispatchers depend on the workerSpec, which is a runtime value. Therefore, we cannot pass a single dispatcher instance; we need to be able to create different instances at runtime. We could introduce a DispatcherFactory as an additional abstraction that is passed to the DispatcherManager instead of using subclasses, if you prefer that.
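To make the trade-off concrete, here is a minimal sketch of what such a `DispatcherFactory` abstraction could look like. All names and signatures here are hypothetical and not taken from the PR; `UDFWorkerSpecification` is stubbed as a placeholder so the snippet is self-contained.

```scala
// Hypothetical sketch of the DispatcherFactory idea: the factory, not the
// manager, knows how to build a dispatcher for a given runtime workerSpec.
final case class UDFWorkerSpecification(dispatcher: String) // placeholder stub

trait WorkerDispatcher { def close(): Unit }

trait DispatcherFactory {
  // Called at runtime for each new workerSpec the manager encounters.
  def create(workerSpec: UDFWorkerSpecification): WorkerDispatcher
}

class DispatcherManager(factory: DispatcherFactory) {
  private val dispatchers =
    scala.collection.mutable.Map.empty[UDFWorkerSpecification, WorkerDispatcher]

  // One dispatcher per spec; created lazily via the injected factory.
  def getOrCreate(workerSpec: UDFWorkerSpecification): WorkerDispatcher =
    synchronized {
      dispatchers.getOrElseUpdate(workerSpec, factory.create(workerSpec))
    }
}
```

With this shape the manager stays agnostic of the dispatcher backend, while still being able to create per-spec instances at runtime.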
    // https://github.com/apache/spark/pull/55657

    @Experimental
    private[direct] class SimpleWorkerConnection(
Let's wait for the other PR to land so we can avoid touching this part of the logic in this PR.
    // Must be called while holding `lock`.
    private def handleSessionTermination(
        workerSpec: UDFWorkerSpecification
Maybe we should also pass the session object here?
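As an illustration of that suggestion, the sketch below passes the terminating session alongside its spec so the handler can drop exactly that session's state. Every name here (`WorkerSession`, `SessionRegistry`, the bookkeeping) is an assumption for illustration, not the PR's actual code.

```scala
// Hypothetical stubs so the sketch is self-contained; not from the PR.
final case class UDFWorkerSpecification(dispatcher: String)
final class WorkerSession(val id: String)

class SessionRegistry {
  // Spec -> ids of live sessions created for that spec.
  private var sessions = Map.empty[UDFWorkerSpecification, Set[String]]

  def register(spec: UDFWorkerSpecification, session: WorkerSession): Unit =
    synchronized {
      sessions = sessions.updated(spec, sessions.getOrElse(spec, Set.empty) + session.id)
    }

  // Passing the session object (not only the spec) lets the handler remove
  // the state of the one session that terminated.
  def handleSessionTermination(
      spec: UDFWorkerSpecification,
      session: WorkerSession): Unit = synchronized {
    val remaining = sessions.getOrElse(spec, Set.empty) - session.id
    if (remaining.isEmpty) sessions -= spec
    else sessions = sessions.updated(spec, remaining)
  }

  def liveSessions(spec: UDFWorkerSpecification): Int =
    synchronized(sessions.getOrElse(spec, Set.empty).size)
}
```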
What changes were proposed in this pull request?
This PR implements a `UDFWorkerManager` class in the new `/udf` package that was initiated by SPIP SPARK-55278. The purpose of the new manager class is to provide a single entry point through which Spark can create a UDF session to an external UDF worker, based on a `WorkerSpecification` instance. This manager and entry point will be used by follow-up PRs to implement new, language-agnostic Catalyst nodes.

Why are the changes needed?
The `UDFWorkerManager` serves two main purposes:
1. Creating new UDF worker sessions based on a `UDFWorkerSpecification`.
2. Managing the lifecycle of the `WorkerDispatcher` classes, depending on the `UDFWorkerSpecification` they are created for. This is required because the newly proposed UDF framework from SPIP SPARK-55278 enables clients to specify different UDF dispatchers for their UDFs. This implies:

2.1. Multiple, different dispatchers can exist at the same time
-> The right one needs to be selected to create a UDF session

2.2. Dispatcher lifetime needs to be managed
-> Dispatchers and their resources need to be cleaned up if they are no longer needed by clients
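The two duties above (per-spec dispatcher selection in 2.1 and cleanup in 2.2) could be sketched roughly as follows. This is an illustrative reference-counting sketch under assumed names, not the PR's actual implementation.

```scala
// Illustrative sketch only: one dispatcher per spec, closed once the last
// session that uses it terminates.
final case class UDFWorkerSpecification(dispatcher: String)

trait WorkerDispatcher { def close(): Unit }

class UDFWorkerManager(createDispatcher: UDFWorkerSpecification => WorkerDispatcher) {
  // Dispatcher plus the number of live sessions still using it.
  private var dispatchers =
    Map.empty[UDFWorkerSpecification, (WorkerDispatcher, Int)]

  // 2.1: select (or lazily create) the dispatcher matching the spec.
  def createSession(spec: UDFWorkerSpecification): WorkerDispatcher = synchronized {
    val (dispatcher, refs) = dispatchers.getOrElse(spec, (createDispatcher(spec), 0))
    dispatchers = dispatchers.updated(spec, (dispatcher, refs + 1))
    dispatcher
  }

  // 2.2: release resources once no client session needs the dispatcher.
  def closeSession(spec: UDFWorkerSpecification): Unit = synchronized {
    dispatchers.get(spec).foreach { case (dispatcher, refs) =>
      if (refs <= 1) { dispatcher.close(); dispatchers -= spec } // last user gone
      else dispatchers = dispatchers.updated(spec, (dispatcher, refs - 1))
    }
  }

  def activeDispatchers: Int = synchronized(dispatchers.size)
}
```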
Does this PR introduce any user-facing change?
No - all changes are marked as `Experimental` and are not yet consumed.

How was this patch tested?
New unit tests were added for the changes in `UDFWorkerManager` and `WorkerSession`.

Was this patch authored or co-authored using generative AI tooling?
Partially. However, the code was manually reviewed and adjusted.