What's the difference between this project and datafusion-ray and datafusion-ballista? #161

zuston · 2025-09-29T02:23:13Z

zuston
Sep 29, 2025

Thanks for your great work on this project—it’s impressive to see how rapidly it’s evolving.
Could you also clarify how this project differs from datafusion-ray and datafusion-ballista?

zuston · 2025-09-29T08:22:20Z

zuston
Sep 29, 2025
Author

cc @gabotechs


gabotechs · 2025-09-29T10:59:14Z

gabotechs
Sep 29, 2025
Maintainer

One of the core differences is that this project is not meant to provide any executable, service or binary that can be configured and deployed to an environment and start executing distributed queries. Instead, it aims to provide a library with the building blocks for enhancing DataFusion with distributed capabilities.

Another difference is that this project aims to be as close as possible to DataFusion in its execution model, maintaining a pull-based approach very similar to how vanilla DataFusion works, but one that happens to stream data over the network.
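To make "pull-based" concrete, here is a minimal sketch in plain Python generators. It is not this project's or DataFusion's API: `scan`, `network_boundary`, `filter_even`, and the `Batch` alias are hypothetical names made up for illustration, and the "network" hop is simulated in-process. The point it shows is that no operator does any work until the consumer at the top of the plan asks for the next batch, and the hop across a boundary is just one more operator in that pull chain.

```python
from typing import Iterator, List

Batch = List[int]  # hypothetical stand-in for an Arrow record batch

log: List[str] = []  # records the order in which work actually happens


def scan(num_batches: int, batch_size: int) -> Iterator[Batch]:
    """Leaf operator: produces one batch each time its parent pulls."""
    for i in range(num_batches):
        log.append(f"scan produces batch {i}")
        yield list(range(i * batch_size, (i + 1) * batch_size))


def network_boundary(upstream: Iterator[Batch]) -> Iterator[Batch]:
    """Simulated network hop (assumption): still driven by the consumer's pull."""
    for batch in upstream:
        log.append("batch crosses the boundary")
        yield list(batch)  # the copy only imitates a transfer between workers


def filter_even(upstream: Iterator[Batch]) -> Iterator[Batch]:
    """Downstream operator: keeps even values from each pulled batch."""
    for batch in upstream:
        yield [value for value in batch if value % 2 == 0]


# Building the plan does no work; it only wires the operators together.
plan = filter_even(network_boundary(scan(num_batches=3, batch_size=4)))
work_before_first_pull = list(log)

# Pulling one batch drives exactly one batch through the whole chain.
first_batch = next(plan)
work_after_first_pull = list(log)
```

After the single `next(plan)` call, only one batch has been scanned and sent across the simulated boundary; the remaining batches are produced only if the consumer keeps pulling.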

This project also does not materialize intermediate results across network boundaries. The idea is to stream data across workers in a zero-copy manner as efficiently as possible, even if that implies that other features like checkpointing or subplan retries are going to be way harder to implement.
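The trade-off can be sketched with a toy comparison, again in plain Python and under stated assumptions: `produce`, `materialized_exchange`, `streaming_exchange`, and `consume` are hypothetical names, and both "exchanges" run in one process rather than across real workers. The materialized variant stores the whole intermediate result before the next stage starts, which gives a natural point to retry from. The streaming variant forwards each batch as soon as it is pulled and keeps nothing behind.

```python
from typing import Iterator, List

Batch = List[int]  # hypothetical stand-in for a record batch


def produce(num_batches: int, events: List[str]) -> Iterator[Batch]:
    """Upstream stage: emits one single-value batch per pull."""
    for i in range(num_batches):
        events.append(f"produce {i}")
        yield [i]


def materialized_exchange(upstream: Iterator[Batch], events: List[str]) -> Iterator[Batch]:
    """Collects every upstream batch before the downstream stage sees any."""
    stored = list(upstream)  # the full intermediate result is kept around
    events.append(f"stored {len(stored)} batches")
    return iter(stored)


def streaming_exchange(upstream: Iterator[Batch]) -> Iterator[Batch]:
    """Forwards each batch as soon as it is pulled; nothing is retained."""
    for batch in upstream:
        yield batch


def consume(stream: Iterator[Batch], events: List[str]) -> int:
    """Downstream stage: sums all values it receives."""
    total = 0
    for batch in stream:
        events.append(f"consume {batch[0]}")
        total += sum(batch)
    return total


materialized_events: List[str] = []
materialized_total = consume(
    materialized_exchange(produce(3, materialized_events), materialized_events),
    materialized_events,
)

streaming_events: List[str] = []
streaming_total = consume(
    streaming_exchange(produce(3, streaming_events)),
    streaming_events,
)
```

Both runs compute the same total, but the event order differs: the materialized run finishes producing before consuming anything, while the streaming run interleaves producing and consuming. That interleaving is also why a mid-query failure has no stored intermediate result to resume from.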


zuston Sep 30, 2025
Author

> One of the core differences is that this project is not meant to provide any executable, service or binary that can be configured and deployed to an environment and start executing distributed queries. Instead, it aims to provide a library with the building blocks for enhancing DataFusion with distributed capabilities.

If this project provides that capability, we could implement a distributed SQL engine more easily, as Daft did.

> Another difference is that this project aims to be as close as possible to DataFusion in its execution model, maintaining a pull-based approach very similar to how vanilla DataFusion works, but one that happens to stream data over the network.

The execution model looks like an MPP system, similar to Trino.

gabotechs Sep 30, 2025
Maintainer

> If this project provides that capability, we could implement a distributed SQL engine more easily, as Daft did.

Yes, this is one of the core principles of the project: to ship the building blocks, not the solution.

gabotechs Sep 30, 2025
Maintainer

> The execution model looks like an MPP system, similar to Trino.

Yes, the execution model is more similar to Trino than to Ballista or Spark, for example. Right now this is on purpose, but I'd say the door is open to evolving this execution model.

gabotechs Sep 30, 2025
Maintainer

> we could implement a distributed SQL engine more easily

From your message I understand you might be interested in bringing distributed capabilities to your system. Can I ask what kind of data sources you are querying (Parquet, custom, etc.) and what kind of queries you would be running (analytics, observability, etc.)?
