Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The fact that ExecutionPlan::execute is async seems to be a very common source of bugs:
- CrossJoin Evaluates In ExecutionPlan::execute #2306
- SortExec No Longer Streams Correctly #1939
- HashJoinExec Evaluates in ExecutionPlan::execute #2173
At a high level I can't see why it needs to be async: it returns an async stream, which can and should contain any deferred computations, as opposed to performing them up front while the plan's stream is being constructed.
Describe the solution you'd like
I would like to make ExecutionPlan::execute sync, which will in turn make ExecutionPlan itself sync, i.e. change the signature from
async fn execute(
&self,
partition: usize,
context: Arc<TaskContext>,
) -> Result<SendableRecordBatchStream>;
to
fn execute(
&self,
partition: usize,
context: Arc<TaskContext>,
) -> Result<SendableRecordBatchStream>;
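To illustrate why a sync execute loses nothing, here is a minimal sketch using only the standard library. Everything in it is hypothetical and not DataFusion code: ExpensiveExec stands in for an operator, DeferredRows is a boxed future standing in for SendableRecordBatchStream, and block_on is a toy executor that only exists to drive the example. The point is that execute merely describes the work, and the work runs when the consumer polls.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `SendableRecordBatchStream`: a boxed future
/// that yields all rows at once, so the sketch needs no external crates.
type DeferredRows = Pin<Box<dyn Future<Output = Result<Vec<i64>, String>> + Send>>;

/// Hypothetical operator; `evaluations` counts how often the expensive
/// work has actually run.
struct ExpensiveExec {
    evaluations: Arc<AtomicUsize>,
}

impl ExpensiveExec {
    /// Synchronous `execute`: it only describes the work and returns
    /// immediately, without evaluating anything.
    fn execute(&self, partition: usize) -> Result<DeferredRows, String> {
        let evaluations = Arc::clone(&self.evaluations);
        Ok(Box::pin(async move {
            // The deferred computation runs only once the consumer polls.
            evaluations.fetch_add(1, Ordering::SeqCst);
            Ok(vec![partition as i64, 1, 2])
        }))
    }
}

/// Waker that does nothing; sufficient for the toy executor below.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor that polls a future in a loop until it completes.
/// It is only here to drive the example, not for real use.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Calling execute leaves the evaluations counter at zero; it only increments once the returned value is driven with block_on, which is the behaviour the bugs listed above violated.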
Describe alternatives you've considered
We could leave it as it is.
Additional context
The new scheduler has a slightly ugly hack to work around this - https://github.com/apache/arrow-datafusion/pull/2226/files#diff-b3e57d18925cb116e868da5f997735bdf6f937d53635e85bc227c4e01a2e2b33R138
This will likely help with future work to de-asyncify the physical plan - see #2199
It may also help with compile times and error messages, both of which are downsides of using async_trait