Conversation

@codetyri0n
Contributor

Which issue does this PR close?

Rationale for this change

This PR makes the `extract` expression timezone aware. The configured session time zone (for example, `SET datafusion.execution.time_zone = '+04:00';`) is now recognised, and `extract` returns the correct results through `date_part`.
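A minimal sketch of the behaviour described above (the `+04:00` value and the example query are illustrative, not taken from the PR's tests):

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Equivalent to `SET datafusion.execution.time_zone = '+04:00';` in the CLI.
    let config = SessionConfig::new().set_str("datafusion.execution.time_zone", "+04:00");
    let ctx = SessionContext::new_with_config(config);

    // With this change, extract/date_part take the configured session zone into
    // account; the exact value for a naive timestamp follows the PR's semantics.
    ctx.sql("SELECT extract(hour FROM TIMESTAMP '2024-01-01 00:00:00') AS hr")
        .await?
        .show()
        .await?;
    Ok(())
}
```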

What changes are included in this PR?

Are these changes tested?

Yes (slt)

Are there any user-facing changes?

Yes

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Oct 31, 2025
@codetyri0n
Contributor Author

codetyri0n commented Oct 31, 2025

CC: @Omega359 @comphead

@Omega359
Contributor

I haven't had a chance to go over this in detail but can we add some tests to the .slt to test where the timestamp is being provided with a timezone? I didn't see any from a quick look.
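A sketch of the kind of case such a test would cover: a timestamp that already carries a timezone, built with `arrow_cast` (the literal and the expected result are assumptions, not taken from the PR):

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    // Whether the extracted hour reflects the attached +04:00 offset is exactly
    // what the requested .slt tests would pin down.
    let sql = "SELECT extract(hour FROM arrow_cast(TIMESTAMP '2024-01-01 00:00:00', 'Timestamp(Nanosecond, Some(\"+04:00\"))'))";
    ctx.sql(sql).await?.show().await?;
    Ok(())
}
```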

@github-actions github-actions bot added the sql SQL Planner label Nov 1, 2025
@codetyri0n codetyri0n changed the title from "Feat: Make extract (date_part) timezone aware" to "Feat: Make extract SQL expression timezone aware" Nov 1, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 1, 2025
@codetyri0n
Contributor Author

I haven't had a chance to go over this in detail but can we add some tests to the .slt to test where the timestamp is being provided with a timezone? I didn't see any from a quick look.

Adding the recommended tests led to making `extract` independent of `date_part`; it no longer uses the `date_part` UDF. Let me know if you have any concerns.
Summarizing it here:

  • both `date_part` and `extract` now honour the session time-zone configuration
  • `extract` is now registered as a separate datetime function
  • corresponding changes were made to the `mod.rs` files and the planner
  • the affected slt files and test cases have been updated accordingly (verified locally)

@Omega359
Contributor

Omega359 commented Nov 2, 2025

I'll try and review this tomorrow though reading that extract was, err, extracted to a new function is a surprise.

let offset_hours = if tz_str.as_ref() == "+00:00" {
    0
} else {
    let sign = if tz_str.starts_with('+') { 1i32 } else { -1i32 };
Member


What about named timezones like Europe/Brussels? Are they intentionally ignored here?
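For reference, a hedged sketch (not the PR's code) of how both forms could be resolved: arrow's `Tz` parses fixed offsets and, when arrow is built with the `chrono-tz` feature, IANA names such as `Europe/Brussels`, which also accounts for DST at the instant in question.

```rust
use arrow::array::timezone::Tz;
use chrono::{Offset, TimeZone, Utc};

/// Resolve the UTC offset (in seconds) of a session timezone string at a
/// given instant, covering both "+04:00" and "Europe/Brussels".
fn utc_offset_seconds(tz_str: &str, utc_nanos: i64) -> Result<i32, String> {
    let tz: Tz = tz_str
        .parse()
        .map_err(|e| format!("invalid timezone {tz_str:?}: {e}"))?;
    let local = Utc.timestamp_nanos(utc_nanos).with_timezone(&tz);
    Ok(local.offset().fix().local_minus_utc())
}
```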

@Omega359
Contributor

Omega359 commented Nov 9, 2025

I just don't see the need for having extract being another function in this PR.

@codetyri0n
Contributor Author

I just don't see the need for having extract being another function in this PR.

A little late to realise the overreach myself 😅. However, I have brought it back within `date_part` as before and have also added the more robust test cases recommended (such as improper offsets), with the corresponding code additions/tweaks.

I am unclear on how to make named timezones like 'Europe/Brussels' work, so for now I have commented out that specific test case. I feel we can ship this as part of the upcoming release, since it is a candidate, and then start on making the named-timezones scenario work with some help, if it is within scope.

}
} else if let Timestamp(time_unit, None) = array.data_type() {
// For naive timestamps, interpret in session timezone
let tz: Tz = config.execution.time_zone.as_str().parse().map_err(|_| {
Contributor Author


I am quite stumped by this error the CI is throwing: the alternative suggested by the compiler and the ones I came up with do not work locally. It would be great if someone could lend a hand 😔 (I've considered workarounds with `as_ref` and `as_deref`, to no avail).

Contributor Author

@codetyri0n codetyri0n Nov 10, 2025


This is confusing, since the commits with `extract` as a separate function had this and it seemed to be fine with the CI:

let tz = match config.execution.time_zone.parse::<Tz>() {
    Ok(tz) => tz,
    Err(_) => return exec_err!("Invalid timezone"),
};

Contributor


I am quite stumped by this error the CI is throwing: the alternative suggested by the compiler and the ones I came up with do not work locally. It would be great if someone could lend a hand 😔 (I've considered workarounds with `as_ref` and `as_deref`, to no avail).

`config.execution.time_zone` was changed to an `Option` in #18359, so the `.as_str()` call doesn't exist on `Option`.
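A hedged sketch of one way to adapt to that change (the UTC fallback is an assumption; returning an error for a missing value would be equally valid):

```rust
use arrow::array::timezone::Tz;
use datafusion_common::{exec_err, Result};

/// Call site would be `session_tz(config.execution.time_zone.as_deref())`,
/// since the field is now an `Option<String>`.
fn session_tz(time_zone: Option<&str>) -> Result<Tz> {
    match time_zone.unwrap_or("+00:00").parse::<Tz>() {
        Ok(tz) => Ok(tz),
        Err(_) => exec_err!("Invalid timezone: {time_zone:?}"),
    }
}
```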

adriangb and others added 3 commits November 11, 2025 16:17
Removes a downcast match in favor of using the trait. This mirrors the changes to DataSourceExec to use partition_statistics instead of statistics, from apache#15852.
## Which issue does this PR close?


- Closes apache#16244

## Rationale for this change


Support `output_bytes` in `BaselineMetrics` (a common metrics set for
almost all operators)

```
DataFusion CLI v50.3.0
> explain analyze select * from generate_series(1, 1000000) as t1(v1) order by v1 desc;
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                            |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortExec: expr=[v1@0 DESC], preserve_partitioning=[false], metrics=[output_rows=1000000, elapsed_compute=96.421534ms, output_bytes=7.6 MB, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, batches_split=0] |
|                   |   ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=1000000, elapsed_compute=34.125µs, output_bytes=7.7 MB]                                                                                            |
|                   |     LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=1, end=1000000, batch_size=8192], metrics=[output_rows=1000000, elapsed_compute=2.262626ms, output_bytes=7.7 MB]                     |
|                   |                                                                                                                                                                                                                 |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.080 seconds.
```

Note that it might overestimate memory due to a well-known issue; see the snippet below for details.
```rs

    /// Memory usage of all output batches.
    ///
    /// Note: This value may be overestimated. If multiple output `RecordBatch`
    /// instances share underlying memory buffers, their sizes will be counted
    /// multiple times.
    /// Issue: <apache#16841>
    output_bytes: Count,
```

I think this metric provides valuable insight, so it's better for it to
overestimate than not exist at all.

## What changes are included in this PR?

1. Add `output_bytes` to `BaselineMetrics`; it is reported at the `summary` analyze level (see the config `datafusion.explain.analyze_level` for details).
2. This metric is tracked automatically through the `record_poll()` API, a common interface most operators use when a new output batch is generated (a minimal sketch follows below).
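A minimal sketch (not the PR's code) of the `record_poll()` path from point 2; an operator that already routes its output through `BaselineMetrics` picks the new counter up automatically:

```rust
use std::sync::Arc;
use std::task::Poll;

use arrow::array::{ArrayRef, Int64Array};
use arrow::record_batch::RecordBatch;
use datafusion::error::Result;
use datafusion::physical_plan::metrics::{BaselineMetrics, ExecutionPlanMetricsSet};

fn main() -> Result<()> {
    let metrics = ExecutionPlanMetricsSet::new();
    let baseline = BaselineMetrics::new(&metrics, /* partition */ 0);

    let batch = RecordBatch::try_from_iter([(
        "v1",
        Arc::new(Int64Array::from(vec![1, 2, 3])) as ArrayRef,
    )])?;

    // Operators return their poll result through record_poll(); row (and, with
    // this PR, byte) counters are updated as a side effect.
    let _ = baseline.record_poll(Poll::Ready(Some(Ok(batch))));

    println!("output_rows = {}", baseline.output_rows().value());
    Ok(())
}
```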

## Are these changes tested?
UT

## Are there any user-facing changes?

…/floor (apache#18265)

## Which issue does this PR close?


- Closes apache#18175 

## Rationale for this change
The Ceil/Floor calls via SQL were being parsed as taking two arguments instead of one; the second argument is not currently needed and was being silently ignored.



## What changes are included in this PR?


The second parameter (`field`), when passed, is of the `CeilFloorKind` enum from the `sqlparser` crate. Neither of the enum's two variants (`DateTimeField` and `Scale`) is implemented, so they are rejected with an appropriate error type; the call only succeeds if the `DateTimeField` is the `NoDateTime` variant, i.e. it is treated as empty.

## Are these changes tested?

All Unit Tests pass successfully.

---------

Co-authored-by: Andrew Lamb <[email protected]>
cj-zhukov and others added 13 commits November 11, 2025 16:17
## Which issue does this PR close?
This PR is for consolidating all the `custom_data_source` examples into
a single example binary. We are agreed on the pattern and we can apply
it to the remaining examples


- Part of apache#18142.

## Rationale for this change


## What changes are included in this PR?


## Are these changes tested?


## Are there any user-facing changes?


---------

Co-authored-by: Sergey Zhukov <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
## Which issue does this PR close?

 - apache#17211

It's not yet clear to me if this will fully close the above issue, or if
it's just the first step. I think there may be more work to do, so I'm
not going to have this auto-close the issue.

## Rationale for this change

tl;dr of the issue: normalizing the access pattern(s) for objects for
partitioned tables should not only reduce the number of requests to a
backing object store, but will also allow any existing and/or future
caching mechanisms to apply equally to both directory-partitioned and
flat tables.

List request on `main`:
```sql
DataFusion CLI v50.2.0
> \object_store_profiling summary
ObjectStore Profile mode set to Summary
> CREATE EXTERNAL TABLE overture_partitioned
STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-09-24.0/';
0 row(s) fetched.
Elapsed 37.236 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2)
Summaries:
+-----------+----------+-----+-----+-----+-----+-------+
| Operation | Metric   | min | max | avg | sum | count |
+-----------+----------+-----+-----+-----+-----+-------+
| List      | duration |     |     |     |     | 1     |
| List      | size     |     |     |     |     | 1     |
+-----------+----------+-----+-----+-----+-----+-------+
Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2)
Summaries:
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Operation | Metric   | min       | max       | avg         | sum         | count |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Get       | duration | 0.044411s | 0.338399s | 0.104535s   | 162.133179s | 1551  |
| Get       | size     | 8 B       | 1285059 B | 338457.56 B | 524947683 B | 1551  |
| List      | duration |           |           |             |             | 3     |
| List      | size     |           |           |             |             | 3     |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
> select count(*) from overture_partitioned;
+------------+
| count(*)   |
+------------+
| 4219677254 |
+------------+
1 row(s) fetched.
Elapsed 40.061 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2)
Summaries:
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Operation | Metric   | min       | max       | avg         | sum         | count |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Get       | duration | 0.042554s | 0.453125s | 0.103147s   | 159.980835s | 1551  |
| Get       | size     | 8 B       | 1285059 B | 338457.56 B | 524947683 B | 1551  |
| List      | duration | 0.043498s | 0.196298s | 0.092462s   | 2.034174s   | 22    |
| List      | size     |           |           |             |             | 22    |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
> select count(*) from overture_partitioned;
+------------+
| count(*)   |
+------------+
| 4219677254 |
+------------+
1 row(s) fetched.
Elapsed 0.924 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2)
Summaries:
+-----------+----------+-----------+-----------+-----------+-----------+-------+
| Operation | Metric   | min       | max       | avg       | sum       | count |
+-----------+----------+-----------+-----------+-----------+-----------+-------+
| List      | duration | 0.040526s | 0.161407s | 0.092792s | 2.041431s | 22    |
| List      | size     |           |           |           |           | 22    |
+-----------+----------+-----------+-----------+-----------+-----------+-------+
>
```

List requests for this PR:
```sql
DataFusion CLI v50.2.0
> \object_store_profiling summary
ObjectStore Profile mode set to Summary
> CREATE EXTERNAL TABLE overture_partitioned
STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-09-24.0/';
0 row(s) fetched.
Elapsed 33.962 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2)
Summaries:
+-----------+----------+-----+-----+-----+-----+-------+
| Operation | Metric   | min | max | avg | sum | count |
+-----------+----------+-----+-----+-----+-----+-------+
| List      | duration |     |     |     |     | 1     |
| List      | size     |     |     |     |     | 1     |
+-----------+----------+-----+-----+-----+-----+-------+
Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2)
Summaries:
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Operation | Metric   | min       | max       | avg         | sum         | count |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Get       | duration | 0.043832s | 0.342730s | 0.110505s   | 171.393509s | 1551  |
| Get       | size     | 8 B       | 1285059 B | 338457.56 B | 524947683 B | 1551  |
| List      | duration |           |           |             |             | 3     |
| List      | size     |           |           |             |             | 3     |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
> select count(*) from overture_partitioned;
+------------+
| count(*)   |
+------------+
| 4219677254 |
+------------+
1 row(s) fetched.
Elapsed 38.119 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2)
Summaries:
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Operation | Metric   | min       | max       | avg         | sum         | count |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Get       | duration | 0.043186s | 0.296394s | 0.099681s   | 154.605286s | 1551  |
| Get       | size     | 8 B       | 1285059 B | 338457.56 B | 524947683 B | 1551  |
| List      | duration |           |           |             |             | 1     |
| List      | size     |           |           |             |             | 1     |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
> select count(*) from overture_partitioned;
+------------+
| count(*)   |
+------------+
| 4219677254 |
+------------+
1 row(s) fetched.
Elapsed 0.815 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2)
Summaries:
+-----------+----------+-----+-----+-----+-----+-------+
| Operation | Metric   | min | max | avg | sum | count |
+-----------+----------+-----+-----+-----+-----+-------+
| List      | duration |     |     |     |     | 1     |
| List      | size     |     |     |     |     | 1     |
+-----------+----------+-----+-----+-----+-----+-------+
>
```

List operations
| Action | `main` | this PR |
| ---- | ---- | ---- |
| Create Table | 3 | 3 |
| Cold-cache Query | 22 | 1 |
| Warm-cache Query | 22 | 1 |

## What changes are included in this PR?

- Refactored helpers related to listing, discovering, and pruning
objects based on partitions to normalize the strategy between
partitioned and flat tables

## Are these changes tested?

Yes. The internal methods that have been modified are covered by
existing tests.

## Are there any user-facing changes?

No

## Additional Notes

I want to surface that I believe there is a chance for a performance
_regression_ for certain queries against certain tables. One performance
related mechanism the existing code implements, but this code currently
omits, is (potentially) reducing the number of partitions listed based
on query filters. In order for the existing code to exercise this
optimization the query filters must contain all the path elements of a
subdirectory as column filters. E.g.

Given a table with a directory-partitioning structure like: 
```
path/to/table/a=1/b=2/c=3/data.parquet
```
This query:
```sql
select count(*) from table where a=1 and b=2;
```
Will result in listing the following path:
```
LIST: path/to/table/a=1/b=2/
```

Whereas this query:
```sql
select count(*) from table where b=2;
```
Will result in listing the following path:
```
LIST: path/to/table/
```
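A hypothetical sketch of that filter-to-prefix narrowing (names and signature are made up for illustration); only a leading run of equality-filtered partition columns can shorten the listed prefix:

```rust
use object_store::path::Path;

/// Build the longest listable prefix from hive-style equality filters.
fn list_prefix(table_root: &Path, partition_cols: &[&str], eq_filters: &[(&str, &str)]) -> Path {
    let mut prefix = table_root.clone();
    for col in partition_cols {
        match eq_filters.iter().find(|(c, _)| c == col) {
            Some((c, v)) => prefix = prefix.child(format!("{c}={v}").as_str()),
            // The first unconstrained partition column ends the run.
            None => break,
        }
    }
    prefix
}

fn main() {
    let root = Path::from("path/to/table");
    let cols = ["a", "b", "c"];
    // a=1 AND b=2  ->  LIST path/to/table/a=1/b=2
    println!("{}", list_prefix(&root, &cols, &[("a", "1"), ("b", "2")]));
    // b=2 alone    ->  LIST path/to/table
    println!("{}", list_prefix(&root, &cols, &[("b", "2")]));
}
```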

I believe the real-world impact of this omission is likely minimal, at
least when using high-latency storage such as S3 or other object stores,
especially considering the existing implementation is likely to execute
multiple sequential `LIST` operations due to its breadth-first search
implementation. The most likely configuration for a table that would be
negatively impacted would be a table that holds many thousands of
underlying objects (most cloud stores return recursive list requests
with page sizes of many hundreds to thousands of objects) with a
relatively shallow partition structure. I may be able to find or build a
dataset that fulfills these criteria to test this assertion if there's
concern about it.

I believe we could also augment the existing low-level `object_store`
interactions to allow listing a prefix on a table, which would allow the
same pruning of list operations with the code in this PR. The downside
to this approach is it either complicates future caching efforts, or
leads to cache fragmentation in a simpler cache implementation. I didn't
include these changes in this PR to avoid the change set being too
large.

cc @alamb

---------

Co-authored-by: Andrew Lamb <[email protected]>
…18491)

## Which issue does this PR close?


- Closes apache#17027 

## Rationale for this change

`output_batches` should be a common metric across all operators, so it should ideally be added to `BaselineMetrics`:
```
> explain analyze select * from generate_series(1, 1000000) as t1(v1) order by v1 desc;
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                                                 |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortExec: expr=[v1@0 DESC], preserve_partitioning=[false], metrics=[output_rows=1000000, elapsed_compute=535.320324ms, output_bytes=7.6 MB, output_batches=123, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, batches_split=0] |
|                   |   ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=1000000, elapsed_compute=208.379µs, output_bytes=7.7 MB, output_batches=123]                                                                                            |
|                   |     LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=1, end=1000000, batch_size=8192], metrics=[output_rows=1000000, elapsed_compute=15.924291ms, output_bytes=7.7 MB, output_batches=123]                     |
|                   |                                                                                                                                                                                                                                      |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.492 second
```
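(For reference: with the default `batch_size` of 8192, visible in the `LazyMemoryExec` line above, 1,000,000 rows split into ⌈1,000,000 / 8192⌉ = 123 batches, matching `output_batches=123`.)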

## What changes are included in this PR?

- Added `output_batches` into `BaselineMetrics` with `DEV` MetricType 
- Tracked through `record_poll()` API
- Changes are similar to apache#18268
- Refactored `assert_metrics` macro to take multiple metrics strings for
substring check
- Added `output_bytes` and `output_batches` tracking in `TopK` operator
- Added `baseline` metrics for `RepartitionExec`

## Are these changes tested?

Added UT

## Are there any user-facing changes?

Changes to the `EXPLAIN ANALYZE` output: `output_batches` will be added to `metrics=[...]`.
…pr (apache#18532)

## Which issue does this PR close?
- Closes apache#18504.

## Rationale for this change
Followed suggestions to not update any public-facing APIs and put the
lint rule in the appropriate spot.

## What changes are included in this PR?
* Add `#![deny(clippy::needless_pass_by_value)]` and `#![cfg_attr(test,
allow(clippy::needless_pass_by_value))]` to `lib.rs`.
* Add `#[allow(clippy::needless_pass_by_value)]` to public functions
* Fix `rewrite_in_terms_of_projection()` and `get_exprs_except_skipped()` to use references, per the lint suggestion (a minimal illustration of the lint follows below)
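A minimal illustration of the lint being enabled crate-wide (function and variable names are made up for this sketch):

```rust
#![deny(clippy::needless_pass_by_value)]
#![cfg_attr(test, allow(clippy::needless_pass_by_value))]

// Denied by the lint: takes ownership of a Vec it only reads.
// fn exprs_summary(exprs: Vec<String>) -> usize { exprs.len() }

// Passes: borrow a slice instead of taking the value.
fn exprs_summary(exprs: &[String]) -> usize {
    exprs.len()
}

fn main() {
    let exprs = vec!["a".to_string(), "b".to_string()];
    println!("{}", exprs_summary(&exprs));
}
```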

## Are these changes tested?
Yes, though the same test failed even without changes to the public
APIs:
`test expr_rewriter::order_by::test::rewrite_sort_cols_by_agg_alias ...
FAILED`
I'll append the logs for your convenience:
```
failures:

---- expr_rewriter::order_by::test::rewrite_sort_cols_by_agg_alias stdout ----
running: 'c1 --> c1  -- column *named* c1 that came out of the projection, (not t.c1)'
running: 'min(c2) --> "min(c2)" -- (column *named* "min(t.c2)"!)'

thread 'expr_rewriter::order_by::test::rewrite_sort_cols_by_agg_alias' (27524241) panicked at datafusion/expr/src/expr_rewriter/order_by.rs:308:13:
assertion `left == right` failed: 

input:Sort { expr: AggregateFunction(AggregateFunction { func: AggregateUDF { inner: Min { name: "min", signature: Signature { type_signature: VariadicAny, volatility: Immutable, parameter_names: None } } }, params: AggregateFunctionParams { args: [Column(Column { relation: None, name: "c2" })], distinct: false, filter: None, order_by: [], null_treatment: None } }), asc: true, nulls_first: true }
rewritten:Sort { expr: Column(Column { relation: None, name: "min(t.c2)" }), asc: true, nulls_first: true }
expected:Sort { expr: Column(Column { relation: Some(Bare { table: "min(t" }), name: "c2)" }), asc: true, nulls_first: true }

  left: Sort { expr: Column(Column { relation: None, name: "min(t.c2)" }), asc: true, nulls_first: true }
 right: Sort { expr: Column(Column { relation: Some(Bare { table: "min(t" }), name: "c2)" }), asc: true, nulls_first: true }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

failures:
    expr_rewriter::order_by::test::rewrite_sort_cols_by_agg_alias

```
## Are there any user-facing changes?
No, all modifications were constrained to internal APIs.

---------

Co-authored-by: Yongting You <[email protected]>
## Which issue does this PR close?

## Rationale for this change

`get_field` doesn't support nested keys.

## What changes are included in this PR?

Support nested keys.

## Are these changes tested?

UT

## Are there any user-facing changes?

No

---------

Co-authored-by: Andrew Lamb <[email protected]>
## Which issue does this PR close?


- Closes apache#16688.

## Rationale for this change



Currently DataFusion can only read Arrow files if they're in the File format, not the Stream format. I work with a bunch of Stream-format files and wanted native support.

## What changes are included in this PR?


To accomplish the above, this PR splits the Arrow datasource into two
separate implementations (`ArrowStream*` and `ArrowFile*`) with a facade
on top to differentiate between the formats at query planning time.
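For context, a small sketch (file names are placeholders) of the two Arrow IPC formats the split mirrors: the File format has a footer and supports random access, while the Stream format must be read sequentially.

```rust
use std::fs::File;
use arrow::ipc::reader::{FileReader, StreamReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // File format: footer-indexed, random access to batches.
    let reader = FileReader::try_new(File::open("data.arrow")?, None)?;
    println!("file format: {} batches", reader.num_batches());

    // Stream format: no footer, batches are consumed in order.
    let stream = StreamReader::try_new(File::open("data.arrows")?, None)?;
    for batch in stream {
        println!("stream batch with {} rows", batch?.num_rows());
    }
    Ok(())
}
```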

## Are these changes tested?


Yes, there are end-to-end sqllogictests along with tests for the changes
within datasource-arrow.

## Are there any user-facing changes?


Technically yes, in that we support a new format now. I'm not sure which
documentation would need to be updated?

---------

Co-authored-by: Martin Grigorov <[email protected]>
…che#18581)

Bumps
[taiki-e/install-action](https://github.com/taiki-e/install-action) from
2.62.47 to 2.62.49.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's
releases</a>.</em></p>
<blockquote>
<h2>2.62.49</h2>
<ul>
<li>
<p>Update <code>cargo-binstall@latest</code> to 1.15.11.</p>
</li>
<li>
<p>Update <code>cargo-auditable@latest</code> to 0.7.2.</p>
</li>
<li>
<p>Update <code>vacuum@latest</code> to 0.20.2.</p>
</li>
</ul>
<h2>2.62.48</h2>
<ul>
<li>
<p>Update <code>mise@latest</code> to 2025.11.3.</p>
</li>
<li>
<p>Update <code>cargo-audit@latest</code> to 0.22.0.</p>
</li>
<li>
<p>Update <code>vacuum@latest</code> to 0.20.1.</p>
</li>
<li>
<p>Update <code>uv@latest</code> to 0.9.8.</p>
</li>
<li>
<p>Update <code>cargo-udeps@latest</code> to 0.1.60.</p>
</li>
<li>
<p>Update <code>zizmor@latest</code> to 1.16.3.</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<p>All notable changes to this project will be documented in this
file.</p>
<p>This project adheres to <a href="https://semver.org">Semantic
Versioning</a>.</p>
<!-- raw HTML omitted -->
<h2>[Unreleased]</h2>
<h2>[2.62.49] - 2025-11-09</h2>
<ul>
<li>
<p>Update <code>cargo-binstall@latest</code> to 1.15.11.</p>
</li>
<li>
<p>Update <code>cargo-auditable@latest</code> to 0.7.2.</p>
</li>
<li>
<p>Update <code>vacuum@latest</code> to 0.20.2.</p>
</li>
</ul>
<h2>[2.62.48] - 2025-11-08</h2>
<ul>
<li>
<p>Update <code>mise@latest</code> to 2025.11.3.</p>
</li>
<li>
<p>Update <code>cargo-audit@latest</code> to 0.22.0.</p>
</li>
<li>
<p>Update <code>vacuum@latest</code> to 0.20.1.</p>
</li>
<li>
<p>Update <code>uv@latest</code> to 0.9.8.</p>
</li>
<li>
<p>Update <code>cargo-udeps@latest</code> to 0.1.60.</p>
</li>
<li>
<p>Update <code>zizmor@latest</code> to 1.16.3.</p>
</li>
</ul>
<h2>[2.62.47] - 2025-11-05</h2>
<ul>
<li>
<p>Update <code>vacuum@latest</code> to 0.20.0.</p>
</li>
<li>
<p>Update <code>cargo-nextest@latest</code> to 0.9.111.</p>
</li>
<li>
<p>Update <code>cargo-shear@latest</code> to 1.6.2.</p>
</li>
</ul>
<h2>[2.62.46] - 2025-11-04</h2>
<ul>
<li>
<p>Update <code>vacuum@latest</code> to 0.19.5.</p>
</li>
<li>
<p>Update <code>syft@latest</code> to 1.37.0.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.11.2.</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/taiki-e/install-action/commit/44c6d64aa62cd779e873306675c7a58e86d6d532"><code>44c6d64</code></a>
Release 2.62.49</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/3a701df4c2a3e11596a1c5a65eb0e69c79ee4a82"><code>3a701df</code></a>
Update <code>cargo-binstall@latest</code> to 1.15.11</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/4242e04eb80c4492261074808c18d638aa247de0"><code>4242e04</code></a>
Update <code>cargo-auditable@latest</code> to 0.7.2</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/3df5533ef842d100d27dbd43c2fbd8aa0cccddcc"><code>3df5533</code></a>
Update <code>vacuum@latest</code> to 0.20.2</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/e797ba6a25dbd8669057e123b02812e16138589e"><code>e797ba6</code></a>
Release 2.62.48</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/bcf91e02acc5cc0ed84eac8d763b7328a3c7cd3f"><code>bcf91e0</code></a>
Update <code>mise@latest</code> to 2025.11.3</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/e78113b60c103d89241857d78e2610df1305cffd"><code>e78113b</code></a>
Update <code>cargo-audit@latest</code> to 0.22.0</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/0ef486444ebe65689986d037f4b61d8292b5a4ed"><code>0ef4864</code></a>
Update <code>vacuum@latest</code> to 0.20.1</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/5eda7b198531ad7024688974dd308f7ea0bd21aa"><code>5eda7b1</code></a>
Update <code>uv@latest</code> to 0.9.8</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/3853a413e6de756806bca9b522388e2d2b5abbd6"><code>3853a41</code></a>
Update <code>cargo-udeps@latest</code> to 0.1.60</li>
<li>Additional commits viewable in <a
href="https://github.com/taiki-e/install-action/compare/6f9c7cc51aa54b13cbcbd12f8bbf69d8ba405b4b...44c6d64aa62cd779e873306675c7a58e86d6d532">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=taiki-e/install-action&package-manager=github_actions&previous-version=2.62.47&new-version=2.62.49)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [maturin](https://github.com/pyo3/maturin) from 1.9.6 to 1.10.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pyo3/maturin/releases">maturin's
releases</a>.</em></p>
<blockquote>
<h2>v1.10.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix generated WHEEL Tag metadata to be spec compliant. by <a
href="https://github.com/jsirois"><code>@​jsirois</code></a> in <a
href="https://redirect.github.com/PyO3/maturin/pull/2762">PyO3/maturin#2762</a></li>
<li>Export all Cargo URL metadata items to Python by <a
href="https://github.com/chrysn"><code>@​chrysn</code></a> in <a
href="https://redirect.github.com/PyO3/maturin/pull/2760">PyO3/maturin#2760</a></li>
<li>Update maximum Python version to 3.14 by <a
href="https://github.com/messense"><code>@​messense</code></a> in <a
href="https://redirect.github.com/PyO3/maturin/pull/2763">PyO3/maturin#2763</a></li>
<li>Remove shebang from non-executable <strong>init</strong>.py file by
<a
href="https://github.com/musicinmybrain"><code>@​musicinmybrain</code></a>
in <a
href="https://redirect.github.com/PyO3/maturin/pull/2775">PyO3/maturin#2775</a></li>
<li>Stop warning about missing <code>extension-module</code> feature on
pyo3 0.26+ by <a
href="https://github.com/messense"><code>@​messense</code></a> in <a
href="https://redirect.github.com/PyO3/maturin/pull/2789">PyO3/maturin#2789</a></li>
<li><code>--profile</code> conflicts with <code>--release</code> (and/or
<code>--debug</code>) by <a
href="https://github.com/davidhewitt"><code>@​davidhewitt</code></a> in
<a
href="https://redirect.github.com/PyO3/maturin/pull/2793">PyO3/maturin#2793</a></li>
<li>Bump MSRV to 1.83.0 by <a
href="https://github.com/messense"><code>@​messense</code></a> in <a
href="https://redirect.github.com/PyO3/maturin/pull/2790">PyO3/maturin#2790</a></li>
<li>respect CLI profile over pyproject.toml by <a
href="https://github.com/davidhewitt"><code>@​davidhewitt</code></a> in
<a
href="https://redirect.github.com/PyO3/maturin/pull/2794">PyO3/maturin#2794</a></li>
<li>chore: add FreeBSD 14.3 amd64 sysconfig by <a
href="https://github.com/fleetingbytes"><code>@​fleetingbytes</code></a>
in <a
href="https://redirect.github.com/PyO3/maturin/pull/2805">PyO3/maturin#2805</a></li>
<li>Add Cygwin support by <a
href="https://github.com/lazka"><code>@​lazka</code></a> in <a
href="https://redirect.github.com/PyO3/maturin/pull/2819">PyO3/maturin#2819</a></li>
<li>PyO3: do not add <code>extension-module</code> feature in template
and tutorial by <a href="https://github.com/Tpt"><code>@​Tpt</code></a>
in <a
href="https://redirect.github.com/PyO3/maturin/pull/2821">PyO3/maturin#2821</a></li>
<li>Remove add_directory() from ModuleWriter trait by <a
href="https://github.com/e-nomem"><code>@​e-nomem</code></a> in <a
href="https://redirect.github.com/PyO3/maturin/pull/2824">PyO3/maturin#2824</a></li>
<li>Correct wheel naming when targeting iOS by <a
href="https://github.com/freakboy3742"><code>@​freakboy3742</code></a>
in <a
href="https://redirect.github.com/PyO3/maturin/pull/2827">PyO3/maturin#2827</a></li>
<li>Add support for iOS cross-platform virtual environments by <a
href="https://github.com/freakboy3742"><code>@​freakboy3742</code></a>
in <a
href="https://redirect.github.com/PyO3/maturin/pull/2828">PyO3/maturin#2828</a></li>
<li>add <code>editable-profile</code> option by <a
href="https://github.com/davidhewitt"><code>@​davidhewitt</code></a> in
<a
href="https://redirect.github.com/PyO3/maturin/pull/2826">PyO3/maturin#2826</a></li>
<li>Make sdist reproducible by <a
href="https://github.com/e-nomem"><code>@​e-nomem</code></a> in <a
href="https://redirect.github.com/PyO3/maturin/pull/2831">PyO3/maturin#2831</a></li>
<li>always use &quot;library&quot; mode to generate uniffi bindings by
<a href="https://github.com/davidhewitt"><code>@​davidhewitt</code></a>
in <a
href="https://redirect.github.com/PyO3/maturin/pull/2840">PyO3/maturin#2840</a></li>
<li>If an interpreter is available, use it, even when building ABI3. by
<a
href="https://github.com/freakboy3742"><code>@​freakboy3742</code></a>
in <a
href="https://redirect.github.com/PyO3/maturin/pull/2829">PyO3/maturin#2829</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/jsirois"><code>@​jsirois</code></a> made
their first contribution in <a
href="https://redirect.github.com/PyO3/maturin/pull/2762">PyO3/maturin#2762</a></li>
<li><a href="https://github.com/chrysn"><code>@​chrysn</code></a> made
their first contribution in <a
href="https://redirect.github.com/PyO3/maturin/pull/2760">PyO3/maturin#2760</a></li>
<li><a href="https://github.com/ddelange"><code>@​ddelange</code></a>
made their first contribution in <a
href="https://redirect.github.com/PyO3/maturin/pull/2769">PyO3/maturin#2769</a></li>
<li><a href="https://github.com/vvsagar"><code>@​vvsagar</code></a> made
their first contribution in <a
href="https://redirect.github.com/PyO3/maturin/pull/2783">PyO3/maturin#2783</a></li>
<li><a
href="https://github.com/MatthijsKok"><code>@​MatthijsKok</code></a>
made their first contribution in <a
href="https://redirect.github.com/PyO3/maturin/pull/2799">PyO3/maturin#2799</a></li>
<li><a
href="https://github.com/fleetingbytes"><code>@​fleetingbytes</code></a>
made their first contribution in <a
href="https://redirect.github.com/PyO3/maturin/pull/2805">PyO3/maturin#2805</a></li>
<li><a href="https://github.com/linkmauve"><code>@​linkmauve</code></a>
made their first contribution in <a
href="https://redirect.github.com/PyO3/maturin/pull/2811">PyO3/maturin#2811</a></li>
<li><a href="https://github.com/lazka"><code>@​lazka</code></a> made
their first contribution in <a
href="https://redirect.github.com/PyO3/maturin/pull/2819">PyO3/maturin#2819</a></li>
<li><a href="https://github.com/e-nomem"><code>@​e-nomem</code></a> made
their first contribution in <a
href="https://redirect.github.com/PyO3/maturin/pull/2824">PyO3/maturin#2824</a></li>
<li><a
href="https://github.com/freakboy3742"><code>@​freakboy3742</code></a>
made their first contribution in <a
href="https://redirect.github.com/PyO3/maturin/pull/2827">PyO3/maturin#2827</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/PyO3/maturin/compare/v1.9.6...v1.10.0">https://github.com/PyO3/maturin/compare/v1.9.6...v1.10.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/PyO3/maturin/blob/main/Changelog.md">maturin's
changelog</a>.</em></p>
<blockquote>
<h2>[1.10.0]</h2>
<ul>
<li>Add <code>tool.maturin.editable-profile</code> option to override
profile for editable package installations.</li>
<li>Add support for Cygwin.</li>
<li>When building <code>abi3</code> wheels on non-Windows platforms that
aren't cross-compiling, the <code>sysconfigdata</code> of the
interpreter used to run maturin will now be used, rather than a dummy
interpreter.</li>
<li>Allow iOS cross-platform virtual environments, such as those used by
cibuildwheel, to imply an iOS target.</li>
<li>Fix iOS wheel naming to be compliant with PEP 730.</li>
<li>Fix generated WHEEL Tag metadata to be spec compliant.</li>
<li>Fix incorrect warning about missing <code>extension-module</code>
feature on PyO3 0.26+.</li>
<li>Remove <code>add_directory()</code> from ModuleWriter and make it an
implementation detail for the specific impl.</li>
<li>Clear out uid/gid and set deterministic mtime for files in
sdist.</li>
<li>Always use &quot;library&quot; mode to build uniffi bindings.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/PyO3/maturin/commit/c3093d1c1089a65e4baca7bb98b930ce0b297863"><code>c3093d1</code></a>
release: 1.10.0 (<a
href="https://redirect.github.com/pyo3/maturin/issues/2841">#2841</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/a41bc8654c7753106a50c52fbea6c51f63e18adb"><code>a41bc86</code></a>
If an interpreter is available, use it, even when building ABI3. (<a
href="https://redirect.github.com/pyo3/maturin/issues/2829">#2829</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/e75305205431319e1b9a70b5a76bb84e8bfa60bb"><code>e753052</code></a>
always use &quot;library&quot; mode to generate uniffi bindings (<a
href="https://redirect.github.com/pyo3/maturin/issues/2840">#2840</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/216b643ea45794107922308c71b352f06d27a163"><code>216b643</code></a>
Update manylinux/musllinux policies to the latest main (<a
href="https://redirect.github.com/pyo3/maturin/issues/2836">#2836</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/044ba832245e67924cb950e81597949ae14acb97"><code>044ba83</code></a>
Revert &quot;Upgrade goblin to 0.10&quot; (<a
href="https://redirect.github.com/pyo3/maturin/issues/2837">#2837</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/bb3d629fb79cc41e3fb71a75a86eed599a2d2643"><code>bb3d629</code></a>
Upgrade goblin to 0.10 (<a
href="https://redirect.github.com/pyo3/maturin/issues/2833">#2833</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/837549608af174d232b6f9e15f04ab3e77258bf5"><code>8375496</code></a>
Use <code>serial_test</code> for tests that modifies env vars (<a
href="https://redirect.github.com/pyo3/maturin/issues/2832">#2832</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/9dc2f5fc546d609436805e655b4c02b4ebf9287b"><code>9dc2f5f</code></a>
Make sdist reproducible (<a
href="https://redirect.github.com/pyo3/maturin/issues/2831">#2831</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/685efba876ad23417d506e40f21c8bebacb0c00f"><code>685efba</code></a>
ci: bump to Python 3.14, update runners (<a
href="https://redirect.github.com/pyo3/maturin/issues/2830">#2830</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/57aa6ed9663c70ebd57b78eebe7143b5fa3b0839"><code>57aa6ed</code></a>
add <code>editable-profile</code> option (<a
href="https://redirect.github.com/pyo3/maturin/issues/2826">#2826</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/pyo3/maturin/compare/v1.9.6...v1.10.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=maturin&package-manager=pip&previous-version=1.9.6&new-version=1.10.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close?


- Closes #.

## Rationale for this change

`cargo-machete` identifies an unused dependency, and this blocks a number of Dependabot update PRs: apache#18580


## What changes are included in this PR?


## Are these changes tested?


## Are there any user-facing changes?

… require code changes. (apache#18586)

## Which issue does this PR close?


Part of apache#18503

## Rationale for this change

Enforce the lint rule on all crates that already pass this extra check, so no further code changes are needed for them.

## What changes are included in this PR?


## Are these changes tested?


## Are there any user-facing changes?


Co-authored-by: Andrew Lamb <[email protected]>
## Which issue does this PR close?


- Closes apache#18341.
- Closes apache#9370

## Rationale for this change


There are cases where two `RepartitionExec` operators appear consecutively in the plan. This is unneeded overhead, and eliminating it provides speedups.

Full Report: [The Physical Optimizer and Fixing Consecutive Repartitions
In the Enforce Distribution
Rule.pdf](https://github.com/user-attachments/files/23420831/The.Physical.Optimizer.and.Fixing.Consecutive.Repartitions.In.the.Enforce.Distribution.Rule.pdf)

Issue Report: [Fixing Consecutive Repartitions In the Enforce
Distribution
Rule.pdf](https://github.com/user-attachments/files/23420880/Fixing.Consecutive.Repartitions.In.the.Enforce.Distribution.Rule.pdf)

## What changes are included in this PR?


- Change to the repartition-adding logic in `enforce_distribution.rs`
- A large number of test and benchmark updates to reflect the new behavior

## Are these changes tested?


Yes, benchmarked and tested; see the linked report for benchmarks.

## Are there any user-facing changes?


---------

Co-authored-by: Andrew Lamb <[email protected]>
…8540)

## Which issue does this PR close?


- Closes apache#18155.

## Rationale for this change


## What changes are included in this PR?


Merges the functionality of `CoalesceAsyncExecInput` into
`CoalesceBatches` to remove redundant optimizer logic and simplify batch
coalescing behavior.


## Are these changes tested?

Behavior is covered by existing `CoalesceBatches` and optimizer tests.
## Are there any user-facing changes?
No
@github-actions github-actions bot added development-process Related to development process of DataFusion logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate substrait Changes to the substrait crate catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate datasource Changes to the datasource crate ffi Changes to the ffi crate physical-plan Changes to the physical-plan crate spark labels Nov 11, 2025
@codetyri0n codetyri0n closed this Nov 11, 2025
Development

Successfully merging this pull request may close these issues.

Extract sql expression no longer is timezone aware