Skip to content

Conversation

@pan3793
Copy link
Member

@pan3793 pan3793 commented Oct 22, 2025

Backport #52584 to branch-3.5

What changes were proposed in this pull request?

This is the second try of #52474, following the suggestion from cloud-fan

This PR fixes a bug in plannedWrite, where the query has foldable orderings in the partition columns.

CREATE TABLE t (i INT, j INT, k STRING) USING PARQUET PARTITIONED BY (k);

INSERT OVERWRITE t SELECT j AS i, i AS j, '0' as k FROM t0 SORT BY k, i;

The evaluation of FileFormatWriter.orderingMatched fails because SortOrder(Literal) is eliminated by EliminateSorts.

Why are the changes needed?

V1Writes will override the custom sort order when the query output ordering does not satisfy the required ordering. Before SPARK-53707, when the query's output contains literals in partition columns, the judgment produces a false-negative result, thus causing the sort order not to take effect.

SPARK-53707 partially fixes the issue on the logical plan by adding a Project of query in V1Writes.

Before SPARK-53707

Sort [0 ASC NULLS FIRST, i#280 ASC NULLS FIRST], false
+- Project [j#287 AS i#280, i#286 AS j#281, 0 AS k#282]
   +- Relation spark_catalog.default.t0[i#286,j#287,k#288] parquet

After SPARK-53707

Project [i#284, j#285, 0 AS k#290]
+- Sort [0 ASC NULLS FIRST, i#284 ASC NULLS FIRST], false
   +- Project [i#284, j#285]
      +- Relation spark_catalog.default.t0[i#284,j#285,k#286] parquet

Note, note the issue still exists because there is another place to check the ordering match again in FileFormatWriter.

This PR fixes the issue thoroughly, with new UTs added.

Does this PR introduce any user-facing change?

Yes, it's a bug fix.

How was this patch tested?

New UTs are added.

Was this patch authored or co-authored using generative AI tooling?

No.

…ble orderings

### What changes were proposed in this pull request?

This is the second try of apache#52474, following [the suggestion from cloud-fan](apache#52474 (comment))

This PR fixes a bug in `plannedWrite`, where the `query` has foldable orderings in the partition columns.

```
CREATE TABLE t (i INT, j INT, k STRING) USING PARQUET PARTITIONED BY (k);

INSERT OVERWRITE t SELECT j AS i, i AS j, '0' as k FROM t0 SORT BY k, i;
```

The evaluation of `FileFormatWriter.orderingMatched` fails because `SortOrder(Literal)` is eliminated by `EliminateSorts`.

### Why are the changes needed?

`V1Writes` will override the custom sort order when the query output ordering does not satisfy the required ordering. Before SPARK-53707, when the query's output contains literals in partition columns, the judgment produces a false-negative result, thus causing the sort order not to take effect.

SPARK-53707 partially fixes the issue on the logical plan by adding a `Project` of query in `V1Writes`.

Before SPARK-53707
```
Sort [0 ASC NULLS FIRST, i#280 ASC NULLS FIRST], false
+- Project [j#287 AS i#280, i#286 AS j#281, 0 AS k#282]
   +- Relation spark_catalog.default.t0[i#286,j#287,k#288] parquet
```

After SPARK-53707
```
Project [i#284, j#285, 0 AS k#290]
+- Sort [0 ASC NULLS FIRST, i#284 ASC NULLS FIRST], false
   +- Project [i#284, j#285]
      +- Relation spark_catalog.default.t0[i#284,j#285,k#286] parquet
```

Note, note the issue still exists because there is another place to check the ordering match again in `FileFormatWriter`.

This PR fixes the issue thoroughly, with new UTs added.

### Does this PR introduce _any_ user-facing change?

Yes, it's a bug fix.

### How was this patch tested?

New UTs are added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52584 from pan3793/SPARK-53738-rework.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
(cherry picked from commit f33d8aa)
Signed-off-by: Peter Toth <[email protected]>
@github-actions github-actions bot added the SQL label Oct 22, 2025
@pan3793 pan3793 changed the title [SPARK-53738][SQL] Fix planned write when query output contains foldable orderings [SPARK-53738][SQL][3.5] Fix planned write when query output contains foldable orderings Oct 22, 2025
Copy link
Contributor

@peter-toth peter-toth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending CI.

peter-toth pushed a commit that referenced this pull request Oct 22, 2025
…foldable orderings

Backport #52584 to branch-3.5

### What changes were proposed in this pull request?

This is the second try of #52474, following [the suggestion from cloud-fan](#52474 (comment))

This PR fixes a bug in `plannedWrite`, where the `query` has foldable orderings in the partition columns.

```
CREATE TABLE t (i INT, j INT, k STRING) USING PARQUET PARTITIONED BY (k);

INSERT OVERWRITE t SELECT j AS i, i AS j, '0' as k FROM t0 SORT BY k, i;
```

The evaluation of `FileFormatWriter.orderingMatched` fails because `SortOrder(Literal)` is eliminated by `EliminateSorts`.

### Why are the changes needed?

`V1Writes` will override the custom sort order when the query output ordering does not satisfy the required ordering. Before SPARK-53707, when the query's output contains literals in partition columns, the judgment produces a false-negative result, thus causing the sort order not to take effect.

SPARK-53707 partially fixes the issue on the logical plan by adding a `Project` of query in `V1Writes`.

Before SPARK-53707
```
Sort [0 ASC NULLS FIRST, i#280 ASC NULLS FIRST], false
+- Project [j#287 AS i#280, i#286 AS j#281, 0 AS k#282]
   +- Relation spark_catalog.default.t0[i#286,j#287,k#288] parquet
```

After SPARK-53707
```
Project [i#284, j#285, 0 AS k#290]
+- Sort [0 ASC NULLS FIRST, i#284 ASC NULLS FIRST], false
   +- Project [i#284, j#285]
      +- Relation spark_catalog.default.t0[i#284,j#285,k#286] parquet
```

Note, note the issue still exists because there is another place to check the ordering match again in `FileFormatWriter`.

This PR fixes the issue thoroughly, with new UTs added.

### Does this PR introduce _any_ user-facing change?

Yes, it's a bug fix.

### How was this patch tested?

New UTs are added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52697 from pan3793/SPARK-53694-3.5.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
@peter-toth peter-toth closed this Oct 22, 2025
@peter-toth
Copy link
Contributor

Thank you @pan3793 for the PR!

Merged to branch-3.5 (3.5.8).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants