Fix SELECT DISTINCT with delimited alias in ORDER BY bug #27004

codluca · 2025-10-17T13:10:17Z

Description

Previously, the canonical value of a non-delimited Identifier was its upper-case value, while the canonical value of a delimited Identifier was its value unchanged. As a result, statements like
SELECT DISTINCT a AS x FROM (VALUES 2, 1, 2) t(a) ORDER BY "x"
would fail: the canonical value of the alias in SELECT DISTINCT is "X",
while the canonical value of the identifier in ORDER BY is "x", so the two identifiers do not match.
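The mismatch described above can be reproduced with a minimal sketch. The `Identifier` record below is a simplified, hypothetical stand-in for `io.trino.sql.tree.Identifier`, not the real class; it only models the canonicalization rule stated in this description, and the use of `Locale.ENGLISH` is an assumption.

```java
import java.util.Locale;

public class CanonicalValueSketch {
    // Simplified stand-in for io.trino.sql.tree.Identifier (assumption, not the real class).
    record Identifier(String value, boolean delimited) {
        // Rule from the description: delimited identifiers keep their value,
        // non-delimited identifiers are upper-cased.
        String canonicalValue() {
            return delimited ? value : value.toUpperCase(Locale.ENGLISH);
        }
    }

    public static void main(String[] args) {
        Identifier selectAlias = new Identifier("x", false); // a AS x
        Identifier orderByRef = new Identifier("x", true);   // ORDER BY "x"

        System.out.println(selectAlias.canonicalValue());
        System.out.println(orderByRef.canonicalValue());
        // The two canonical values differ, so the alias lookup misses.
        System.out.println(selectAlias.canonicalValue().equals(orderByRef.canonicalValue()));
    }
}
```

Under this rule the alias canonicalizes to `X` and the ORDER BY reference to `x`, which is why the lookup of the ORDER BY identifier among the SELECT DISTINCT aliases comes back empty.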

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(X) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`26755`)

Summary by Sourcery

Fix SELECT DISTINCT alias matching in ORDER BY to handle delimited and mixed-case identifiers by adding ignore-case canonicalization support and updating alias comparison accordingly.

Bug Fixes:

Fix failure when ordering by a delimited SELECT DISTINCT alias due to mismatched canonical values.

Enhancements:

Add ignore-case mode in CanonicalizationAware for identifier equality and hashing.
Update Identifier to support case-insensitive canonical value retrieval.
Adjust StatementAnalyzer to use ignore-case canonicalization when resolving DISTINCT aliases.

Tests:

Add test cases for SELECT DISTINCT with various combinations of delimited and mixed-case aliases in ORDER BY.

sourcery-ai · 2025-10-17T13:10:23Z

Reviewer's Guide

Implement case-insensitive handling of delimited aliases in SELECT DISTINCT with ORDER BY by extending identifier canonicalization, updating analyzer lookup logic, and adding comprehensive tests.

Sequence diagram for case-insensitive alias matching in SELECT DISTINCT with ORDER BY

sequenceDiagram
    participant "StatementAnalyzer"
    participant "CanonicalizationAware"
    participant "Identifier"
    "StatementAnalyzer"->>"CanonicalizationAware": canonicalizationAwareKey(expression, true)
    "CanonicalizationAware"->>"Identifier": getCanonicalValue(true)
    "Identifier"-->>"CanonicalizationAware": canonical value (case-insensitive)
    "CanonicalizationAware"-->>"StatementAnalyzer": key for alias lookup (case-insensitive)
    "StatementAnalyzer"->>"CanonicalizationAware": aliases.contains(key)
    "CanonicalizationAware"-->>"StatementAnalyzer": result (match found or not)

Class diagram for updated Identifier and CanonicalizationAware classes

classDiagram
    class Identifier {
        -String value
        -boolean delimited
        +boolean isDelimited()
        +String getCanonicalValue()
        +String getCanonicalValue(boolean ignoreCase)
    }
    class CanonicalizationAware {
        -T node
        -boolean ignoreCase
        -int hashCode
        +CanonicalizationAware(T node, boolean ignoreCase)
        +static canonicalizationAwareKey(T node)
        +static canonicalizationAwareKey(T node, boolean ignoreCase)
        +T getNode()
        +int hashCode()
        +boolean equals(Object o)
        +static Boolean canonicalizationAwareComparison(Node left, Node right)
        +static Boolean canonicalizationAwareIgnoreCaseComparison(Node left, Node right)
        +static OptionalInt canonicalizationAwareHash(Node node)
        +static OptionalInt canonicalizationAwareIgnoreCaseHash(Node node)
    }
    CanonicalizationAware <|-- Identifier

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Added exhaustive tests for alias case and delimiter combinations in DISTINCT … ORDER BY | Added tests for lower-case, upper-case, and mixed-case delimited and non-delimited aliases; covered all combinations of SELECT DISTINCT alias quoting and ORDER BY references | `core/trino-main/src/test/java/io/trino/sql/query/TestDistinctWithOrderBy.java` |
| Extended CanonicalizationAware to support case-insensitive comparisons | Introduced ignoreCase flag, constructor overload, and factory method; switched hashCode and equals to choose case-sensitive or case-insensitive logic; implemented canonicalizationAwareIgnoreCaseComparison and canonicalizationAwareIgnoreCaseHash | `core/trino-main/src/main/java/io/trino/sql/analyzer/CanonicalizationAware.java` |
| Updated StatementAnalyzer to bind DISTINCT aliases ignoring case | Replaced canonicalizationAwareKey calls with ignoreCase=true for ORDER BY alias checks; ensured alias collection uses case-insensitive canonicalization | `core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java` |
| Refactored Identifier to expose case-insensitive canonicalization | Added getCanonicalValue(boolean ignoreCase) overload; delegated existing getCanonicalValue() to new method; adjusted logic to respect quoting and case when ignoreCase is false | `core/trino-parser/src/main/java/io/trino/sql/tree/Identifier.java` |
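
The ignore-case lookup summarized in the table can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `Identifier` and `AliasKey` are hypothetical stand-ins for `io.trino.sql.tree.Identifier` and `CanonicalizationAware`, and the body of `canonicalValue(boolean)` is inferred from the `!ignoreCase && isDelimited()` condition quoted in the review.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class IgnoreCaseKeySketch {
    // Hypothetical stand-in for io.trino.sql.tree.Identifier.
    record Identifier(String value, boolean delimited) {
        String canonicalValue(boolean ignoreCase) {
            if (!ignoreCase && delimited) {
                return value; // delimited identifiers are kept as written
            }
            return value.toUpperCase(Locale.ENGLISH);
        }
    }

    // Hypothetical key wrapper: equals and hashCode use the same canonical string,
    // so the two stay consistent for hash-based collections.
    static final class AliasKey {
        private final String canonical;

        AliasKey(Identifier identifier, boolean ignoreCase) {
            this.canonical = identifier.canonicalValue(ignoreCase);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof AliasKey other && canonical.equals(other.canonical);
        }

        @Override
        public int hashCode() {
            return canonical.hashCode();
        }
    }

    // Returns true when the ORDER BY reference is found among the SELECT aliases.
    static boolean orderByMatchesAlias(Identifier alias, Identifier reference, boolean ignoreCase) {
        Set<AliasKey> aliases = new HashSet<>();
        aliases.add(new AliasKey(alias, ignoreCase));
        return aliases.contains(new AliasKey(reference, ignoreCase));
    }

    public static void main(String[] args) {
        Identifier alias = new Identifier("x", false);    // a AS x
        Identifier reference = new Identifier("x", true); // ORDER BY "x"
        System.out.println(orderByMatchesAlias(alias, reference, false));
        System.out.println(orderByMatchesAlias(alias, reference, true));
    }
}
```

With `ignoreCase` set to `false` the lookup misses, and with `true` it hits, which is the behavior change this PR proposes for DISTINCT alias resolution.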

Possibly linked issues

SELECT DISTINCT with Delimited Alias in ORDER BY Errors (Case Sensitivity) #26755: The PR fixes the case-insensitive matching of delimited aliases in ORDER BY clauses for SELECT DISTINCT queries by modifying identifier canonicalization.

sourcery-ai

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `core/trino-parser/src/main/java/io/trino/sql/tree/Identifier.java:84` </location>
<code_context>
+        return getCanonicalValue(false);
+    }
+
+    public String getCanonicalValue(boolean ignoreCase)
+    {
+        if (!ignoreCase && isDelimited()) {
</code_context>

<issue_to_address>
**issue (bug_risk):** The logic for case handling in getCanonicalValue(boolean ignoreCase) may be inconsistent for delimited identifiers.

Delimited identifiers are usually case-sensitive, so converting them to upper case when ignoreCase is true may be incorrect. Please review this logic.
</issue_to_address>

### Comment 2
<location> `core/trino-main/src/main/java/io/trino/sql/analyzer/CanonicalizationAware.java:128` </location>
<code_context>
+        if (node instanceof Identifier identifier) {
+            return OptionalInt.of(identifier.getCanonicalValue(true).hashCode());
+        }
+        if (node.getChildren().isEmpty()) {
+            return OptionalInt.of(node.hashCode());
+        }
</code_context>

<issue_to_address>
**issue (bug_risk):** Including node.hashCode() for leaf nodes in canonicalizationAwareIgnoreCaseHash may introduce inconsistency.

This mismatch between hashCode and equals for non-Identifier leaf nodes may cause issues in hash-based collections. Please ensure both methods are consistent.
</issue_to_address>

### Comment 3
<location> `core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java:5767` </location>
<code_context>
                 // the "a" in the SELECT clause is bound to the FROM scope, while the "a" in ORDER BY clause is bound
                 // to the "a" from the SELECT clause, so we can't compare by field id / relation id.
-                if (expression instanceof Identifier && aliases.contains(canonicalizationAwareKey(expression))) {
+                if (expression instanceof Identifier && aliases.contains(canonicalizationAwareKey(expression, true))) {
                     continue;
                 }
</code_context>

<issue_to_address>
**question (bug_risk):** The use of ignoreCase=true for all alias comparisons may affect queries with delimited identifiers.

This change may result in incorrect behavior for queries using case-sensitive delimited identifiers. Please review whether this could impact query accuracy.
</issue_to_address>

Praveen2112 · 2025-10-18T02:27:08Z

core/trino-main/src/main/java/io/trino/sql/analyzer/CanonicalizationAware.java

        return null;
    }

+    public static Boolean canonicalizationAwareIgnoreCaseComparison(Node left, Node right)


If we ignore case in the comparison, say for ORDER BY x, "X", how would it match?

martint · 2025-10-18T05:34:03Z

core/trino-main/src/main/java/io/trino/sql/analyzer/CanonicalizationAware.java

    private int hashCode;

-    private CanonicalizationAware(T node)
+    private CanonicalizationAware(T node, boolean ignoreCase)


This isn't right. The whole purpose of this class is to compare identifiers taking into account SQL canonicalization rules. I.e., delimited identifiers are kept as is, while non-delimited identifiers are canonicalized (to upper-case, per the SQL standard) before comparison.
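
One consequence of the canonicalization rule described above can be sketched as follows. The `Alias` record and both key functions are hypothetical helpers written for illustration only; they are not part of Trino, and the ignore-case key is an assumption about what upper-casing every identifier would do.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class DelimitedAliasCollisionSketch {
    // Hypothetical model of a SELECT alias: its text and whether it was written in double quotes.
    record Alias(String value, boolean delimited) {}

    // SQL-style canonicalization: delimited kept as is, non-delimited upper-cased.
    static String standardKey(Alias alias) {
        return alias.delimited() ? alias.value() : alias.value().toUpperCase(Locale.ENGLISH);
    }

    // Ignore-case variant (assumption): every identifier is upper-cased.
    static String ignoreCaseKey(Alias alias) {
        return alias.value().toUpperCase(Locale.ENGLISH);
    }

    // Counts how many distinct lookup keys a list of aliases produces.
    static int distinctKeys(List<Alias> aliases, boolean ignoreCase) {
        Set<String> keys = aliases.stream()
                .map(alias -> ignoreCase ? ignoreCaseKey(alias) : standardKey(alias))
                .collect(Collectors.toSet());
        return keys.size();
    }

    public static void main(String[] args) {
        // SELECT DISTINCT a AS "x", b AS "X" ...
        List<Alias> quoted = List.of(new Alias("x", true), new Alias("X", true));
        System.out.println(distinctKeys(quoted, false));
        System.out.println(distinctKeys(quoted, true));
    }
}
```

Under the standard rule the aliases "x" and "X" remain two separate keys, while the ignore-case variant collapses them into one, so a key set built that way can no longer tell them apart.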

codluca · 2025-10-18

Indeed, if the adopted convention is similar to Oracle's (non-delimited identifiers converted to upper case, delimited identifiers kept as they are), then the changes in this PR are not OK.

The changes in this PR try to follow the Trino documentation at https://trino.io/docs/current/language/reserved.html#language-identifiers:
"Identifiers are not treated as case sensitive."
I read "Identifiers" as meaning all identifiers, both non-delimited and delimited.

SELECT (without DISTINCT) follows the convention in the online documentation by ignoring the case of all identifiers.
The changes here try to make SELECT DISTINCT behave like SELECT.

I modified CanonicalizationAware, instead of removing its usage from SELECT DISTINCT, because Identifier's equals and hashCode use the case-sensitive value.

I will copy this comment to issue #26755, as the issue is more visible and this PR seems to be wrong.

cla-bot bot added the cla-signed label Oct 17, 2025

sourcery-ai bot reviewed Oct 17, 2025

raunaqmorarka requested review from kasiafi and martint October 17, 2025 16:16

Praveen2112 reviewed Oct 18, 2025

martint requested changes Oct 18, 2025

codluca mentioned this pull request Oct 18, 2025

SELECT DISTINCT with Delimited Alias in ORDER BY Errors (Case Sensitivity) #26755

Open
