feat: pass the ordering information to native Scan #2375

rluvaton · 2025-09-10T20:29:34Z

Which issue does this PR close?

N/A

Rationale for this change

Sort information can be used by specialized implementations. For example, a sort will not sort if the input is already sorted, and hash aggregate will use a GroupValues implementation that tracks a new group as soon as it sees the next value.
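
To illustrate the rationale, here is a minimal, self-contained Rust sketch of an operator that skips its sort when the input already declares the required ordering. The `SortKey` type and the `ordering_satisfies` and `sort_if_needed` helpers are hypothetical names made up for this example; they are not Comet or DataFusion APIs.

```rust
use std::cmp::Ordering;

/// Hypothetical description of one sort key: column index plus direction.
#[derive(Clone, Debug, PartialEq)]
struct SortKey {
    column: usize,
    descending: bool,
}

/// Returns true when `required` is a prefix of the ordering the input declares.
fn ordering_satisfies(declared: &[SortKey], required: &[SortKey]) -> bool {
    required.len() <= declared.len() && declared[..required.len()] == *required
}

/// Sorts `rows` by the required keys unless the declared ordering already
/// covers them. Returns true if a sort was actually performed.
fn sort_if_needed(rows: &mut Vec<Vec<i64>>, declared: &[SortKey], required: &[SortKey]) -> bool {
    if ordering_satisfies(declared, required) {
        // The input claims to be sorted already, so the sort is skipped.
        return false;
    }
    rows.sort_by(|a, b| {
        for key in required {
            let ord = a[key.column].cmp(&b[key.column]);
            let ord = if key.descending { ord.reverse() } else { ord };
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    });
    true
}
```

Note that the skip relies entirely on the declared ordering being truthful: if the scan reports an ordering the data does not have, the rows stay unsorted.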

What changes are included in this PR?

Pass the child's output ordering to the native Scan.

How are these changes tested?

Existing tests

This can cause problems if Spark says something is sorted while we don't sort it.

For example, shuffle files in Spark are sorted, but ours are not, so we should make sure that the sort is used correctly.

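
One way to guard against this, at least in tests or debug builds, is to verify that the data actually respects the ordering Spark claims before relying on it. The sketch below is an assumption about how such a check could look, not what this PR implements; `is_sorted_by_column` and `trusted_ordering` are hypothetical helpers.

```rust
/// Hypothetical check: true if `rows` are in non-decreasing order on `column`.
fn is_sorted_by_column(rows: &[Vec<i64>], column: usize) -> bool {
    rows.windows(2).all(|pair| pair[0][column] <= pair[1][column])
}

/// Keeps the claimed ordering column only if the data really respects it.
/// Otherwise the claim is dropped so downstream operators fall back to sorting.
fn trusted_ordering(rows: &[Vec<i64>], claimed: Option<usize>) -> Option<usize> {
    claimed.filter(|&col| is_sorted_by_column(rows, col))
}
```

A check like this scans every row, so it would more plausibly live in a test or behind a debug flag than on the hot path.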

rluvaton · 2025-09-10T20:51:05Z

@andygrove can you please start the CI?

codecov-commenter · 2025-09-10T21:32:13Z

Codecov Report

❌ Patch coverage is 87.87879% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.47%. Comparing base (f09f8af) to head (dc14410).
⚠️ Report is 497 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../scala/org/apache/comet/serde/QueryPlanSerde.scala | 82.60% | 1 Missing and 3 partials ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2375      +/-   ##
============================================
+ Coverage     56.12%   57.47%   +1.35%     
- Complexity      976     1297     +321     
============================================
  Files           119      147      +28     
  Lines         11743    13438    +1695     
  Branches       2251     2353     +102     
============================================
+ Hits           6591     7724    +1133     
- Misses         4012     4452     +440     
- Partials       1140     1262     +122



comphead

Starting the CI. The first pass looks good to me; I'll check again later today.

comphead · 2025-09-15T15:40:28Z

spark/src/main/scala/org/apache/spark/sql/comet/CometExecUtils.scala

@@ -48,13 +48,14 @@ object CometExecUtils {
   * partition. The limit operation is performed on the native side.
   */
  def getNativeLimitRDD(
+      child: SparkPlan,


This is confusing IMO: child is the plan, while childPlan is in fact the data.

rluvaton · 2025-09-16

Will rename.

comphead · 2025-09-15T15:49:25Z

@rluvaton

> for example shuffle files in spark are sorted, but ours are not, so we should make sure that the sort is used correctly.

IMO the files are sorted by partitionId; within a partition, the order is not guaranteed unless the optional user key is requested.

It is nice to have the sort info in sync from the Spark caller to the native code. To check the benefits, it would be great to see whether any of the TPC-H queries got faster.


feat: pass the sorted input data to rust scan

4d0c2ab


comphead reviewed Sep 11, 2025

native/core/src/execution/operators/scan.rs (outdated, resolved)

Update scan.rs

dc14410

Co-authored-by: Oleks V <[email protected]>

comphead reviewed Sep 12, 2025


comphead reviewed Sep 15, 2025


rluvaton added 3 commits September 16, 2025 16:00

rename

502566a

Merge remote-tracking branch 'origin/pass-the-sorted-input-data-to-rust-scan' into pass-the-sorted-input-data-to-rust-scan

303db06

format

271fc47
