Skip to content

Conversation

rluvaton
Copy link
Contributor

Which issue does this PR close?

N/A

Rationale for this change

Sort information can be used in specialized implementation (for example sort will not sort if the input is already sorted, hash aggregate will use GroupValues that are tracking new groups once they saw the next value)

What changes are included in this PR?

Used the child output ordering

How are these changes tested?

Existing tests


This can cause problems if spark says something is sorted while we don't sort it.

for example shuffle files in spark are sorted, but ours are not, so we should make sure that the sort is used correctly.

This can cause problems if spark says something is sorted while we don't sort it.

for example shuffle files in spark are sorted, but ours are not, so we should make sure that the sort is used correctly.
@rluvaton
Copy link
Contributor Author

@andygrove can you please start the CI?

@codecov-commenter
Copy link

codecov-commenter commented Sep 10, 2025

Codecov Report

❌ Patch coverage is 87.87879% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.47%. Comparing base (f09f8af) to head (dc14410).
⚠️ Report is 497 commits behind head on main.

Files with missing lines Patch % Lines
.../scala/org/apache/comet/serde/QueryPlanSerde.scala 82.60% 1 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2375      +/-   ##
============================================
+ Coverage     56.12%   57.47%   +1.35%     
- Complexity      976     1297     +321     
============================================
  Files           119      147      +28     
  Lines         11743    13438    +1695     
  Branches       2251     2353     +102     
============================================
+ Hits           6591     7724    +1133     
- Misses         4012     4452     +440     
- Partials       1140     1262     +122     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Co-authored-by: Oleks V <[email protected]>
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting the CI, the first pass looks good to me, I'll check again later today.

@@ -48,13 +48,14 @@ object CometExecUtils {
* partition. The limit operation is performed on the native side.
*/
def getNativeLimitRDD(
child: SparkPlan,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is confusing IMO. child is the plan and childPlan in fact is data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will rename

@comphead
Copy link
Contributor

@rluvaton

for example shuffle files in spark are sorted, but ours are not, so we should make sure that the sort is used correctly.
IMO files are sorted by partitionId within partition the order is not guaranteed unless the optional user key is requested

This is nice to have sort info in sync from Spark caller to the native code, to check benefits would be great to see if any of TPCH queries got faster

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants