Use sort ordering on timestamp array #443

Merged on Jun 25, 2023 (2 commits)


trueleo (Contributor) commented Jun 24, 2023

Fixes #430.

Description

Write timestamp sortedness metadata to parquet and provide external sort information to DataFusion. This way, SortExec can be avoided in the execution plan for most queries that use ORDER BY p_timestamp.

Example

explain select p_timestamp from {{stream_name}} order by p_timestamp asc

In the physical plan, SortExec is eliminated because output_ordering is pushed to the ParquetExec node:

"plan": "SortPreservingMergeExec: [p_timestamp@0 ASC NULLS LAST]
  ParquetExec: file_groups={4 groups: [.....]}, projection=[p_timestamp], output_ordering=[p_timestamp@0 ASC NULLS LAST]",

Note:

This is still not the most optimized version of this query, as SortPreservingMergeExec is not really needed here. The issue is that DataFusion is not aware that the partitions / files are non-overlapping with respect to timestamp.

Also, if the target partition limit is crossed, DataFusion again adds SortExec to the physical plan.
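
To illustrate the non-overlap point from the note, here is a minimal sketch of the check a planner would need before it could drop the merge step. It is not DataFusion or Parseable code: `TimeRange` and `non_overlapping_order` are hypothetical names, and the per-file min/max timestamps are assumed to be available (for example from parquet statistics).

```rust
/// Hypothetical summary of one parquet file's p_timestamp range
/// (e.g. milliseconds since epoch). Not a DataFusion or Parseable type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeRange {
    pub min: i64,
    pub max: i64,
}

/// Returns the file indices in ascending scan order if no two ranges
/// overlap, or `None` if any pair overlaps.
/// Assumes `min <= max` for every range.
pub fn non_overlapping_order(ranges: &[TimeRange]) -> Option<Vec<usize>> {
    let mut order: Vec<usize> = (0..ranges.len()).collect();
    // Sort by (min, max) so files that share a start value are ordered sensibly.
    order.sort_by_key(|&i| (ranges[i].min, ranges[i].max));

    for pair in order.windows(2) {
        let (prev, next) = (ranges[pair[0]], ranges[pair[1]]);
        // A shared boundary value is fine for ascending output:
        // every row of `prev` is still <= every row of `next`.
        if prev.max > next.min {
            return None;
        }
    }
    Some(order)
}
```

When this returns `Some(order)`, reading the individually sorted files back to back in that order already yields rows sorted by p_timestamp, so a merge would be redundant. When it returns `None`, a merge (or a sort) is still required.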


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

  infinite_source: false,
  format: Arc::new(file_format),
  table_partition_cols: vec![],
  collect_stat: true,
- target_partitions: 1,
+ target_partitions: 32,
nitisht (Member) commented Jun 25, 2023

What is the significance of changing the target_partitions field here?

trueleo (Contributor Author)

Roughly, target_partitions is the number of parallel streams that DataFusion generates during execution. Setting it to 1 caused all files to be grouped into one partition, and DataFusion cannot use external sort information for files within a group because it cannot infer the order between grouped files or whether their time ranges overlap.
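
A simplified model of that grouping behaviour, as a sketch only. The chunking rule and both function names (`split_into_groups`, `ordering_usable`) are assumptions made for illustration and are not taken from DataFusion's source. The model assumes files are split into consecutive chunks and that per-file ordering is usable only when no group holds more than one file.

```rust
/// Split `files` into at most `target_partitions` groups of consecutive files.
/// Assumed chunking rule for illustration; not DataFusion's implementation.
pub fn split_into_groups<T: Clone>(files: &[T], target_partitions: usize) -> Vec<Vec<T>> {
    if files.is_empty() || target_partitions == 0 {
        return Vec::new();
    }
    // Ceiling division: the smallest chunk size that fits all files
    // into `target_partitions` groups.
    let chunk_size = (files.len() + target_partitions - 1) / target_partitions;
    files.chunks(chunk_size).map(|chunk| chunk.to_vec()).collect()
}

/// In this model, the sort order recorded per file can be used only if
/// every group contains at most one file, because nothing is known about
/// how two files in the same group relate to each other.
pub fn ordering_usable<T>(groups: &[Vec<T>]) -> bool {
    groups.iter().all(|group| group.len() <= 1)
}
```

Under these assumptions, 4 files with target_partitions = 1 land in a single group and the ordering is lost, while target_partitions = 32 gives 4 single-file groups and the ordering is kept. With 40 files and 32 partitions, groups hold 2 files each, which matches the case where a sort is added back.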

trueleo (Contributor Author) commented Jun 25, 2023

So DataFusion will hold at most 32 streams to compute the output and merge them back using SortPreservingMergeExec.
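
Conceptually, that merge step is a k-way merge over already-sorted streams. The sketch below shows the idea with plain `i64` timestamps and a binary heap. It is an illustration of the technique, not DataFusion's operator, and `merge_sorted_partitions` is a hypothetical name.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge partitions that are each sorted ascending into one ascending Vec.
/// Only one pending value per partition is held in the heap at a time.
pub fn merge_sorted_partitions(partitions: Vec<Vec<i64>>) -> Vec<i64> {
    let mut iters: Vec<_> = partitions.into_iter().map(|p| p.into_iter()).collect();

    // Min-heap of (timestamp, partition index), seeded with the head of
    // every non-empty partition.
    let mut heap = BinaryHeap::new();
    for (idx, it) in iters.iter_mut().enumerate() {
        if let Some(ts) = it.next() {
            heap.push(Reverse((ts, idx)));
        }
    }

    let mut out = Vec::new();
    while let Some(Reverse((ts, idx))) = heap.pop() {
        out.push(ts);
        // Refill from the partition that just produced the smallest value.
        if let Some(next) = iters[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
    out
}
```

Each output row costs O(log k) for k partitions, which is much cheaper than re-sorting all rows, but the inputs must already be sorted for the result to be correct.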

  infinite_source: false,
  format: Arc::new(file_format),
  table_partition_cols: vec![],
  collect_stat: true,
- target_partitions: 1,
+ target_partitions: 32,
nitisht (Member)

This section is repeated for local and S3 mode. Can we move it to a common abstraction?

trueleo (Contributor Author)

Yes, this needs refactoring

nitisht (Member) commented Jun 25, 2023

> Fixes #418.

Shouldn't this be #430? #418 is related to the staging query.

@nitisht nitisht merged commit 3e5548d into parseablehq:main Jun 25, 2023
6 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jun 25, 2023
@trueleo trueleo deleted the sort_event branch July 4, 2023 05:40
Development

Successfully merging this pull request may close these issues.

Slow performace when using ORDER BY