-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Add io.trino.spi.connector.ConnectorSplitSource#getMetrics #25770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a new SPI method, ConnectorSplitSource#getMetrics, to capture detailed metrics during splits generation and integrates these metrics into various scheduling and execution components. Key changes include:
- Propagation of an optional source identifier (sourceId) in OperatorStats and OperatorContext.
- Replacement of timing callbacks with a new SplitSourceMetricsRecorder across multiple scheduling/execution stages.
- Removal of SplitOperatorInfo and a shift from custom formatting of split info to using default toString() methods.
Reviewed Changes
Copilot reviewed 91 out of 91 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
core/trino-main/src/main/java/io/trino/operator/ScanFilterAndProjectOperator.java | Updated operator context instantiation to include sourceId. |
core/trino-main/src/main/java/io/trino/operator/OperatorStats.java | Introduced sourceId, added getter, and implemented addConnectorSplitSourceMetrics for merging metrics. |
core/trino-main/src/main/java/io/trino/operator/OperatorInfo.java | Removed obsolete SplitOperatorInfo mapping. |
core/trino-main/src/main/java/io/trino/operator/OperatorContext.java | Updated constructor and field to include sourceId. |
core/trino-main/src/main/java/io/trino/operator/DriverContext.java | Added overloaded addOperatorContext accepting sourceId. |
core/trino-main/src/main/java/io/trino/metadata/Split.java | Removed getInfo method. |
core/trino-main/src/main/java/io/trino/execution/scheduler/faulttolerant/* | Changed split time recorder to use metricsRecorder. |
core/trino-main/src/main/java/io/trino/execution/* | Updated methods to record and propagate split source metrics. |
core/trino-main/src/main/java/io/trino/connector/* | Removed or updated split info formatting methods. |
Comments suppressed due to low confidence (1)
core/trino-main/src/main/java/io/trino/execution/SqlTaskExecution.java:897
- [nitpick] Consider verifying that the default toString() implementation of the split provides sufficient detail for debugging purposes, or implement a custom formatting method if more clarity is required.
return (partitionedSplit == null) ? "" : partitionedSplit.getSplit().toString();
ffca7c7
to
1902d67
Compare
core/trino-main/src/main/java/io/trino/sql/planner/planprinter/TextRenderer.java
Outdated
Show resolved
Hide resolved
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorSplitSource.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitManager.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/execution/QueryStateMachine.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/execution/QueryStateMachine.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/OperatorStats.java
Outdated
Show resolved
Hide resolved
Overall LGTM |
Allows connectors to expose detailed metrics from splits generation. These metrics are exposed in StageStats which is part of queryinfo. EXPLAIN ANALYZE VERBOSE is modified to print these metrics.
Example output ScanFilterProject[table = iceberg:default.lineitem$data@5213842060806047048, filterPredicate = (l_orderkey = bigint '3423110')] connector metrics: 'ParquetReaderCompressionFormat_ZSTD' = LongCount{total=9808205} splits generation metrics: 'dataFileSizeBytes' = LongCount{total=394528676} 'dataFiles' = LongCount{total=8} 'dataManifests' = LongCount{total=1} 'deleteFileSizeBytes' = LongCount{total=2370} 'deleteManifests' = LongCount{total=1} 'equalityDeleteFiles' = LongCount{total=0} 'positionalDeleteFiles' = LongCount{total=2} 'scanPlanningDuration' = {duration=7.00ms}
This is not needed anymore after adding split source metrics Removing this also avoids unnecessary fetching of splits info from worker to coordinator
This is no longer needed as toString suffices to print debug info on workers
1902d67
to
732cc7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a new SPI method, ConnectorSplitSource#getMetrics, to allow connectors to expose detailed metrics from splits generation and update various components to record and display these metrics. Key changes include:
- Injecting an Optional sourceId into operator contexts and stats to support metric recording.
- Replacing the existing split time recording with a metrics‐based approach throughout scheduling and execution.
- Removing SplitOperatorInfo and updating split info formatting.
Reviewed Changes
Copilot reviewed 91 out of 91 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
ScanFilterAndProjectOperator.java | Updated operator context creation to pass Optional sourceId. |
OperatorStats.java | Added sourceId field and withConnectorSplitSourceMetrics method to merge connector split metrics. |
OperatorInfo.java | Removed deprecated SplitOperatorInfo registration. |
OperatorContext.java | Injected Optional sourceId into constructors. |
DriverContext.java | Added overload to support Optional sourceId when adding an operator context. |
Split.java | Removed getInfo() method to deprecate split info representation. |
SplitSourceMetricsRecorder.java | Introduced new interface for recording metrics. |
EventDrivenTaskSourceFactory.java, EventDrivenTaskSource.java | Replaced BiConsumer time recorder with a metrics recorder, updating corresponding references. |
StageExecution.java, SourcePartitionedScheduler.java, PipelinedStageExecution.java, StageStats.java, StageStateMachine.java, SqlTaskExecution.java, SqlStage.java, QueryStateMachine.java | Updated methods to record and propagate split source metrics. |
SystemSplit.java, InformationSchemaSplit.java | Changed the split info formatting to rely on toString(). |
Comments suppressed due to low confidence (1)
core/trino-main/src/main/java/io/trino/execution/scheduler/StageStateMachine.java:753
- The current implementation overwrites split source metrics for a given node; if multiple updates are expected, consider merging the metrics instead of replacing them.
splitSourceMetrics.put(nodeId, metrics);
Description
Add new SPI
io.trino.spi.connector.ConnectorSplitSource#getMetrics
This allows connectors to expose detailed metrics from splits generation.
EXPLAIN ANALYZE VERBOSE is modified to print these metrics.
Added an implementation in iceberg to collect metrics from iceberg metadata scan.
Example output
Remove SplitOperatorInfo and ConnectorSplit#getSplitInfo, this
avoids the need to send splits info from workers to coordinator.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: