fix: Info and infoUnion should always be synced #26707

shangm2 · 2025-11-26T22:02:15Z

Description

We saw a case where, with Thrift enabled, the deserializer populates only infoUnion and leaves info unset. When OperatorStats is later serialized to JSON for event logging, info is empty, and infoUnion, although populated, is not part of the JSON serde. We should therefore keep info and infoUnion in sync at all times, in case different serdes are used in the system.
S594718
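
To make the intent concrete, here is a minimal sketch of one way to keep the two fields in sync at construction time. It is not the code in this PR: every type below is a simplified, hypothetical stand-in for the Presto class of the same name, and the `toInfo`/`fromInfo` helpers are assumptions introduced only for illustration.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins for the Presto types named in this PR.
final class InfoSyncSketch {
    interface OperatorInfo {}

    static final class TableFinishInfo implements OperatorInfo {
        final String serializedMetadata;

        TableFinishInfo(String serializedMetadata) {
            this.serializedMetadata = Objects.requireNonNull(serializedMetadata);
        }
    }

    // Thrift-style union; only the TableFinishInfo member is modeled here.
    static final class OperatorInfoUnion {
        private final TableFinishInfo tableFinishInfo;

        OperatorInfoUnion(TableFinishInfo tableFinishInfo) {
            this.tableFinishInfo = tableFinishInfo;
        }

        TableFinishInfo getTableFinishInfo() {
            return tableFinishInfo;
        }

        // Assumed helper: unwrap the union into the plain info object.
        OperatorInfo toInfo() {
            return tableFinishInfo;
        }

        // Assumed helper: wrap a plain info object; unmodeled types yield null.
        static OperatorInfoUnion fromInfo(OperatorInfo info) {
            return (info instanceof TableFinishInfo)
                    ? new OperatorInfoUnion((TableFinishInfo) info)
                    : null;
        }
    }

    static final class OperatorStats {
        private final OperatorInfo info;
        private final OperatorInfoUnion infoUnion;

        // Whichever side the deserializer filled in, derive the other from it.
        OperatorStats(OperatorInfo info, OperatorInfoUnion infoUnion) {
            this.info = (info == null && infoUnion != null) ? infoUnion.toInfo() : info;
            this.infoUnion = (infoUnion == null && info != null)
                    ? OperatorInfoUnion.fromInfo(info)
                    : infoUnion;
        }

        OperatorInfo getInfo() {
            return info;
        }

        OperatorInfoUnion getInfoUnion() {
            return infoUnion;
        }
    }
}
```

With this shape, a Thrift-deserialized object (only the union set) still exposes a non-null `getInfo()` to the JSON event-logging path, and a JSON-deserialized object still exposes a union to Thrift consumers.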

Motivation and Context

We need info and infoUnion to stay in sync.

Impact

Test Plan

Passed verifier run.

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.
If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== NO RELEASE NOTE ==

sourcery-ai · 2025-11-26T22:02:22Z

Reviewer's Guide

Adds histogram-based percentile tracking (p90/p95/p99) to RuntimeMetric with JSON-serializable configuration, updates RuntimeStats and StageExecutionStateMachine to optionally track percentiles for selected metrics, extends QueryMonitor to read TableFinishInfo from both JSON and Thrift unions, and introduces comprehensive tests for percentile behavior and merging semantics.

Sequence diagram for recording metrics with percentile tracking and computing percentiles

sequenceDiagram
    actor Coordinator
    participant StageExecutionStateMachine as StageExecutionStateMachine
    participant RuntimeStats as RuntimeStats
    participant RuntimeMetric as RuntimeMetric

    Coordinator->>StageExecutionStateMachine: recordStartWaitForEventLoop(nanos)
    StageExecutionStateMachine->>RuntimeStats: addMetricValue(TASK_START_WAIT_FOR_EVENT_LOOP, NANO, max(nanos,0), true)
    alt metric_exists
        RuntimeStats->>RuntimeMetric: addValue(value)
        RuntimeMetric->>RuntimeMetric: update sum/count/min/max
        RuntimeMetric->>RuntimeMetric: getBucketIndex(value)
        RuntimeMetric->>RuntimeMetric: histogramBuckets.incrementAndGet(bucketIndex)
    else metric_missing
        RuntimeStats->>RuntimeStats: create new RuntimeMetric(name, unit, trackPercentiles=true)
        RuntimeStats->>RuntimeMetric: addValue(value)
        RuntimeMetric->>RuntimeMetric: update sum/count/min/max
        RuntimeMetric->>RuntimeMetric: getBucketIndex(value)
        RuntimeMetric->>RuntimeMetric: histogramBuckets.incrementAndGet(bucketIndex)
    end

    Coordinator->>RuntimeStats: computeAllPercentiles()
    loop for_each_metric
        RuntimeStats->>RuntimeMetric: isPercentileTrackingEnabled()
        alt tracking_enabled
            RuntimeStats->>RuntimeMetric: getP90()
            RuntimeMetric->>RuntimeMetric: computePercentile(0.90)
            RuntimeMetric-->>RuntimeStats: p90
            RuntimeStats->>RuntimeMetric: getP95()
            RuntimeMetric->>RuntimeMetric: computePercentile(0.95)
            RuntimeMetric-->>RuntimeStats: p95
            RuntimeStats->>RuntimeMetric: getP99()
            RuntimeMetric->>RuntimeMetric: computePercentile(0.99)
            RuntimeMetric-->>RuntimeStats: p99
        else tracking_disabled
            RuntimeStats-->>RuntimeStats: skip_metric
        end
    end
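
The recording and percentile steps in the diagram above can be illustrated with a small fixed-width histogram. This is a sketch under assumptions, not the RuntimeMetric implementation: the class name `HistogramSketch`, the clamping of negative values into bucket 0, and the choice to report a bucket's upper bound as the percentile are all illustrative decisions.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical fixed-width histogram; mirrors only the general idea in the diagram.
final class HistogramSketch {
    private final long bucketWidth;
    private final AtomicLongArray buckets;

    HistogramSketch(long bucketWidth, int numBuckets) {
        if (bucketWidth <= 0 || numBuckets <= 0) {
            throw new IllegalArgumentException("bucketWidth and numBuckets must be positive");
        }
        this.bucketWidth = bucketWidth;
        this.buckets = new AtomicLongArray(numBuckets);
    }

    void addValue(long value) {
        buckets.incrementAndGet(bucketIndex(value));
    }

    // Negative values clamp to the first bucket, large values to the last (overflow) bucket.
    private int bucketIndex(long value) {
        if (value < 0) {
            return 0;
        }
        return (int) Math.min(value / bucketWidth, buckets.length() - 1);
    }

    // Approximate percentile: upper bound of the first bucket whose cumulative
    // count reaches ceil(p * total). Returns 0 when nothing has been recorded.
    long percentile(double p) {
        long total = 0;
        for (int i = 0; i < buckets.length(); i++) {
            total += buckets.get(i);
        }
        if (total == 0) {
            return 0;
        }
        long target = (long) Math.ceil(p * total);
        long seen = 0;
        for (int i = 0; i < buckets.length(); i++) {
            seen += buckets.get(i);
            if (seen >= target) {
                return (i + 1) * bucketWidth;
            }
        }
        return buckets.length() * bucketWidth;
    }
}
```

The result is only as precise as the bucket width, and anything landing in the overflow bucket is reported as the histogram's upper limit, which is why bucket configuration per unit matters.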

Sequence diagram for QueryMonitor resolving TableFinishInfo from JSON and Thrift union

sequenceDiagram
    participant QueryMonitor as QueryMonitor
    participant QueryInfo as QueryInfo
    participant QueryStats as QueryStats
    participant OperatorStats as OperatorStats
    participant OperatorInfoUnion as OperatorInfoUnion
    participant TableFinishInfo as TableFinishInfo

    QueryMonitor->>QueryInfo: getOutput()
    alt output_present
        QueryMonitor->>QueryInfo: getQueryStats()
        QueryInfo-->>QueryMonitor: QueryStats
        QueryMonitor->>QueryStats: getOperatorSummaries()
        QueryStats-->>QueryMonitor: List~OperatorStats~
        loop operator_summaries
            QueryMonitor->>OperatorStats: getInfo()
            alt info_is_TableFinishInfo
                OperatorStats-->>QueryMonitor: TableFinishInfo
                QueryMonitor-->>TableFinishInfo: use_for_QueryOutputMetadata
            else info_not_TableFinishInfo
                OperatorStats-->>QueryMonitor: OperatorInfo
                QueryMonitor->>OperatorStats: getInfoUnion()
                alt infoUnion_not_null
                    OperatorStats-->>QueryMonitor: OperatorInfoUnion
                    QueryMonitor->>OperatorInfoUnion: getTableFinishInfo()
                    alt union_has_TableFinishInfo
                        OperatorInfoUnion-->>QueryMonitor: TableFinishInfo
                        QueryMonitor-->>TableFinishInfo: use_for_QueryOutputMetadata
                    else union_missing_TableFinishInfo
                        OperatorInfoUnion-->>QueryMonitor: null
                    end
                else infoUnion_null
                    OperatorStats-->>QueryMonitor: null
                end
            end
        end
    else output_absent
        QueryInfo-->>QueryMonitor: Optional.empty
        QueryMonitor-->>QueryMonitor: no_QueryOutputMetadata
    end
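
The fallback order in this diagram (check `info` first, then `infoUnion.getTableFinishInfo()`, dropping nulls) can be sketched as follows. All types are simplified, hypothetical stand-ins nested in one class for self-containment; the method names `resolve` and `findFirst` are illustrative and are not taken from QueryMonitor.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins; only the lookup order from the diagram is modeled.
final class TableFinishLookup {
    interface OperatorInfo {}

    static final class TableFinishInfo implements OperatorInfo {
        final String label;

        TableFinishInfo(String label) {
            this.label = label;
        }
    }

    static final class OperatorInfoUnion {
        private final TableFinishInfo tableFinishInfo;

        OperatorInfoUnion(TableFinishInfo tableFinishInfo) {
            this.tableFinishInfo = tableFinishInfo;
        }

        TableFinishInfo getTableFinishInfo() {
            return tableFinishInfo;
        }
    }

    static final class OperatorStats {
        private final OperatorInfo info;
        private final OperatorInfoUnion infoUnion;

        OperatorStats(OperatorInfo info, OperatorInfoUnion infoUnion) {
            this.info = info;
            this.infoUnion = infoUnion;
        }

        OperatorInfo getInfo() {
            return info;
        }

        OperatorInfoUnion getInfoUnion() {
            return infoUnion;
        }
    }

    // JSON path first (info), then the Thrift path (infoUnion); empty if neither has it.
    static Optional<TableFinishInfo> resolve(OperatorStats stats) {
        if (stats.getInfo() instanceof TableFinishInfo) {
            return Optional.of((TableFinishInfo) stats.getInfo());
        }
        // Optional.map yields an empty Optional when the union member is null.
        return Optional.ofNullable(stats.getInfoUnion())
                .map(OperatorInfoUnion::getTableFinishInfo);
    }

    // First operator summary that carries a TableFinishInfo on either path.
    static Optional<TableFinishInfo> findFirst(List<OperatorStats> summaries) {
        return summaries.stream()
                .map(TableFinishLookup::resolve)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .findFirst();
    }
}
```

A caller would pass the operator summaries to `findFirst` and build the output metadata only when the returned Optional is present.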

Class diagram for updated RuntimeMetric and RuntimeStats percentile tracking

classDiagram
    class RuntimeMetric {
        - static int DEFAULT_NUM_BUCKETS
        - int numBuckets
        - long bucketWidth
        - String name
        - RuntimeUnit unit
        - AtomicLong sum
        - AtomicLong count
        - AtomicLong max
        - AtomicLong min
        - boolean percentileTrackingEnabled
        - AtomicLongArray histogramBuckets
        - Long p90
        - Long p95
        - Long p99
        + RuntimeMetric(name String, unit RuntimeUnit)
        + RuntimeMetric(name String, unit RuntimeUnit, trackPercentiles boolean)
        + RuntimeMetric(name String, unit RuntimeUnit, trackPercentiles boolean, bucketWidth long)
        + RuntimeMetric(name String, unit RuntimeUnit, trackPercentiles boolean, bucketWidth long, numBuckets int)
        + RuntimeMetric(name String, unit RuntimeUnit, sum long, count long, max long, min long)
        + RuntimeMetric(name String, unit RuntimeUnit, sum long, count long, max long, min long, numBuckets Integer, bucketWidth Long, p90 Long, p95 Long, p99 Long)
        + static RuntimeMetric copyOf(metric RuntimeMetric) RuntimeMetric
        - static long determineBucketWidth(unit RuntimeUnit) long
        - void set(sum long, count long, max long, min long)
        + void set(metric RuntimeMetric)
        + boolean isPercentileTrackingEnabled()
        + String getName()
        + void addValue(value long)
        - int getBucketIndex(value long) int
        + void mergeWith(metric RuntimeMetric)
        + long getSum()
        + long getCount()
        + long getMax()
        + long getMin()
        + RuntimeUnit getUnit()
        + Integer getNumBuckets()
        + Long getBucketWidth()
        + Long getP90()
        + Long getP95()
        + Long getP99()
        - long computePercentile(percentile double) long
        - static void checkState(condition boolean, message String)
        + String toString()
    }

    class RuntimeStats {
        - Map~String, RuntimeMetric~ metrics
        + RuntimeStats()
        + Map~String, RuntimeMetric~ getMetrics()
        + void addMetricValue(name String, unit RuntimeUnit, value long)
        + void addMetricValue(name String, unit RuntimeUnit, value long, trackPercentiles boolean)
        + void addMetricValueIgnoreZero(name String, unit RuntimeUnit, value long)
        + void addMetricValueIgnoreZero(name String, unit RuntimeUnit, value long, trackPercentiles boolean)
        + void addMetric(name String, metric RuntimeMetric)
        + void mergeWith(other RuntimeStats)
        + void recordWallTime(tag String, runnable Runnable)
        + void recordCpuTime(tag String, supplier Supplier~Object~)
        + void recordWallAndCpuTime(tag String, supplier Supplier~Object~)
        + void computeAllPercentiles()
    }

    class StageExecutionStateMachine {
        - RuntimeStats runtimeStats
        + void recordStartWaitForEventLoop(nanos long)
        + void recordTaskUpdateDeliveredTime(nanos long)
        + void recordDeliveredUpdates(updates int)
    }

    RuntimeStats --> RuntimeMetric : uses
    StageExecutionStateMachine --> RuntimeStats : uses
    StageExecutionStateMachine ..> RuntimeMetric : records_metrics_with_percentiles

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Introduce configurable histogram-based percentile tracking (p90/p95/p99) into RuntimeMetric with JSON compatibility and safe merging/copy semantics. | Add AtomicLongArray-backed histogram, bucket configuration (numBuckets, bucketWidth), and percentile-tracking flag to RuntimeMetric. Provide multiple constructors to enable/disable percentile tracking and to configure bucket width and bucket count, including auto-configuration based on RuntimeUnit. Implement addValue, mergeWith, set, copyOf, and toString logic that respects histogram configuration, enforces unit/bucket compatibility, and caches percentiles. Add JSON-creator constructor and @JsonProperty accessors for numBuckets, bucketWidth, p90, p95, and p99 so snapshots can be serialized/deserialized without histograms. Implement computePercentile and helper methods (getBucketIndex, determineBucketWidth, checkState) with clamping and overflow handling for approximate percentiles. | `presto-common/src/main/java/com/facebook/presto/common/RuntimeMetric.java` |
| Extend RuntimeStats API to support percentile-enabled metrics and to precompute all percentiles for serialization/consumers. | Add overloaded addMetricValue that takes a trackPercentiles flag and constructs RuntimeMetric accordingly. Add computeAllPercentiles helper that forces computation and caching of p90/p95/p99 on all percentile-enabled metrics. | `presto-common/src/main/java/com/facebook/presto/common/RuntimeStats.java` |
| Enable percentile tracking for task start wait-for-event-loop metric at stage level. | Change StageExecutionStateMachine.recordStartWaitForEventLoop to call the new RuntimeStats.addMetricValue overload with trackPercentiles set to true for TASK_START_WAIT_FOR_EVENT_LOOP. | `presto-main-base/src/main/java/com/facebook/presto/execution/StageExecutionStateMachine.java` |
| Make QueryMonitor resilient to TableFinishInfo coming either from JSON-serialized OperatorInfo or Thrift OperatorInfoUnion, so written partition logging works for thrift serde. | Update selection of TableFinishInfo to first check OperatorStats.info (JSON) and then OperatorStats.infoUnion.getTableFinishInfo() when available, filtering out nulls. Adjust getQueryIOMetadata output construction accordingly so it works for both serialization paths. | `presto-main-base/src/main/java/com/facebook/presto/event/QueryMonitor.java` |
| Add extensive unit tests covering percentile tracking behavior, configuration, JSON serialization, merging, and edge cases for RuntimeMetric. | Add tests to verify percentiles are disabled by default, enabled via constructor, and correctly computed for typical distributions and units (NANO, BYTE, NONE). Add tests for JSON serialization/deserialization of percentile snapshots, copyOf behavior, and merging semantics including mismatched units, bucket widths, and bucket counts. Add tests for various edge cases: negative values, overflow buckets, skewed distributions, small sample sizes, single values, all values in same bucket, zero values, and large counts/values. Add tests that validate interaction between percentile tracking flags (both disabled, one enabled) and merge behavior, including cached percentile stability after copy. | `presto-common/src/test/java/com/facebook/presto/common/TestRuntimeMetric.java` |

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

In RuntimeMetric.set(RuntimeMetric) you now enforce same bucketWidth/numBuckets even when percentile tracking is effectively unused, which may be unnecessarily strict and could break existing call sites that only care about the basic aggregates; consider gating those checks on both metrics having percentile tracking enabled.
The new RuntimeStats.addMetricValue(name, unit, value, trackPercentiles) ignores trackPercentiles after the first insertion because of computeIfAbsent; if a caller ever expects to turn percentile tracking on for an existing metric, that will silently fail, so it might be worth either documenting or enforcing that the flag is consistent per name.
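
The second point is easy to reproduce in isolation. The sketch below is a hypothetical miniature, not the RuntimeStats code: `StickyFlagSketch`, its nested `Metric`, and the strict variant are illustrative names, and the strict check is just one possible way to enforce a consistent flag per metric name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical miniature of a name -> metric map built with computeIfAbsent.
final class StickyFlagSketch {
    static final class Metric {
        final boolean trackPercentiles;
        final AtomicLong sum = new AtomicLong();

        Metric(boolean trackPercentiles) {
            this.trackPercentiles = trackPercentiles;
        }
    }

    private final Map<String, Metric> metrics = new ConcurrentHashMap<>();

    // The flag is consulted only when the metric is first created;
    // later calls with a different flag are silently ignored.
    void addMetricValue(String name, long value, boolean trackPercentiles) {
        metrics.computeIfAbsent(name, key -> new Metric(trackPercentiles)).sum.addAndGet(value);
    }

    // One possible enforcement: reject a flag that disagrees with the existing metric.
    void addMetricValueStrict(String name, long value, boolean trackPercentiles) {
        Metric metric = metrics.computeIfAbsent(name, key -> new Metric(trackPercentiles));
        if (metric.trackPercentiles != trackPercentiles) {
            throw new IllegalStateException("Inconsistent trackPercentiles flag for metric " + name);
        }
        metric.sum.addAndGet(value);
    }

    Metric get(String name) {
        return metrics.get(name);
    }
}
```

Calling `addMetricValue("m", 1, false)` followed by `addMetricValue("m", 2, true)` leaves the metric with percentile tracking off while still accumulating both values, which is the silent failure described above.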

presto-main-base/src/main/java/com/facebook/presto/operator/OperatorInfoUnion.java

shangm2 requested review from a team and elharo as code owners November 26, 2025 22:02

prestodb-ci added the from:Meta PR from Meta label Nov 26, 2025

shangm2 force-pushed the thrift_4_batch branch from 0f42f69 to 9fceba9 Compare November 26, 2025 22:02

sourcery-ai bot reviewed Nov 26, 2025

shangm2 force-pushed the thrift_4_batch branch from 14e2938 to 27014b8 Compare November 26, 2025 23:34

shangm2 changed the title ~~fix: TableFinishInfo is part of union in thrift~~ fix: info and infoUnion should always be synced. Nov 26, 2025

shangm2 changed the title ~~fix: info and infoUnion should always be synced.~~ fix: Info and infoUnion should always be synced. Nov 26, 2025

shangm2 changed the title ~~fix: Info and infoUnion should always be synced.~~ fix: Info and infoUnion should always be synced Nov 26, 2025

mkarrmann reviewed Nov 26, 2025

presto-main-base/src/main/java/com/facebook/presto/operator/OperatorInfoUnion.java (outdated)

shangm2 force-pushed the thrift_4_batch branch from 27014b8 to 392c0e2 Compare November 26, 2025 23:48

fix: Info and infoUnion should always be synced

d238477

shangm2 force-pushed the thrift_4_batch branch from 392c0e2 to d238477 Compare November 26, 2025 23:49
