@shangm2 shangm2 commented Nov 26, 2025

Description

  1. We saw a case where, with Thrift enabled, the deserializer populates only infoUnion and not info. When OperatorStats then needs to be serialized to JSON for event logging, info is empty, and infoUnion, although populated, is not part of the JSON serde. We should therefore always keep info and infoUnion in sync, in case different serdes are used in the system.
  2. S594718
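
A minimal sketch of the invariant described in item 1, assuming simplified stand-in types rather than the actual Presto classes. The names `Info`, `FinishInfo`, `InfoUnion`, and `SyncedOperatorStats` are hypothetical, and the PR itself may enforce the sync differently; the idea is only that whichever serde built the object, the constructor fills in the missing side.

```java
import java.util.Objects;

// Hypothetical stand-ins for the real OperatorInfo / OperatorInfoUnion types.
interface Info {}

final class FinishInfo implements Info {
    final String summary;

    FinishInfo(String summary) {
        this.summary = Objects.requireNonNull(summary, "summary is null");
    }
}

// Thrift-style union wrapper: at most one member is set.
final class InfoUnion {
    private final FinishInfo finishInfo;

    InfoUnion(FinishInfo finishInfo) {
        this.finishInfo = finishInfo;
    }

    FinishInfo getFinishInfo() {
        return finishInfo;
    }

    // Wrap a JSON-side info object into the union, if the union can hold it.
    static InfoUnion fromInfo(Info info) {
        return (info instanceof FinishInfo) ? new InfoUnion((FinishInfo) info) : null;
    }

    // Unwrap the union back into the JSON-side representation.
    Info toInfo() {
        return finishInfo;
    }
}

final class SyncedOperatorStats {
    private final Info info;
    private final InfoUnion infoUnion;

    // Whichever side the deserializer populated, derive the other one.
    SyncedOperatorStats(Info info, InfoUnion infoUnion) {
        if (info == null && infoUnion != null) {
            info = infoUnion.toInfo();
        }
        if (infoUnion == null && info != null) {
            infoUnion = InfoUnion.fromInfo(info);
        }
        this.info = info;
        this.infoUnion = infoUnion;
    }

    Info getInfo() {
        return info;
    }

    InfoUnion getInfoUnion() {
        return infoUnion;
    }
}
```

With this shape, a consumer that only reads `getInfo()` (for example, a JSON event logger) sees the same data as one that only reads `getInfoUnion()`.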

Motivation and Context

  1. We need info and infoUnion to stay in sync.

Impact

Test Plan

Passed verifier run.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== NO RELEASE NOTE ==

@shangm2 shangm2 requested review from a team and elharo as code owners November 26, 2025 22:02
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Nov 26, 2025
sourcery-ai bot commented Nov 26, 2025

Reviewer's Guide

Adds histogram-based percentile tracking (p90/p95/p99) to RuntimeMetric with JSON-serializable configuration, updates RuntimeStats and StageExecutionStateMachine to optionally track percentiles for selected metrics, extends QueryMonitor to read TableFinishInfo from both JSON and Thrift unions, and introduces comprehensive tests for percentile behavior and merging semantics.

Sequence diagram for recording metrics with percentile tracking and computing percentiles

sequenceDiagram
    actor Coordinator
    participant StageExecutionStateMachine as StageExecutionStateMachine
    participant RuntimeStats as RuntimeStats
    participant RuntimeMetric as RuntimeMetric

    Coordinator->>StageExecutionStateMachine: recordStartWaitForEventLoop(nanos)
    StageExecutionStateMachine->>RuntimeStats: addMetricValue(TASK_START_WAIT_FOR_EVENT_LOOP, NANO, max(nanos,0), true)
    alt metric_exists
        RuntimeStats->>RuntimeMetric: addValue(value)
        RuntimeMetric->>RuntimeMetric: update sum/count/min/max
        RuntimeMetric->>RuntimeMetric: getBucketIndex(value)
        RuntimeMetric->>RuntimeMetric: histogramBuckets.incrementAndGet(bucketIndex)
    else metric_missing
        RuntimeStats->>RuntimeStats: create new RuntimeMetric(name, unit, trackPercentiles=true)
        RuntimeStats->>RuntimeMetric: addValue(value)
        RuntimeMetric->>RuntimeMetric: update sum/count/min/max
        RuntimeMetric->>RuntimeMetric: getBucketIndex(value)
        RuntimeMetric->>RuntimeMetric: histogramBuckets.incrementAndGet(bucketIndex)
    end

    Coordinator->>RuntimeStats: computeAllPercentiles()
    loop for_each_metric
        RuntimeStats->>RuntimeMetric: isPercentileTrackingEnabled()
        alt tracking_enabled
            RuntimeStats->>RuntimeMetric: getP90()
            RuntimeMetric->>RuntimeMetric: computePercentile(0.90)
            RuntimeMetric-->>RuntimeStats: p90
            RuntimeStats->>RuntimeMetric: getP95()
            RuntimeMetric->>RuntimeMetric: computePercentile(0.95)
            RuntimeMetric-->>RuntimeStats: p95
            RuntimeStats->>RuntimeMetric: getP99()
            RuntimeMetric->>RuntimeMetric: computePercentile(0.99)
            RuntimeMetric-->>RuntimeStats: p99
        else tracking_disabled
            RuntimeStats-->>RuntimeStats: skip_metric
        end
    end
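
The recording and lookup steps in the diagram above can be sketched roughly as follows. This is a minimal illustration of a fixed-width histogram with approximate percentiles, not the PR's actual RuntimeMetric code: the class name `HistogramSketch`, the clamping policy, and the choice to report a bucket's upper bound are all assumptions.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Illustrative fixed-width histogram; not the real RuntimeMetric implementation.
final class HistogramSketch {
    private final long bucketWidth;
    private final AtomicLongArray buckets;

    HistogramSketch(long bucketWidth, int numBuckets) {
        if (bucketWidth <= 0 || numBuckets <= 0) {
            throw new IllegalArgumentException("bucketWidth and numBuckets must be positive");
        }
        this.bucketWidth = bucketWidth;
        this.buckets = new AtomicLongArray(numBuckets);
    }

    // Non-positive values clamp to bucket 0; very large values land in the last (overflow) bucket.
    int bucketIndex(long value) {
        if (value <= 0) {
            return 0;
        }
        long index = value / bucketWidth;
        return (int) Math.min(index, buckets.length() - 1);
    }

    void add(long value) {
        buckets.incrementAndGet(bucketIndex(value));
    }

    // Approximate percentile: the upper bound of the first bucket whose
    // cumulative count reaches the requested rank. Returns 0 when empty.
    long percentile(double p) {
        long total = 0;
        for (int i = 0; i < buckets.length(); i++) {
            total += buckets.get(i);
        }
        if (total == 0) {
            return 0;
        }
        long rank = (long) Math.ceil(p * total);
        long seen = 0;
        for (int i = 0; i < buckets.length(); i++) {
            seen += buckets.get(i);
            if (seen >= rank) {
                return (i + 1) * bucketWidth;
            }
        }
        return buckets.length() * bucketWidth;
    }
}
```

The result is only accurate to within one bucket width, and anything that falls into the overflow bucket is reported at that bucket's upper bound, so the choice of `bucketWidth` and `numBuckets` determines the usable range.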

Sequence diagram for QueryMonitor resolving TableFinishInfo from JSON and Thrift union

sequenceDiagram
    participant QueryMonitor as QueryMonitor
    participant QueryInfo as QueryInfo
    participant QueryStats as QueryStats
    participant OperatorStats as OperatorStats
    participant OperatorInfoUnion as OperatorInfoUnion
    participant TableFinishInfo as TableFinishInfo

    QueryMonitor->>QueryInfo: getOutput()
    alt output_present
        QueryMonitor->>QueryInfo: getQueryStats()
        QueryInfo-->>QueryMonitor: QueryStats
        QueryMonitor->>QueryStats: getOperatorSummaries()
        QueryStats-->>QueryMonitor: List~OperatorStats~
        loop operator_summaries
            QueryMonitor->>OperatorStats: getInfo()
            alt info_is_TableFinishInfo
                OperatorStats-->>QueryMonitor: TableFinishInfo
                QueryMonitor-->>TableFinishInfo: use_for_QueryOutputMetadata
            else info_not_TableFinishInfo
                OperatorStats-->>QueryMonitor: OperatorInfo
                QueryMonitor->>OperatorStats: getInfoUnion()
                alt infoUnion_not_null
                    OperatorStats-->>QueryMonitor: OperatorInfoUnion
                    QueryMonitor->>OperatorInfoUnion: getTableFinishInfo()
                    alt union_has_TableFinishInfo
                        OperatorInfoUnion-->>QueryMonitor: TableFinishInfo
                        QueryMonitor-->>TableFinishInfo: use_for_QueryOutputMetadata
                    else union_missing_TableFinishInfo
                        OperatorInfoUnion-->>QueryMonitor: null
                    end
                else infoUnion_null
                    OperatorStats-->>QueryMonitor: null
                end
            end
        end
    else output_absent
        QueryInfo-->>QueryMonitor: Optional.empty
        QueryMonitor-->>QueryMonitor: no_QueryOutputMetadata
    end
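
The lookup order in the diagram above (check `info` first, then fall back to `infoUnion`) might look roughly like the sketch below. All types here are simplified, hypothetical stand-ins (`OpInfo`, `TableFinish`, `OpInfoUnion`, `OpStats`, `TableFinishResolver`); the real Presto classes carry many more fields and the actual QueryMonitor code may be structured differently.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Simplified stand-ins for OperatorInfo, TableFinishInfo, OperatorInfoUnion, OperatorStats.
interface OpInfo {}

final class TableFinish implements OpInfo {
    final String connectorOutputMetadata;

    TableFinish(String connectorOutputMetadata) {
        this.connectorOutputMetadata = connectorOutputMetadata;
    }
}

final class OpInfoUnion {
    private final TableFinish tableFinish;

    OpInfoUnion(TableFinish tableFinish) {
        this.tableFinish = tableFinish;
    }

    TableFinish getTableFinish() {
        return tableFinish;
    }
}

final class OpStats {
    private final OpInfo info;
    private final OpInfoUnion infoUnion;

    OpStats(OpInfo info, OpInfoUnion infoUnion) {
        this.info = info;
        this.infoUnion = infoUnion;
    }

    OpInfo getInfo() {
        return info;
    }

    OpInfoUnion getInfoUnion() {
        return infoUnion;
    }
}

final class TableFinishResolver {
    private TableFinishResolver() {}

    // Prefer the JSON-side info; otherwise fall back to the Thrift-side union.
    static TableFinish resolve(OpStats stats) {
        if (stats.getInfo() instanceof TableFinish) {
            return (TableFinish) stats.getInfo();
        }
        OpInfoUnion union = stats.getInfoUnion();
        return union == null ? null : union.getTableFinish();
    }

    // First operator summary that yields a TableFinish, skipping nulls.
    static Optional<TableFinish> findFirst(List<OpStats> summaries) {
        return summaries.stream()
                .map(TableFinishResolver::resolve)
                .filter(Objects::nonNull)
                .findFirst();
    }
}
```

If `info` and `infoUnion` are always kept in sync at construction time, as the PR description proposes, a fallback like this becomes a safety net rather than a requirement.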

Class diagram for updated RuntimeMetric and RuntimeStats percentile tracking

classDiagram
    class RuntimeMetric {
        - static int DEFAULT_NUM_BUCKETS
        - int numBuckets
        - long bucketWidth
        - String name
        - RuntimeUnit unit
        - AtomicLong sum
        - AtomicLong count
        - AtomicLong max
        - AtomicLong min
        - boolean percentileTrackingEnabled
        - AtomicLongArray histogramBuckets
        - Long p90
        - Long p95
        - Long p99
        + RuntimeMetric(name String, unit RuntimeUnit)
        + RuntimeMetric(name String, unit RuntimeUnit, trackPercentiles boolean)
        + RuntimeMetric(name String, unit RuntimeUnit, trackPercentiles boolean, bucketWidth long)
        + RuntimeMetric(name String, unit RuntimeUnit, trackPercentiles boolean, bucketWidth long, numBuckets int)
        + RuntimeMetric(name String, unit RuntimeUnit, sum long, count long, max long, min long)
        + RuntimeMetric(name String, unit RuntimeUnit, sum long, count long, max long, min long, numBuckets Integer, bucketWidth Long, p90 Long, p95 Long, p99 Long)
        + static RuntimeMetric copyOf(metric RuntimeMetric) RuntimeMetric
        - static long determineBucketWidth(unit RuntimeUnit) long
        - void set(sum long, count long, max long, min long)
        + void set(metric RuntimeMetric)
        + boolean isPercentileTrackingEnabled()
        + String getName()
        + void addValue(value long)
        - int getBucketIndex(value long) int
        + void mergeWith(metric RuntimeMetric)
        + long getSum()
        + long getCount()
        + long getMax()
        + long getMin()
        + RuntimeUnit getUnit()
        + Integer getNumBuckets()
        + Long getBucketWidth()
        + Long getP90()
        + Long getP95()
        + Long getP99()
        - long computePercentile(percentile double) long
        - static void checkState(condition boolean, message String)
        + String toString()
    }

    class RuntimeStats {
        - Map~String, RuntimeMetric~ metrics
        + RuntimeStats()
        + Map~String, RuntimeMetric~ getMetrics()
        + void addMetricValue(name String, unit RuntimeUnit, value long)
        + void addMetricValue(name String, unit RuntimeUnit, value long, trackPercentiles boolean)
        + void addMetricValueIgnoreZero(name String, unit RuntimeUnit, value long)
        + void addMetricValueIgnoreZero(name String, unit RuntimeUnit, value long, trackPercentiles boolean)
        + void addMetric(name String, metric RuntimeMetric)
        + void mergeWith(other RuntimeStats)
        + void recordWallTime(tag String, runnable Runnable)
        + void recordCpuTime(tag String, supplier Supplier~Object~)
        + void recordWallAndCpuTime(tag String, supplier Supplier~Object~)
        + void computeAllPercentiles()
    }

    class StageExecutionStateMachine {
        - RuntimeStats runtimeStats
        + void recordStartWaitForEventLoop(nanos long)
        + void recordTaskUpdateDeliveredTime(nanos long)
        + void recordDeliveredUpdates(updates int)
    }

    RuntimeStats --> RuntimeMetric : uses
    StageExecutionStateMachine --> RuntimeStats : uses
    StageExecutionStateMachine ..> RuntimeMetric : records_metrics_with_percentiles

File-Level Changes

Change: Introduce configurable histogram-based percentile tracking (p90/p95/p99) into RuntimeMetric with JSON compatibility and safe merging/copy semantics.
Details:
  • Add AtomicLongArray-backed histogram, bucket configuration (numBuckets, bucketWidth), and percentile-tracking flag to RuntimeMetric.
  • Provide multiple constructors to enable/disable percentile tracking and to configure bucket width and bucket count, including auto-configuration based on RuntimeUnit.
  • Implement addValue, mergeWith, set, copyOf, and toString logic that respects histogram configuration, enforces unit/bucket compatibility, and caches percentiles.
  • Add JSON-creator constructor and @JsonProperty accessors for numBuckets, bucketWidth, p90, p95, and p99 so snapshots can be serialized/deserialized without histograms.
  • Implement computePercentile and helper methods (getBucketIndex, determineBucketWidth, checkState) with clamping and overflow handling for approximate percentiles.
Files: presto-common/src/main/java/com/facebook/presto/common/RuntimeMetric.java

Change: Extend RuntimeStats API to support percentile-enabled metrics and to precompute all percentiles for serialization/consumers.
Details:
  • Add overloaded addMetricValue that takes a trackPercentiles flag and constructs RuntimeMetric accordingly.
  • Add computeAllPercentiles helper that forces computation and caching of p90/p95/p99 on all percentile-enabled metrics.
Files: presto-common/src/main/java/com/facebook/presto/common/RuntimeStats.java

Change: Enable percentile tracking for task start wait-for-event-loop metric at stage level.
Details:
  • Change StageExecutionStateMachine.recordStartWaitForEventLoop to call the new RuntimeStats.addMetricValue overload with trackPercentiles set to true for TASK_START_WAIT_FOR_EVENT_LOOP.
Files: presto-main-base/src/main/java/com/facebook/presto/execution/StageExecutionStateMachine.java

Change: Make QueryMonitor resilient to TableFinishInfo coming either from JSON-serialized OperatorInfo or Thrift OperatorInfoUnion, so written partition logging works for thrift serde.
Details:
  • Update selection of TableFinishInfo to first check OperatorStats.info (JSON) and then OperatorStats.infoUnion.getTableFinishInfo() when available, filtering out nulls.
  • Adjust getQueryIOMetadata output construction accordingly so it works for both serialization paths.
Files: presto-main-base/src/main/java/com/facebook/presto/event/QueryMonitor.java

Change: Add extensive unit tests covering percentile tracking behavior, configuration, JSON serialization, merging, and edge cases for RuntimeMetric.
Details:
  • Add tests to verify percentiles are disabled by default, enabled via constructor, and correctly computed for typical distributions and units (NANO, BYTE, NONE).
  • Add tests for JSON serialization/deserialization of percentile snapshots, copyOf behavior, and merging semantics including mismatched units, bucket widths, and bucket counts.
  • Add tests for various edge cases: negative values, overflow buckets, skewed distributions, small sample sizes, single values, all values in same bucket, zero values, and large counts/values.
  • Add tests that validate interaction between percentile tracking flags (both disabled, one enabled) and merge behavior, including cached percentile stability after copy.
Files: presto-common/src/test/java/com/facebook/presto/common/TestRuntimeMetric.java
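
The merge compatibility rule mentioned above (histograms can only be combined when bucket width and bucket count match) could be sketched as below. This is an assumption-laden illustration: the class `MergeableHistogram` is hypothetical, and throwing `IllegalStateException` on a mismatch is one plausible policy, not necessarily what RuntimeMetric.mergeWith does.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical histogram that merges bucket-by-bucket with a compatibility check.
final class MergeableHistogram {
    private final long bucketWidth;
    private final AtomicLongArray buckets;

    MergeableHistogram(long bucketWidth, int numBuckets) {
        if (bucketWidth <= 0 || numBuckets <= 0) {
            throw new IllegalArgumentException("bucketWidth and numBuckets must be positive");
        }
        this.bucketWidth = bucketWidth;
        this.buckets = new AtomicLongArray(numBuckets);
    }

    void add(long value) {
        // Clamp negatives to the first bucket and large values to the last one.
        long index = Math.max(0, value) / bucketWidth;
        buckets.incrementAndGet((int) Math.min(index, buckets.length() - 1));
    }

    long countInBucket(int bucket) {
        return buckets.get(bucket);
    }

    // Bucket counts are only comparable when both histograms share the same layout.
    void mergeWith(MergeableHistogram other) {
        if (bucketWidth != other.bucketWidth || buckets.length() != other.buckets.length()) {
            throw new IllegalStateException("cannot merge histograms with different bucket configuration");
        }
        for (int i = 0; i < buckets.length(); i++) {
            buckets.addAndGet(i, other.buckets.get(i));
        }
    }
}
```

Adding counts bucket by bucket keeps the merged percentiles as accurate as the inputs, which is why mismatched layouts are rejected here instead of being rebucketed.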

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • In RuntimeMetric.set(RuntimeMetric) you now enforce same bucketWidth/numBuckets even when percentile tracking is effectively unused, which may be unnecessarily strict and could break existing call sites that only care about the basic aggregates; consider gating those checks on both metrics having percentile tracking enabled.
  • The new RuntimeStats.addMetricValue(name, unit, value, trackPercentiles) ignores trackPercentiles after the first insertion because of computeIfAbsent; if a caller ever expects to turn percentile tracking on for an existing metric, that will silently fail, so it might be worth either documenting or enforcing that the flag is consistent per name.
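
The second point can be reproduced in isolation. The sketch below uses hypothetical classes (`FlaggedMetric`, `MetricRegistry`), not the real RuntimeStats; it shows how `computeIfAbsent` makes the first caller's flag win, and one possible way to enforce consistency instead of failing silently.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical metric that remembers whether percentile tracking was requested at creation.
final class FlaggedMetric {
    final boolean trackPercentiles;
    private final AtomicLong sum = new AtomicLong();

    FlaggedMetric(boolean trackPercentiles) {
        this.trackPercentiles = trackPercentiles;
    }

    void add(long value) {
        sum.addAndGet(value);
    }

    long getSum() {
        return sum.get();
    }
}

final class MetricRegistry {
    private final ConcurrentMap<String, FlaggedMetric> metrics = new ConcurrentHashMap<>();

    // Lenient: the flag is only consulted when the metric is first created.
    void addLenient(String name, long value, boolean trackPercentiles) {
        metrics.computeIfAbsent(name, key -> new FlaggedMetric(trackPercentiles)).add(value);
    }

    // Strict: reject a call whose flag disagrees with the existing metric.
    void addStrict(String name, long value, boolean trackPercentiles) {
        FlaggedMetric metric = metrics.computeIfAbsent(name, key -> new FlaggedMetric(trackPercentiles));
        if (metric.trackPercentiles != trackPercentiles) {
            throw new IllegalStateException("inconsistent trackPercentiles flag for metric " + name);
        }
        metric.add(value);
    }

    FlaggedMetric get(String name) {
        return metrics.get(name);
    }
}
```

In the lenient variant, a later call with `trackPercentiles = true` still records the value but never enables tracking; the strict variant surfaces the mismatch immediately, at the cost of throwing on a hot path.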

@shangm2 shangm2 changed the title fix: TableFinishInfo is part of union in thrift fix: info and infoUnion should always be synced. Nov 26, 2025
@shangm2 shangm2 changed the title fix: info and infoUnion should always be synced. fix: Info and infoUnion should always be synced. Nov 26, 2025
@shangm2 shangm2 changed the title fix: Info and infoUnion should always be synced. fix: Info and infoUnion should always be synced Nov 26, 2025