ci: classify Reyden nightly E2E failures by root cause #512
Merged
eric-wang-1990 merged 1 commit on Jun 5, 2026
The raw pass-rate on the SEA/Reyden nightly is dominated by expected failures (unsupported DDL/types, no Thrift endpoint, no CloudFetch), which hides the genuine driver bugs. Layer a coarse root-cause category on top of the existing per-failure signature so the dashboard separates the two.

parse-trx-to-json.py:
- Refine signature_for() with Reyden-specific buckets: Thrift-on-SEA endpoint (ENDPOINT_NOT_FOUND + "Thrift server error"), CloudFetch download failure, and unsupported DDL/statement/type. Reorder so PARSE_SYNTAX_ERROR is matched before the broad assertion bucket (whose "expected" token also matches "Expected identifier" syntax errors).
- Add category_for() with three buckets and emit a per-failure "category" plus a "by_category" rollup. Classification follows the failing step encoded in the message: a rejected CREATE surfaces as an "Unsupported ..." gap, whereas a value/cast mismatch means setup succeeded and the round-trip returned wrong data, i.e. a real driver bug.

update-e2e-dashboard.py: propagate by_category into the runs.json summary row.

index.html: render a "By root-cause category" rollup with a color-coded legend, and group the expanded failure detail by category, then signature. Degrades gracefully for older runs that predate by_category.

Co-authored-by: Isaac
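Why the reorder matters can be shown with a first-match-wins rule table. This is a minimal sketch, not the actual signature_for() from parse-trx-to-json.py: the signature labels, the substring predicates, the "other" fallback, the optional rules parameter, and the sample error message are all hypothetical, chosen only to reproduce the "Expected identifier" collision the commit describes.

```python
from typing import Callable, Sequence

# (signature label, predicate over the raw failure message).
# Labels and match strings are hypothetical stand-ins for the real rules.
Rule = tuple[str, Callable[[str], bool]]

RULES: list[Rule] = [
    # Thrift call against a SEA-only warehouse: both tokens must be present.
    ("thrift_on_sea_endpoint",
     lambda m: "ENDPOINT_NOT_FOUND" in m and "Thrift server error" in m),
    ("cloudfetch_download", lambda m: "cloudfetch" in m.lower()),
    ("unsupported_feature", lambda m: "unsupported" in m.lower()),
    # Must come before "assertion": "Expected identifier" contains "expected".
    ("parse_syntax_error", lambda m: "PARSE_SYNTAX_ERROR" in m),
    # Broad bucket: any message mentioning "expected".
    ("assertion", lambda m: "expected" in m.lower()),
]


def signature_for(message: str, rules: Sequence[Rule] = RULES) -> str:
    """Return the label of the first rule whose predicate matches (first match wins)."""
    for label, matches in rules:
        if matches(message):
            return label
    return "other"


# Hypothetical syntax-error message that also contains the word "expected".
msg = "[PARSE_SYNTAX_ERROR] Syntax error at or near 'x'. Expected identifier"
print(signature_for(msg))  # parse_syntax_error

# Moving the syntax-error rule after the broad bucket misfiles the same message.
swapped = [r for r in RULES if r[0] != "parse_syntax_error"] + [
    r for r in RULES if r[0] == "parse_syntax_error"
]
print(signature_for(msg, swapped))  # assertion
```

Keeping the rules as an ordered list, rather than a dict or a set of independent checks, makes precedence explicit: narrow buckets go first and the broad assertion bucket acts as a late catch-all.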
What
The raw pass-rate on the SEA/Reyden nightly (~61% on the latest run) is dominated by expected failures — unsupported DDL/types, no Thrift endpoint on a SEA-only warehouse, no CloudFetch — which buries the genuine driver bugs. This adds a coarse root-cause category on top of the existing per-failure signature so the dashboard separates expected noise from the real backlog.
On the latest run (127 failures) the split is 51 expected Reyden gaps / 76 real issues.
How it classifies
Classification follows the failing step, which the message already encodes:
- A CREATE TABLE/SCHEMA step that Reyden can't run fails at that step with an "Unsupported …" message → Reyden capability gap (expected).
- Setup succeeded, but the INSERT → SELECT → DELETE round-trip returned wrong data → Real issue (e.g. a SEA-path serialization difference).
Changes
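The step-based rule can be sketched as a mapping from signature to category, followed by a count that produces the by_category rollup. This is a sketch under assumptions: the category strings, the signature labels, and the helper add_categories() are hypothetical. The PR names only the "Reyden capability gap" and "Real issue" buckets, so "other" is a placeholder for the unnamed third bucket.

```python
from collections import Counter

# Placeholder category labels (assumed). The PR names a "Reyden capability gap"
# bucket and a "Real issue" bucket; the third bucket's name is not given, so
# "other" stands in for it here.
REYDEN_GAP = "reyden_gap"
REAL_ISSUE = "real_issue"
OTHER = "other"

# Hypothetical signature labels. A step Reyden cannot run (unsupported DDL/type,
# no Thrift endpoint, no CloudFetch) is an expected capability gap.
GAP_SIGNATURES = frozenset(
    {"unsupported_feature", "thrift_on_sea_endpoint", "cloudfetch_download"}
)
# A value/cast mismatch: setup succeeded but the round-trip returned wrong data.
MISMATCH_SIGNATURES = frozenset({"value_mismatch"})


def category_for(signature: str) -> str:
    """Map a per-failure signature to a coarse root-cause category."""
    if signature in GAP_SIGNATURES:
        return REYDEN_GAP
    if signature in MISMATCH_SIGNATURES:
        return REAL_ISSUE
    return OTHER


def add_categories(failures: list[dict]) -> dict[str, int]:
    """Tag each failure dict with a "category" and return the by_category counts."""
    for failure in failures:
        failure["category"] = category_for(failure["signature"])
    return dict(Counter(f["category"] for f in failures))


failures = [
    {"signature": "unsupported_feature"},
    {"signature": "value_mismatch"},
    {"signature": "value_mismatch"},
    {"signature": "parse_syntax_error"},
]
print(add_categories(failures))  # {'reyden_gap': 1, 'real_issue': 2, 'other': 1}
```

Layering the category on top of the signature, instead of replacing it, keeps the finer per-failure grouping intact while giving the dashboard a two-way split between expected noise and the real backlog.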
- parse-trx-to-json.py: refined signature_for() (Thrift-on-SEA, CloudFetch, and unsupported-feature buckets; PARSE_SYNTAX_ERROR ordered before the broad assertion bucket); added category_for(), a per-failure category, and a by_category rollup.
- update-e2e-dashboard.py: propagate by_category into the runs.json summary row.
- index.html: "By root-cause category" rollup with a color-coded legend, and failure detail grouped by category → signature. Degrades gracefully for older runs without by_category.
Validated against the latest run's data and syntax-checked (py_compile + node --check).
This pull request and its description were written by Isaac.
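The propagation and graceful-degradation behavior can be sketched in Python. The real grouping lives in index.html (JavaScript) and the real summary row in update-e2e-dashboard.py, so this is only an illustration: the helper names summary_row() and group_failures(), the passed/failed/test/signature field names, and the "uncategorized" placeholder are all assumptions.

```python
from collections import defaultdict


def summary_row(run: dict) -> dict:
    """Build a runs.json summary row (fields other than by_category are hypothetical)."""
    row = {"passed": run.get("passed", 0), "failed": run.get("failed", 0)}
    # Runs parsed before this change have no by_category; leave the key out
    # instead of inventing zeros, so a reader can tell "no data" from "none".
    if "by_category" in run:
        row["by_category"] = run["by_category"]
    return row


def group_failures(failures: list[dict]) -> dict[str, dict[str, list[str]]]:
    """Group failing test names by category, then by signature."""
    grouped = defaultdict(lambda: defaultdict(list))
    for f in failures:
        # Older runs carry no per-failure category; use a placeholder bucket.
        category = f.get("category", "uncategorized")
        grouped[category][f.get("signature", "unknown")].append(f["test"])
    # Convert nested defaultdicts to plain dicts for serialization/display.
    return {cat: dict(sigs) for cat, sigs in grouped.items()}


old_row = summary_row({"passed": 10, "failed": 2})
new_row = summary_row(
    {"passed": 9, "failed": 3, "by_category": {"reyden_gap": 2, "real_issue": 1}}
)
print("by_category" in old_row, "by_category" in new_row)  # False True

groups = group_failures([
    {"test": "A", "category": "real_issue", "signature": "value_mismatch"},
    {"test": "B", "signature": "unsupported_feature"},  # older run: no category
])
print(groups)
```

Omitting the key for older runs, rather than writing an empty rollup, lets the renderer fall back to the signature-only view instead of showing misleading zero counts.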