
ci: classify Reyden nightly E2E failures by root cause #512

Merged: eric-wang-1990 merged 1 commit into ci/reyden-rest-nightly from reyden-classification-update on Jun 5, 2026


@eric-wang-1990


What

The raw pass rate on the SEA/Reyden nightly (~61% on the latest run) is dominated by expected failures (unsupported DDL/types, no Thrift endpoint on a SEA-only warehouse, no CloudFetch), which buries the genuine driver bugs. This PR adds a coarse root-cause category on top of the existing per-failure signature so that the dashboard separates expected noise from the real backlog.

On the latest run (127 failures), the split is 51 expected Reyden capability gaps and 76 real issues.

How it classifies

Classification follows the failing step, which the message already encodes:

  • A test with a CREATE TABLE/SCHEMA step Reyden can't run fails at that step with an Unsupported … message → Reyden capability gap (expected).
  • A value/cast mismatch means setup succeeded and the INSERT→SELECT→DELETE round-trip returned wrong data → Real issue (e.g. a SEA-path serialization difference).
  • Missing warehouse / read-only / auth / timeout / transport → Environment / infra.
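The three rules above can be sketched as a keyword classifier over the failure message. This is a minimal illustration only: the function body, the exact category strings, and the keyword lists are assumptions, not the actual logic of `category_for()` in parse-trx-to-json.py. Treating any unmatched failure as a real issue is also an assumption of this sketch.

```python
"""Hypothetical sketch of a step-based root-cause classifier.

Category labels and keyword lists are illustrative assumptions, not the
actual rules in parse-trx-to-json.py.
"""

REYDEN_GAP = "Reyden capability gap"
REAL_ISSUE = "Real issue"
ENV_INFRA = "Environment / infra"

# A setup step Reyden cannot run fails with an "Unsupported ..." message,
# and a SEA-only warehouse has no Thrift endpoint and no CloudFetch.
_GAP_MARKERS = ("unsupported", "thrift server error", "endpoint_not_found", "cloudfetch")

# Failures that say nothing about driver correctness (assumed wording).
_ENV_MARKERS = (
    "warehouse not found", "read-only", "read only", "unauthorized",
    "authentication", "timed out", "timeout", "transport",
)


def category_for(message: str) -> str:
    """Map a failure message to a coarse root-cause category."""
    text = message.lower()
    if any(marker in text for marker in _GAP_MARKERS):
        return REYDEN_GAP
    if any(marker in text for marker in _ENV_MARKERS):
        return ENV_INFRA
    # Anything else, e.g. a value/cast mismatch after setup succeeded, is
    # treated as a genuine driver issue (assumption of this sketch).
    return REAL_ISSUE


if __name__ == "__main__":
    for msg in (
        "Unsupported table type in CREATE TABLE",
        "Assert.Equal() Failure: Expected: 1.5, Actual: 1",
        "Request timed out after 30s",
    ):
        print(f"{category_for(msg):<22} <- {msg}")
```

Checking the gap markers first means an "Unsupported ..." failure is never mislabeled as infra just because its message also mentions, say, a transport class.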

Changes

  • parse-trx-to-json.py: refined signature_for() (new Thrift-on-SEA, CloudFetch, and unsupported-feature buckets; PARSE_SYNTAX_ERROR now matched before the broad assertion bucket) and added category_for(), a per-failure category, and a by_category rollup.
  • update-e2e-dashboard.py: propagate by_category into the runs.json summary row.
  • index.html: "By root-cause category" rollup with a color-coded legend, and failure detail grouped by category → signature. Degrades gracefully for older runs without by_category.
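A minimal Python sketch of the by_category rollup and the category → signature grouping follows. The real grouping lives in index.html; this version only illustrates the data shape. The failure-record fields (`name`, `signature`, `category`) and the `Uncategorized` fallback for older runs are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical fallback label for failures parsed before categories existed.
UNCATEGORIZED = "Uncategorized"


def build_rollups(failures):
    """Return (by_category counts, {category: {signature: [test names]}})."""
    by_category = Counter(f.get("category", UNCATEGORIZED) for f in failures)

    # Two-level grouping: category first, then the existing per-failure signature.
    grouped = defaultdict(lambda: defaultdict(list))
    for f in failures:
        category = f.get("category", UNCATEGORIZED)
        grouped[category][f["signature"]].append(f["name"])

    # Convert to plain dicts so the result serializes cleanly with json.dumps.
    return dict(by_category), {c: dict(sigs) for c, sigs in grouped.items()}
```

Falling back to a placeholder category, rather than raising on a missing key, is one way to get the "degrades gracefully" behavior for runs that predate by_category.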

Validated against the latest run's data and syntax-checked (py_compile + node --check).

This pull request and its description were written by Isaac.

The raw pass-rate on the SEA/Reyden nightly is dominated by expected failures
(unsupported DDL/types, no Thrift endpoint, no CloudFetch), which hides the
genuine driver bugs. Layer a coarse root-cause category on top of the existing
per-failure signature so the dashboard separates the two.

parse-trx-to-json.py:
- Refine signature_for() with Reyden-specific buckets: Thrift-on-SEA endpoint
  (ENDPOINT_NOT_FOUND + "Thrift server error"), CloudFetch download failure,
  and unsupported DDL/statement/type. Reorder so PARSE_SYNTAX_ERROR is matched
  before the broad assertion bucket (whose "expected" token also matches
  "Expected identifier" syntax errors).
- Add category_for() with three buckets and emit per-failure "category" plus a
  "by_category" rollup. Classification follows the failing step encoded in the
  message: a rejected CREATE surfaces as an "Unsupported ..." gap, whereas a
  value/cast mismatch means setup succeeded and the round-trip returned wrong
  data -- a real driver bug.
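Why the ordering matters can be shown with a first-match-wins rule list. In this sketch (bucket labels and predicates are assumptions, not the actual signature_for() code), moving the PARSE_SYNTAX_ERROR rule below the assertion rule would send "Expected identifier" syntax errors into the assertion bucket.

```python
# Ordered (bucket, predicate) rules: the first match wins, so specific
# buckets must come before broad ones. Bucket labels are illustrative.
_RULES = [
    ("Thrift endpoint on SEA",
     lambda m: "ENDPOINT_NOT_FOUND" in m and "Thrift server error" in m),
    ("CloudFetch download failure", lambda m: "cloudfetch" in m.lower()),
    ("Unsupported DDL/statement/type", lambda m: "unsupported" in m.lower()),
    # Must precede the assertion bucket: a message such as
    # "[PARSE_SYNTAX_ERROR] ... Expected identifier" also contains "expected".
    ("Parse syntax error", lambda m: "PARSE_SYNTAX_ERROR" in m),
    ("Assertion mismatch", lambda m: "expected" in m.lower()),
]


def signature_for(message: str) -> str:
    """Return the first bucket whose predicate matches the failure message."""
    for bucket, matches in _RULES:
        if matches(message):
            return bucket
    return "Other"
```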

update-e2e-dashboard.py: propagate by_category into the runs.json summary row.
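One way to propagate the rollup is to copy it into the summary row only when the parser emitted it, so that rows from older runs simply lack the key. This is a hypothetical sketch: the row fields (`run_id`, `total`, `failed`) and the function name are assumptions, not the actual runs.json schema or update-e2e-dashboard.py code.

```python
import json


def summary_row(run_id, parsed):
    """Build one runs.json summary row (hypothetical field names)."""
    row = {
        "run_id": run_id,
        "total": parsed.get("total", 0),
        "failed": parsed.get("failed", 0),
    }
    # Copy the rollup through only when present; older parsed runs have no
    # by_category, and the dashboard is expected to handle its absence.
    if "by_category" in parsed:
        row["by_category"] = dict(parsed["by_category"])
    return row


if __name__ == "__main__":
    rows = [
        summary_row("run-old", {"total": 5, "failed": 1}),
        summary_row("run-new", {"total": 10, "failed": 3,
                                "by_category": {"Real issue": 2, "Reyden capability gap": 1}}),
    ]
    print(json.dumps(rows, indent=2))
```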

index.html: render a "By root-cause category" rollup with a color-coded legend,
and group the expanded failure detail by category then signature. Degrades
gracefully for older runs that predate by_category.

Co-authored-by: Isaac
eric-wang-1990 merged commit a039bae into ci/reyden-rest-nightly on Jun 5, 2026
14 of 15 checks passed
eric-wang-1990 deleted the reyden-classification-update branch on June 5, 2026 06:54