Skip to content

[CORE] Deduplicate fallback reason when merging Appendable tags#12049

Open
liujiayi771 wants to merge 2 commits intoapache:mainfrom
liujiayi771:fallback-reason-dedup
Open

[CORE] Deduplicate fallback reason when merging Appendable tags#12049
liujiayi771 wants to merge 2 commits intoapache:mainfrom
liujiayi771:fallback-reason-dedup

Conversation

@liujiayi771
Copy link
Copy Markdown
Contributor

@liujiayi771 liujiayi771 commented May 7, 2026

Summary

When AQE re-runs columnar rules across stages, GlutenFallbackReporter repeatedly calls
FallbackTags.add on the same shared logicalLink with the same reason. The previous merge
logic in FallbackTags.add unconditionally concatenated Appendable reasons, producing strings
like "r; r; r; ..." that grow with every AQE iteration. This is especially noticeable when users
manually fall back specific node types and the same fixed reason is added many times.

The relevant comment at GlutenFallbackReporter.scala:66-71 already explains the trigger:

With in next round stage in AQE, the physical plan would be a new instance that can not preserve
the tag, so we need to set the fallback reason to logical plan ... If a logical plan mapping to
several physical plan, we add all reason into that logical plan to make sure we do not lose any
fallback reason.

This PR makes FallbackTag.Appendable store fallback reasons in a HashSet[String] and merge
Appendable tags with set union. reason() joins the set only when the reason text is reported,
so duplicate reasons are removed precisely without relying on substring checks.

The reason order is not significant here; the important behavior is that the reported text contains
each distinct fallback reason once.

Drive-by: stabilize FallbackSuite

While testing, the existing test no fallback event emitted for vanilla Spark execution with gluten disabled (added in #12027) turned out to be flaky. Spark's LiveListenerBus dispatches events
asynchronously, so GlutenPlanFallbackEvents posted by the previous test in the suite can still be
delivered to the listener registered by the next test, contaminating its events buffer. The dedup
change subtly shifted CPU/scheduling timing and exposed it.

Fix is bundled in this PR since it's directly triggered by these changes:

  • Added GlutenSuiteUtils.withFallbackEventListener that drains the bus before registering the
    listener and removes it afterwards.
  • All three event-listener tests in FallbackSuite, which previously duplicated the same listener
    boilerplate, now go through this helper.

Test plan

  • Added FallbackTagSuite in gluten-core covering:
    • repeated add of the same reason does not grow the reason string
    • distinct Appendable reasons are preserved
    • reasons with substring overlap are still treated as distinct reasons
  • Verified the substring-overlap test fails with the previous contains-based implementation
    and passes after switching Appendable to HashSet-based dedup.
  • mvn -pl gluten-core -DwildcardSuites=org.apache.gluten.extension.columnar.FallbackTagSuite test
  • FallbackSuite continues to pass with the new helper; the previously flaky test is now stable
    across reruns.

@github-actions github-actions Bot added the CORE works for Gluten Core label May 7, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

Run Gluten Clickhouse CI on x86

@liujiayi771
Copy link
Copy Markdown
Contributor Author

liujiayi771 commented May 7, 2026

before:
image

after:
image

@github-actions github-actions Bot added the VELOX label May 7, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

Run Gluten Clickhouse CI on x86

@liujiayi771 liujiayi771 force-pushed the fallback-reason-dedup branch from 37f9100 to 5dde763 Compare May 7, 2026 07:46
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

Run Gluten Clickhouse CI on x86

@liujiayi771 liujiayi771 force-pushed the fallback-reason-dedup branch from 5dde763 to 320d769 Compare May 7, 2026 14:14
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

Run Gluten Clickhouse CI on x86

Comment on lines +96 to +102
if (l.reason == r.reason || l.reason.contains(r.reason)) {
l
} else if (r.reason.contains(l.reason)) {
r
} else {
FallbackTag.Appendable(s"${l.reason}; ${r.reason}")
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about storing a hash set in FallbackTag.Appendable? Could be faster and preciser.

When AQE re-runs columnar rules across stages, GlutenFallbackReporter
repeatedly calls FallbackTags.add on the same shared logicalLink with
the same reason. The previous merge logic unconditionally concatenated
the reasons, producing strings like "r; r; r; ..." that grew with every
AQE iteration — especially noticeable when users manually fall back
specific node types.

Skip the concat when one reason already contains the other.
…ckSuite

Prior tests in the suite run gluten-enabled queries that post
GlutenPlanFallbackEvent. Spark's LiveListenerBus dispatches events
asynchronously, so events queued by earlier tests can still be delivered
to a listener registered afterwards, contaminating the events buffer of
the next test.

Add a withFallbackEventListener helper in GlutenSuiteUtils that drains
the bus before registering the listener and removes it afterwards. The
three event-listener tests in FallbackSuite share the same boilerplate
and now go through this helper.
@liujiayi771 liujiayi771 force-pushed the fallback-reason-dedup branch from 320d769 to 492e252 Compare May 9, 2026 09:53
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

Run Gluten Clickhouse CI on x86

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CORE works for Gluten Core VELOX

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants