-
Notifications
You must be signed in to change notification settings - Fork 29.2k
[SPARK-46625][SQL] Push WithCTE into CTEInChildren when materialised by ResolveIdentifierClause #55706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[SPARK-46625][SQL] Push WithCTE into CTEInChildren when materialised by ResolveIdentifierClause #55706
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,7 +22,7 @@ import java.time.{Instant, LocalDate, LocalDateTime, ZoneId} | |
| import org.apache.spark.sql.catalyst.ExtendedAnalysisException | ||
| import org.apache.spark.sql.catalyst.expressions.Literal | ||
| import org.apache.spark.sql.catalyst.parser.ParseException | ||
| import org.apache.spark.sql.catalyst.plans.logical.Limit | ||
| import org.apache.spark.sql.catalyst.plans.logical.{CTEInChildren, Limit, WithCTE} | ||
| import org.apache.spark.sql.catalyst.trees.SQLQueryContext | ||
| import org.apache.spark.sql.catalyst.util.CharVarcharUtils | ||
| import org.apache.spark.sql.functions.{array, call_function, lit, map, map_from_arrays, map_from_entries, str_to_map, struct} | ||
|
|
@@ -2460,4 +2460,53 @@ class ParametersSuite extends SharedSparkSession { | |
| spark.sql("SELECT 1", Array.empty[Any]), | ||
| Row(1)) | ||
| } | ||
|
|
||
| test("WITH ... INSERT OVERWRITE TABLE IDENTIFIER(:p) SELECT ... FROM cte") { | ||
| withTable("t_cte_overwrite") { | ||
| sql("CREATE TABLE t_cte_overwrite (a INT) USING PARQUET") | ||
| sql("INSERT INTO t_cte_overwrite VALUES (10)") | ||
| spark.sql( | ||
| """WITH transformation AS (SELECT 1 AS a) | ||
| |INSERT OVERWRITE TABLE IDENTIFIER(:tname) | ||
| |SELECT * FROM transformation""".stripMargin, | ||
| Map("tname" -> "t_cte_overwrite")) | ||
| checkAnswer(spark.table("t_cte_overwrite"), Row(1)) | ||
| } | ||
| } | ||
|
|
||
| test("WITH ... INSERT INTO IDENTIFIER(:p) SELECT ... FROM cte") { | ||
| withTable("t_cte_into") { | ||
| sql("CREATE TABLE t_cte_into (a INT) USING PARQUET") | ||
| spark.sql( | ||
| """WITH transformation AS (SELECT 7 AS a) | ||
| |INSERT INTO IDENTIFIER(:tname) | ||
| |SELECT * FROM transformation""".stripMargin, | ||
| Map("tname" -> "t_cte_into")) | ||
| checkAnswer(spark.table("t_cte_into"), Row(7)) | ||
| } | ||
| } | ||
|
|
||
| test("Analyzed plan does not leave WithCTE wrapping a CTEInChildren " + | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Once the fix moves into the parser per the comment on |
||
| "when IDENTIFIER(:p) is the INSERT target") { | ||
| // After analysis, the WithCTE must be pushed into the InsertIntoStatement's query child | ||
| // (CTEInChildren placement), not left wrapping the command. The wrapped shape produces an | ||
| // orphan CTERelationRef whose CTERelationDef is in a now-detached WithCTE; any downstream | ||
| // pass that re-analyses the subtree below the command then trips InlineCTE.buildCTEMap with | ||
| // NoSuchElementException: key not found. | ||
| withTable("t_cte_shape") { | ||
| sql("CREATE TABLE t_cte_shape (a INT) USING PARQUET") | ||
| val df = spark.sql( | ||
| """WITH transformation AS (SELECT 1 AS a) | ||
| |INSERT INTO IDENTIFIER(:tname) | ||
| |SELECT * FROM transformation""".stripMargin, | ||
| Map("tname" -> "t_cte_shape")) | ||
| val analyzed = df.queryExecution.analyzed | ||
| analyzed match { | ||
| case WithCTE(_: CTEInChildren, _) => | ||
| fail(s"WithCTE must be pushed into the CTEInChildren's children, not left " + | ||
| s"wrapping the command. Analyzed plan:\n$analyzed") | ||
| case _ => // expected | ||
| } | ||
| } | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patches the symptom rather than the root cause:
withIdentClauseliftsPlanWithUnresolvedIdentifierabove the whole write/CTAS command, and we then have to undo the placement after materialization. Please change the parser to push the placeholder into the identifier slot of the produced plan (e.g.InsertIntoStatement.table,CreateTableAsSelect.name) instead of wrapping the entire command, and add a parallel handler in this rule — e.g.:Then
CTESubstitutionsees the actualCTEInChildrenfrom the start and placesWithCTEcorrectly — no post-hoc collapse needed, and the invariant is preserved by construction. The same shape applies to allwithIdentClausecall sites whose builder produces aCTEInChildren(INSERT, CTAS, RTAS,CACHE TABLE AS—AstBuilder.scala:911-1002, 5640, 5724, 6502). Downstream matchers likecase InsertIntoStatement(LogicalRelationWithTable(_), ...)aren't affected because they run after this rule, by which pointtableis back to a normal resolved relation.