[SPARK-51439][SQL] Support SQL UDF with DEFAULT argument #50408

Open: wants to merge 16 commits into master

@wengh wengh commented Mar 26, 2025

Continuing @allisonwang-db's work on #50373 and #49471

What changes were proposed in this pull request?

This PR adds support for DEFAULT arguments in SQL UDFs. Examples:

CREATE FUNCTION foo1d1(a INT DEFAULT 10) RETURNS INT RETURN a;
SELECT foo1d1();   -- 10
SELECT foo1d1(20); -- 20

CREATE FUNCTION foo1d6(a INT, b INT DEFAULT 7) RETURNS TABLE(a INT, b INT) RETURN SELECT a, b;
SELECT * FROM foo1d6(5);    -- 5, 7
SELECT * FROM foo1d6(5, 2); -- 5, 2

See sql-udf.sql for more valid and invalid examples.
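
The call behavior in the examples above can be modeled with a short standalone sketch. This is only an illustration, not Spark's implementation: `Param`, `DefaultArgs`, and `bind` are hypothetical names, arguments are simplified to `Int`, and the sketch assumes only trailing arguments may be omitted.

```scala
// Hypothetical model of a SQL UDF parameter; not one of Spark's classes.
final case class Param(name: String, default: Option[Int] = None)

object DefaultArgs {
  /** Bind positional arguments to parameters, filling omitted trailing ones from DEFAULT. */
  def bind(params: Seq[Param], args: Seq[Int]): Either[String, Seq[(String, Int)]] = {
    if (args.length > params.length) {
      Left(s"too many arguments: expected at most ${params.length}, got ${args.length}")
    } else {
      // Parameters covered by the call, and the trailing ones that were omitted.
      val (supplied, missing) = params.splitAt(args.length)
      val bound = supplied.map(_.name).zip(args)
      missing.find(_.default.isEmpty) match {
        case Some(p) => Left(s"missing argument for parameter '${p.name}', which has no DEFAULT")
        case None => Right(bound ++ missing.map(p => p.name -> p.default.get))
      }
    }
  }
}
```

With `Seq(Param("a"), Param("b", Some(7)))` as the parameter list, `bind(params, Seq(5))` yields `a = 5, b = 7` and `bind(params, Seq(5, 2))` yields `a = 5, b = 2`, mirroring the `foo1d6` calls.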

Why are the changes needed?

To support default arguments in SQL UDFs.

Does this PR introduce any user-facing change?

Yes. Now SQL UDFs support DEFAULT arguments.

A side effect of the grammar change is that some invalid function parameter definitions are no longer rejected by the grammar. The grammar now accepts them and the parser logic rejects them instead, which changes the error that is reported.

Examples:

-- multiple COMMENT or multiple NOT NULL
CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world') RETURNS INT RETURN a;

-- before:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'COMMENT'. SQLSTATE: 42601
== SQL (line 2, position 1) ==
CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world') RETURNS INT RETURN a;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-- after:
[CREATE_TABLE_COLUMN_DESCRIPTOR_DUPLICATE] CREATE TABLE column a specifies descriptor "COMMENT" more than once, which is invalid. SQLSTATE: 42710
== SQL (line 1, position 1) ==
CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world')...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- GENERATED ALWAYS AS
CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a;

-- before:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'GENERATED'. SQLSTATE: 42601
== SQL (line 2, position 1) ==
CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-- after:
[INVALID_SQL_SYNTAX.CREATE_FUNC_WITH_GENERATED_COLUMNS_AS_PARAMETERS] Invalid SQL syntax: CREATE FUNCTION with generated columns as parameters is not allowed. SQLSTATE: 42000
== SQL (line 2, position 1) ==
CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This doesn't change the behavior of existing valid SQL.
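
The pattern behind this change (a permissive grammar followed by validation in the visitor) can be sketched in isolation. The types and messages below are hypothetical and much simpler than Spark's parser classes and error framework; the sketch only shows why a post-parse check can report a more specific error than a grammar-level syntax error.

```scala
// Hypothetical descriptor model; illustrative names, not Spark's parser classes.
sealed trait Descriptor
final case class Comment(text: String) extends Descriptor
case object NotNull extends Descriptor
final case class Default(expr: String) extends Descriptor
final case class GeneratedAlwaysAs(expr: String) extends Descriptor

object ParamValidation {
  private def label(d: Descriptor): String = d match {
    case _: Comment => "COMMENT"
    case NotNull => "NOT NULL"
    case _: Default => "DEFAULT"
    case _: GeneratedAlwaysAs => "GENERATED ALWAYS AS"
  }

  /** The grammar accepts any sequence of descriptors; this step rejects the invalid ones. */
  def validate(param: String, descriptors: Seq[Descriptor]): Either[String, Unit] = {
    val labels = descriptors.map(label)
    // Labels left over after removing one copy of each distinct label are duplicates.
    labels.diff(labels.distinct).headOption match {
      case Some(dup) =>
        Left(s"parameter $param specifies descriptor '$dup' more than once")
      case None if descriptors.exists(_.isInstanceOf[GeneratedAlwaysAs]) =>
        Left("CREATE FUNCTION with generated columns as parameters is not allowed")
      case None => Right(())
    }
  }
}
```

Because the check runs after parsing, it knows which parameter and which descriptor are at fault, whereas a grammar-level rejection can only point at the unexpected token.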

How was this patch tested?

End-to-end regression tests in sql-udf.sql and simple tests in SQLFunctionSuite.

Was this patch authored or co-authored using generative AI tooling?

No

@wengh wengh changed the title [WIP][SPARK-51439] Support SQL UDF with DEFAULT argument [WIP][SPARK-51439][SQL] Support SQL UDF with DEFAULT argument Mar 26, 2025
@wengh wengh marked this pull request as ready for review March 27, 2025 15:05
@wengh wengh changed the title [WIP][SPARK-51439][SQL] Support SQL UDF with DEFAULT argument [SPARK-51439][SQL] Support SQL UDF with DEFAULT argument Mar 27, 2025

wengh commented Mar 27, 2025

@wengh wengh force-pushed the sql-udf-default branch from ecfb9ce to aee7338 Compare March 27, 2025 16:01
Contributor

@allisonwang-db allisonwang-db left a comment

Looks great!

@wengh wengh marked this pull request as draft March 28, 2025 00:31
@wengh wengh force-pushed the sql-udf-default branch from 8f51e83 to 8c73a14 Compare March 28, 2025 16:48
@wengh wengh marked this pull request as ready for review March 28, 2025 17:13
Contributor Author

@wengh wengh left a comment

explain refactor

@@ -45,6 +45,70 @@ class DataTypeAstBuilder extends SqlBaseParserBaseVisitor[AnyRef] {
withOrigin(ctx)(StructType(visitColTypeList(ctx.colTypeList)))
}

override def visitSingleRoutineParamList(ctx: SingleRoutineParamListContext): StructType = {
Contributor

Where does this method fail if an unsupported feature such as a generated column is specified?

Contributor Author

@wengh wengh Mar 31, 2025

It's checked in SparkSqlParser.visitCreateUserDefinedFunction. I should add a comment in visitStructField to explain.

Contributor Author

done

@wengh wengh requested review from cloud-fan and zhengruifeng April 1, 2025 00:22
* Create a [[StructField]] from a column definition which allows options like COMMENT and
* DEFAULT.
*
* Don't handle generation expression since this function is currently only used for creating
Contributor

Why don't we reject it here, within this function?

Contributor Author

@wengh wengh Apr 1, 2025

We reject it in SparkSqlParser so that the error reports the right ParserRuleContext.

* SQL functions which don't support generation expressions. The rejection logic is in
* [[SqlBaseParserVisitor#visitCreateUserDefinedFunction()]] implementation.
*/
private def visitStructField(
Contributor

This method looks very similar to visitColDefinition. Shall we invoke visitColDefinition here, fail if ColumnDefinition#generationExpression or ColumnDefinition#identityColumnSpec is defined, and call ColumnDefinition#toV1Column to get the StructField?

Contributor Author

done
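
For readers following the thread, the shape of the suggested refactor (reuse the general column definition, reject what routine parameters cannot accept, then convert) might look roughly like the sketch below. `ColDef`, `Field`, and `RoutineParam.toField` are hypothetical stand-ins; only the field names `generationExpression` and `identityColumnSpec` come from the review comment, and the actual change in the PR may differ.

```scala
// Hypothetical stand-ins for ColumnDefinition and StructField.
final case class ColDef(
    name: String,
    dataType: String,
    defaultValue: Option[String] = None,
    generationExpression: Option[String] = None,
    identityColumnSpec: Option[String] = None)

final case class Field(name: String, dataType: String, defaultValue: Option[String])

object RoutineParam {
  /** Convert a parsed column definition into a parameter field, rejecting unsupported options. */
  def toField(col: ColDef): Either[String, Field] =
    if (col.generationExpression.isDefined || col.identityColumnSpec.isDefined) {
      Left(s"parameter ${col.name}: generated and identity columns are not allowed")
    } else {
      Right(Field(col.name, col.dataType, col.defaultValue))
    }
}
```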

@@ -50,6 +50,10 @@ abstract class AbstractParser extends DataTypeParserInterface with Logging {
astBuilder.visitSingleTableSchema(parser.singleTableSchema())
}

override def parseRoutineParam(sqlText: String): StructType = parse(sqlText) { parser =>
Contributor

So the main difference between this and parseTableSchema is that it supports default expressions and fails for unsupported fields.

Contributor Author

yeah

@wengh wengh requested a review from cloud-fan April 2, 2025 23:03

4 participants