refactor(udf): rewrite udf structure #253
Open
shinyano wants to merge 17 commits into IGinX-THU:main from
SolomonAnn reviewed Feb 26, 2024: test/src/test/java/cn/edu/tsinghua/iginx/integration/other/UDFPathIT.java (outdated, resolved)
SolomonAnn reviewed Feb 26, 2024
SolomonAnn reviewed Feb 26, 2024
SolomonAnn reviewed Feb 26, 2024: core/src/main/java/cn/edu/tsinghua/iginx/engine/shared/function/manager/FunctionManager.java (outdated, resolved)
SolomonAnn approved these changes Feb 28, 2024
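UDF: Python side
To let UDFs accept a parameter list and to reduce the work needed to write a UDF, several UDF parent classes were designed. A user UDF must inherit from one of them and implement all of its abstract methods. The UDF parent classes include:
The UDF parent class exposes a transform interface that IGinX calls as the entry point, and expects the user to implement an eval function containing the UDF's data-processing logic.
The build_header function can be overridden to specify the column names and data types of the returned data (the current implementation is still fairly rough).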
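A minimal Python sketch of the parent-class pattern described above, assuming a simplified data model (input as a list of rows, header as a pair of name and type lists). The class name `UDF`, the `transform`/`build_header`/`eval` signatures, the header format, and the example subclasses `Scale` and `Total` are hypothetical illustrations, not the actual IGinX API.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple

Header = Tuple[List[str], List[str]]  # (column names, column types)


class UDF(ABC):
    """Hypothetical parent class: the engine calls transform(), users implement eval()."""

    def transform(self, columns: List[str], rows: List[List[Any]], *args, **kwargs):
        # Entry point (assumed signature): build the header, then delegate
        # the data processing to the user's eval().
        names, types = self.build_header(columns)
        result = self.eval(rows, *args, **kwargs)
        return [names, types] + result

    def build_header(self, columns: List[str]) -> Header:
        # Default header: "<udf name>(<input column>)", typed DOUBLE.
        # Subclasses may override this to choose their own names and types.
        fn = type(self).__name__.lower()
        return [f"{fn}({c})" for c in columns], ["DOUBLE"] * len(columns)

    @abstractmethod
    def eval(self, rows: List[List[Any]], *args, **kwargs) -> List[List[Any]]:
        """User-implemented data processing; returns output rows."""


class Scale(UDF):
    # Row-wise example: multiply every value by a constant factor.
    def eval(self, rows, factor=1):
        return [[v * factor for v in row] for row in rows]


class Total(UDF):
    # Aggregate-style example that overrides the default header.
    def build_header(self, columns):
        return ["total"], ["DOUBLE"]

    def eval(self, rows):
        return [[sum(sum(row) for row in rows)]]
```

Calling `Scale().transform(["a", "b"], [[1, 2], [3, 4]], factor=10)` returns `[["scale(a)", "scale(b)"], ["DOUBLE", "DOUBLE"], [10, 20], [30, 40]]`. Keeping `transform` fixed in the parent and leaving only `eval` abstract means the engine always calls one entry point while the user writes only the processing step.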
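UDF parsing
To allow column arguments and constant arguments to be mixed in one call, the function arguments in IGinX's parsing layer were merged from three parts into two. They were previously split into column data, constant arguments, and kv (key-value) arguments. Column data and constant arguments are now merged into a single positional-argument part, while kv arguments remain separate.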
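The two-part argument layout can be sketched as follows. `ColumnRef`, `FunctionParams`, and `resolve` are hypothetical names. Representing input data as a dict from path to values, and the example call `f(s1, 2, s2, mode="sum")`, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical marker: a positional argument that names a column (path)."""
    path: str


@dataclass
class FunctionParams:
    # Positional part: column references and constants mixed, in call order.
    args: List[Any] = field(default_factory=list)
    # Key-value part, kept separate as before.
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def resolve(self, table: Dict[str, List[Any]]) -> List[Any]:
        # Substitute each column reference with that column's values;
        # constants are passed through untouched.
        return [table[a.path] if isinstance(a, ColumnRef) else a for a in self.args]


# Hypothetical call f(s1, 2, s2, mode="sum") after parsing:
params = FunctionParams([ColumnRef("s1"), 2, ColumnRef("s2")], {"mode": "sum"})
print(params.resolve({"s1": [1, 2], "s2": [3, 4]}))  # [[1, 2], 2, [3, 4]]
```

Presumably one benefit of a single positional list is that the relative order of columns and constants survives parsing. With separate column and constant lists, the position of `2` between `s1` and `s2` would have to be tracked some other way.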
When a UDF is registered, its type (UDTF/UDAF/UDSF) is detected automatically, so users no longer need to specify it.
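The sketch below assumes that type detection works by checking which parent class the user's class inherits from. The description above does not say how detection is actually implemented. The stand-in classes `UDTF`, `UDAF`, `UDSF`, the example `MyAvg`, and the function `detect_udf_type` are hypothetical.

```python
# Hypothetical stand-ins for the three UDF parent classes.
class UDTF: ...
class UDAF: ...
class UDSF: ...


UDF_BASES = {"UDTF": UDTF, "UDAF": UDAF, "UDSF": UDSF}


def detect_udf_type(cls: type) -> str:
    """Return the single UDF type that cls inherits from (assumed detection rule)."""
    matches = [name for name, base in UDF_BASES.items() if issubclass(cls, base)]
    if len(matches) != 1:
        raise TypeError(
            f"{cls.__name__} must inherit from exactly one of "
            f"{', '.join(UDF_BASES)}; found {matches}"
        )
    return matches[0]


class MyAvg(UDAF):
    # User UDF: its type is not declared anywhere at registration time.
    def eval(self, values):
        return sum(values) / len(values)
```

Here `detect_udf_type(MyAvg)` returns `"UDAF"`, and a class that inherits none of the three parents raises `TypeError`.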
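SQL statement changes
Registering a UDF in SQL previously required declaring it as a UDTF, UDAF, or UDSF. These keywords are now unified as UDF, and the type is checked when the UDF is initialized.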
Example:
Original registration statement:
REGISTER UDAF PYTHON TASK "UDAFinDFTest" IN "udf_funcs\\python_scripts\\udaf_df_test.py" AS "udaf_df_test";
is changed to:
REGISTER UDF PYTHON TASK "UDAFinDFTest" IN "udf_funcs\\python_scripts\\udaf_df_test.py" AS "udaf_df_test";
and the system determines the UDF type automatically.
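The type keyword is kept in the statement (now always UDF) rather than removed entirely, so that UDF registration remains distinguishable from transform tasks.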
To-do