
refactor(udf): rewrite udf structure #253

Open
shinyano wants to merge 17 commits into IGinX-THU:main from shinyano:refactor/udf-refactor


@shinyano shinyano commented Feb 3, 2024

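UDF: Python side

To let UDFs accept their arguments as a parameter list, and to simplify the steps a user takes to write a UDF, I designed and wrote several UDF parent classes. A user UDF must inherit from one of them and implement all of its abstract functions. The parent classes are:

  • UDTF: the parameter-list form of a row-to-row UDF
  • UDTFinDF: the dataframe form of a row-to-row UDF; all columns the user passes in are combined into a single dataframe
  • UDAF: the parameter-list form of a set-to-row UDF
  • UDAFinDF: the dataframe form of a set-to-row UDF
  • UDSF: a set-to-set UDF; only the dataframe mode is supported
  • UDF: the parent class of all the abstract classes above

The UDF parent classes provide a transform interface that IGinX uses as the call entry point, and expect the user to implement an eval function as the data-processing part of the UDF.
The build_header function can be overridden to specify the column names and data types of the returned data (the current implementation is fairly rough).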
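
The split between transform, eval and build_header can be illustrated with a minimal sketch. This is not the code in this PR: the transform signature, the row layout (a list of rows plus a list of column names), the "DOUBLE" type label and the AddOne class are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class UDF(ABC):
    """Hypothetical common parent: the engine calls transform(), users write eval()."""

    @abstractmethod
    def eval(self, *args):
        """User-defined data processing."""

    @abstractmethod
    def transform(self, rows, names):
        """Entry point called by the engine (signature assumed for this sketch)."""

    def build_header(self, names):
        # Overridable: returns (column names, column types) of the result.
        return ["result"], ["DOUBLE"]


class UDTF(UDF):
    """Row-to-row UDF in parameter-list form: eval() receives one row at a time."""

    def transform(self, rows, names):
        col_names, col_types = self.build_header(names)
        # Unpack each input row into positional arguments for eval().
        body = [[self.eval(*row)] for row in rows]
        return [col_names, col_types] + body


class AddOne(UDTF):
    """Hypothetical user UDF: only eval() and, optionally, build_header() are written."""

    def eval(self, x):
        return x + 1

    def build_header(self, names):
        return [f"add_one({names[0]})"], ["DOUBLE"]


result = AddOne().transform([[1.0], [2.5]], ["a.b"])
```

In this sketch the user never touches transform; the parent class owns the calling convention, which is what lets the parameter-list and dataframe forms share one entry point.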

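UDF logic parsing

To allow column parameters and constant parameters to be mixed, the three parts of a function's parameters in the IGinX parsing layer are merged into two. Previously they were split into column data, constant parameters, and kv parameters; column data and constant parameters are now merged into a single positional-parameter part.

The UDF type (UDTF/UDAF/UDSF) is detected automatically when a UDF is registered, so the user no longer has to specify it manually.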
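
A rough sketch of what a merged positional-parameter list could look like on the Python side, assuming a call such as scale(a, 2, offset=1). The Col marker, bind_positional and call_udf are hypothetical names for this illustration and do not come from the IGinX code base.

```python
from dataclasses import dataclass


@dataclass
class Col:
    """Hypothetical marker for a positional parameter that refers to a column."""
    name: str


def bind_positional(params, row):
    # Column references are replaced by the row's value; constants pass through.
    return [row[p.name] if isinstance(p, Col) else p for p in params]


def call_udf(eval_fn, params, kwargs, rows):
    # Positional part (columns and constants mixed) plus the kv part.
    return [eval_fn(*bind_positional(params, row), **kwargs) for row in rows]


def scale(x, factor, offset=0):
    return x * factor + offset


# scale(a, 2, offset=1): one column, one constant, one kv parameter.
values = call_udf(scale, [Col("a"), 2], {"offset": 1}, [{"a": 1}, {"a": 3}])
```

Because columns and constants share one ordered list, their relative order in the SQL call is preserved when they reach eval.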
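
One way such detection could work is to inspect which parent class the registered class inherits from. The sketch below is an assumption about the approach, not the detection code in this PR; the stub classes only mirror the parent-class names listed earlier, and detect_udf_type is a hypothetical helper.

```python
# Stub parent classes, named after the ones in this PR, for illustration only.
class UDF: ...
class UDTF(UDF): ...
class UDTFinDF(UDF): ...
class UDAF(UDF): ...
class UDAFinDF(UDF): ...
class UDSF(UDF): ...


_KINDS = (
    ("UDTF", (UDTF, UDTFinDF)),
    ("UDAF", (UDAF, UDAFinDF)),
    ("UDSF", (UDSF,)),
)


def detect_udf_type(cls):
    """Map a user class to UDTF/UDAF/UDSF based on its parent class."""
    for kind, parents in _KINDS:
        if issubclass(cls, parents):
            return kind
    raise TypeError(f"{cls.__name__} does not inherit from a known UDF parent class")


class UDAFinDFTest(UDAFinDF):
    """Hypothetical user class, named after the task in the SQL example."""


kind = detect_udf_type(UDAFinDFTest)
```

A class that inherits from none of the parents raises TypeError here, so a malformed script would fail at registration rather than at query time.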

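SQL statement changes

Registering a UDF in SQL previously required stating whether it was a UDTF, UDAF, or UDSF. The keyword is now unified as UDF, and the type is detected when the UDF is initialized.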

Example:
The original registration statement:
REGISTER *UDAF* PYTHON TASK "UDAFinDFTest" IN "udf_funcs\\python_scripts\\udaf_df_test.py" AS "udaf_df_test";
becomes:
REGISTER *UDF* PYTHON TASK "UDAFinDFTest" IN "udf_funcs\\python_scripts\\udaf_df_test.py" AS "udaf_df_test";
and the system determines the UDF type automatically.

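This part of the statement is kept, rather than removed outright, so that UDF registration stays distinguishable from transform tasks.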

To-do

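  • Calls whose arguments are all constants are not supported yet; this needs support for constant expressions
  • Could this be a problem? A constant parameter of a UDF cannot be a keyword, which includes the names of other UDFs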

@shinyano shinyano closed this Feb 19, 2024
@shinyano shinyano reopened this Feb 19, 2024
@shinyano shinyano closed this Feb 19, 2024
@shinyano shinyano reopened this Feb 19, 2024
@shinyano shinyano changed the title refactor(udf): rewite udf structure refactor(udf): rewrite udf structure Feb 19, 2024
@shinyano shinyano closed this Feb 23, 2024
@shinyano shinyano reopened this Feb 23, 2024
@shinyano shinyano closed this Feb 23, 2024
@shinyano shinyano reopened this Feb 23, 2024
@shinyano shinyano closed this Feb 23, 2024
@shinyano shinyano reopened this Feb 23, 2024
@shinyano shinyano closed this Feb 23, 2024
@shinyano shinyano reopened this Feb 23, 2024
@shinyano shinyano closed this Feb 28, 2024
@shinyano shinyano reopened this Feb 28, 2024

@zhuyuqing zhuyuqing left a comment

Let's take a look at the added execution time.


3 participants