UDF Server is a type of UDF, which can run independently on any server and be accessed by databases through network connections. The security and isolation of UDF Server are relatively high, because problems with UDF Server will not affect the normal execution of the database.
At present, Databend has not yet implemented UDF Server, and only provides lambda UDF support. Therefore, we intend to implement Databend UDF Server, allowing users to create, delete and call UDF. After it is implemented, our users will be able to use UDF conveniently, further improving the flexibility and scalability of Databend, and making data analysis and processing more efficient and convenient.
Guide-level explanation
You can define your own functions and call these functions in Databend.
Define your functions and start the UDF server (using Python as an example):
from databend.udf import udf, UdfServer

# Define a scalar function
@udf(input_types=['INT', 'INT'], result_type='INT')
def plus_int(x, y):
    return x + y

# Start a UDF server
if __name__ == '__main__':
    server = UdfServer(location="0.0.0.0:8888")
    server.add_function(plus_int)
    server.serve()
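To make the role of the decorator more concrete, here is a minimal sketch of what a decorator like `udf` might do internally. It is a hypothetical stand-in written with the standard library only, not the SDK's actual implementation: it simply records the declared SQL types on the function and checks the argument count.

```python
import functools


def udf(input_types, result_type):
    """Hypothetical stand-in for a decorator like the SDK's `udf` (an assumption)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Reject calls whose arity does not match the declared signature.
            if len(args) != len(input_types):
                raise TypeError(
                    f"{func.__name__} expects {len(input_types)} arguments, got {len(args)}"
                )
            return func(*args)

        # Attach the declared SQL types so a server could advertise them later.
        wrapper.input_types = list(input_types)
        wrapper.result_type = result_type
        return wrapper
    return decorator


@udf(input_types=['INT', 'INT'], result_type='INT')
def plus_int(x, y):
    return x + y
```

With this sketch, `plus_int(1, 2)` still behaves like the plain function, while `plus_int.input_types` and `plus_int.result_type` carry the signature a server would need to report.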
We extend the syntax and functionality of CREATE FUNCTION to support creating a UDF Server, and extend ALTER FUNCTION and DELETE FUNCTION to support modifying and deleting a UDF Server.
The syntax of the CREATE FUNCTION command for creating a UDF Server is as follows:
CREATE FUNCTION [ IF NOT EXISTS ] function_name AS ( [argument_type] ) ->
return_type address 'udf_server_address'
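As an illustration of the syntax above, the following sketch renders a statement for the `plus_int` function from the Python example. The helper name `create_function_sql` is hypothetical, and separating multiple argument types with commas is an assumption, since the grammar only shows `[argument_type]`.

```python
def create_function_sql(name, argument_types, return_type, address, if_not_exists=False):
    """Render a CREATE FUNCTION statement following the proposed syntax (hypothetical helper)."""
    guard = "IF NOT EXISTS " if if_not_exists else ""
    # Assumption: multiple argument types are comma-separated.
    args = ", ".join(argument_types)
    # Double any single quotes so the address stays a valid SQL string literal.
    quoted = address.replace("'", "''")
    return f"CREATE FUNCTION {guard}{name} AS ({args}) -> {return_type} address '{quoted}'"


statement = create_function_sql(
    "plus_int", ["INT", "INT"], "INT", "0.0.0.0:8888", if_not_exists=True
)
print(statement)
```

The address reuses the `location` value from the server example; a real deployment would use whatever address the database can reach.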
Like lambda UDF, the information of a UDF Server also needs to be stored in meta. The related struct used to store UDF information is defined as follows:
Calling a UDF Server function is very similar to a normal function call. In the bind stage, we will resolve the call to the UDF Server into a ScalarExpr::UDFServerCall, whose structure is defined as follows:
In the evaluate stage, we pack the parameters required by the function into a RecordBatch, send the RecordBatch to the UDF server through the Arrow Flight DoExchange RPC, and wait for the UDF server to return the calculation result.
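The evaluate stage can be pictured with a small standard-library simulation. This is only a sketch: the dictionary returned by `pack_batch` stands in for an Arrow RecordBatch, and `exchange` stands in for the DoExchange round trip by calling a local function instead of a remote server. Both names are hypothetical.

```python
def pack_batch(names, columns):
    """Pack equal-length argument columns into one batch (stand-in for a RecordBatch)."""
    lengths = {len(col) for col in columns}
    if len(names) != len(columns) or len(lengths) > 1:
        raise ValueError("argument columns must match names and share one length")
    num_rows = lengths.pop() if lengths else 0
    return {"num_rows": num_rows, "columns": dict(zip(names, columns))}


def exchange(batch, func):
    """Stand-in for DoExchange: apply func to each row and return one result column."""
    rows = zip(*batch["columns"].values())
    return [func(*row) for row in rows]


# Two argument columns with three rows each, evaluated in a single round trip.
batch = pack_batch(["x", "y"], [[1, 2, 3], [10, 20, 30]])
result = exchange(batch, lambda x, y: x + y)
```

Sending whole columns per call rather than one row at a time is what keeps the per-row network overhead low in this design.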
Language-specific UDF Server (using Python as an example)
The Python UDF Server runs as an Arrow Flight server. It receives requests from Databend, parses the parameters, calls the corresponding function to perform the calculation, and returns the result. The Python UDF Server should implement two RPCs: GetFlightInfo, which returns the input and output types of the function, and DoExchange, which calls the user-defined function to perform the calculation.
We will provide users with a Python SDK to help them write a UDF Server.
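The division of work between the two RPCs can be sketched with an in-memory class. This is a hypothetical illustration that uses no Arrow Flight code: `get_flight_info` and `do_exchange` are plain methods named after the RPCs, and `InMemoryUdfServer` is not part of any SDK.

```python
class InMemoryUdfServer:
    """Hypothetical in-memory model of a UDF server exposing two operations."""

    def __init__(self):
        self._functions = {}

    def add_function(self, name, func, input_types, result_type):
        # Register the callable together with its declared SQL signature.
        self._functions[name] = (func, list(input_types), result_type)

    def get_flight_info(self, name):
        """Mirror of GetFlightInfo: report the input and output types of a function."""
        _, input_types, result_type = self._functions[name]
        return {"input_types": input_types, "result_type": result_type}

    def do_exchange(self, name, columns):
        """Mirror of DoExchange: evaluate the function over a batch of argument columns."""
        func, input_types, _ = self._functions[name]
        if len(columns) != len(input_types):
            raise ValueError(f"{name} expects {len(input_types)} argument columns")
        return [func(*row) for row in zip(*columns)]


server = InMemoryUdfServer()
server.add_function("plus_int", lambda x, y: x + y, ["INT", "INT"], "INT")
info = server.get_flight_info("plus_int")
output = server.do_exchange("plus_int", [[1, 2], [3, 4]])
```

Keeping the type lookup separate from evaluation lets the database check a function's signature once, then stream batches for evaluation.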
Unresolved questions
UDF evaluation is currently synchronous. Should it be asynchronous?
Summary
Implement Databend UDF Server.
Motivation
UDF Server is a type of UDF, which can run independently on any server and be accessed by databases through network connections. The security and isolation of UDF Server are relatively high, because problems with UDF Server will not affect the normal execution of the database.
At present, Databend has not yet implemented UDF Server, and only provides lambda UDF support. Therefore, we intend to implement Databend UDF Server, allowing users to create, delete and call UDF. After it is implemented, our users will be able to use UDF conveniently, further improving the flexibility and scalability of Databend, and making data analysis and processing more efficient and convenient.
Guide-level explanation
You can define your own functions and call these functions in Databend.
Define your functions and start the UDF server (using Python as an example)
Add the UDF Server into Databend
Call these functions in Databend
Reference-level explanation
The management of UDF Server
We extend the syntax and functionality of CREATE FUNCTION to support creating a UDF Server, and extend ALTER FUNCTION and DELETE FUNCTION to support modifying and deleting a UDF Server.
The syntax of the CREATE FUNCTION command for creating a UDF Server is as follows:
Like lambda UDF, the information of a UDF Server also needs to be stored in meta. The related struct used to store UDF information is defined as follows:
The resolution and execution of UDF Server
Calling a UDF Server function is very similar to a normal function call. In the bind stage, we will resolve the call to the UDF Server into a ScalarExpr::UDFServerCall, whose structure is defined as follows:
There are some other related exprs, and we can add a new enum variant to them as well:
In the evaluate stage, we pack the parameters required by the function into a RecordBatch, send the RecordBatch to the UDF server through the Arrow Flight DoExchange RPC, and wait for the UDF server to return the calculation result.
Language-specific UDF Server (using Python as an example)
The Python UDF Server runs as an Arrow Flight server. It receives requests from Databend, parses the parameters, calls the corresponding function to perform the calculation, and returns the result. The Python UDF Server should implement two RPCs: GetFlightInfo, which returns the input and output types of the function, and DoExchange, which calls the user-defined function to perform the calculation.
We will provide users with a Python SDK to help them write a UDF Server.
Unresolved questions
Future possibilities
Support user-defined table functions, like:
Reference
RisingWave UDF