Add support for schema-scoped table functions #18022
Open
Which issue does this PR close?
Closes #18021.
Rationale for this change
Currently, table functions (UDTFs) can only be registered globally via SessionContext::register_udtf(). This means all table functions share a global namespace, which can lead to naming conflicts and makes it difficult to organize functions by schema.
This PR adds support for registering table functions at the schema level, allowing users to organize table functions by schema while maintaining full backward compatibility with existing global registration.
What changes are included in this PR?
- Extended the SchemaProvider trait with 5 new methods for table function management:
  - udtf_names() - list table functions in the schema
  - udtf() - retrieve a specific table function
  - register_udtf() - register a table function to the schema
  - deregister_udtf() - remove a table function from the schema
  - udtf_exist() - check if a table function exists
- Implemented table function storage in MemorySchemaProvider using DashMap for thread-safe concurrent access
- Updated ContextProvider lookup logic to support qualified table function names (e.g., schema.function_name or catalog.schema.function_name)
- Modified the SQL parser to correctly handle qualified table function names in SQL queries
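The schema-level methods and the qualified-name lookup can be illustrated with a small, dependency-free model. This is only a sketch under stated assumptions: ToySchema, ToySession, TableFn, and resolve_udtf are hypothetical names, not DataFusion types; a std RwLock<HashMap> stands in for DashMap; names are split naively on '.' with no quoting support; and the catalog part of a three-part name is ignored because the model has a single catalog. It follows the rule in this PR that unqualified names resolve from the global registry and qualified names from the named schema only.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a table function; it only carries a label.
#[derive(Debug, PartialEq)]
pub struct TableFn {
    pub label: &'static str,
}

/// Toy schema-level registry mirroring the five methods listed above.
/// Assumption: RwLock<HashMap> replaces DashMap to stay std-only.
#[derive(Default)]
pub struct ToySchema {
    udtfs: RwLock<HashMap<String, Arc<TableFn>>>,
}

impl ToySchema {
    /// Lists registered table function names in sorted order.
    pub fn udtf_names(&self) -> Vec<String> {
        let mut names: Vec<String> = self.udtfs.read().unwrap().keys().cloned().collect();
        names.sort();
        names
    }

    /// Retrieves a table function by name.
    pub fn udtf(&self, name: &str) -> Option<Arc<TableFn>> {
        self.udtfs.read().unwrap().get(name).cloned()
    }

    /// Registers a table function; returns the one it replaced, if any.
    pub fn register_udtf(&self, name: &str, f: Arc<TableFn>) -> Option<Arc<TableFn>> {
        self.udtfs.write().unwrap().insert(name.to_string(), f)
    }

    /// Removes a table function; returns it if it was registered.
    pub fn deregister_udtf(&self, name: &str) -> Option<Arc<TableFn>> {
        self.udtfs.write().unwrap().remove(name)
    }

    /// Checks whether a table function with this name is registered.
    pub fn udtf_exist(&self, name: &str) -> bool {
        self.udtfs.read().unwrap().contains_key(name)
    }
}

/// Toy session holding the global registry plus named schemas.
#[derive(Default)]
pub struct ToySession {
    pub global: HashMap<String, Arc<TableFn>>,
    pub schemas: HashMap<String, ToySchema>,
}

impl ToySession {
    /// Unqualified names hit the global registry; qualified names hit the
    /// named schema only, with no fallback to the global registry.
    pub fn resolve_udtf(&self, name: &str) -> Option<Arc<TableFn>> {
        let parts: Vec<&str> = name.split('.').collect();
        match parts.as_slice() {
            [func] => self.global.get(*func).cloned(),
            // Assumption: the catalog part is ignored in this single-catalog model.
            [schema, func] | [_, schema, func] => self.schemas.get(*schema)?.udtf(*func),
            _ => None,
        }
    }
}
```

In this model, a function registered both globally and in a schema named public under the same name stays distinct: resolve_udtf("my_func") returns the global one and resolve_udtf("public.my_func") returns the schema-scoped one.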
Backward Compatibility:
- Unqualified calls (e.g., SELECT * FROM my_func()) continue to resolve from the global registry
- Qualified calls (e.g., SELECT * FROM public.my_func()) look up functions in the specified schema only
Are these changes tested?
Yes, comprehensive tests have been added:
- MemorySchemaProvider tests covering registration, deregistration, retrieval, and error cases
- All existing tests continue to pass (369/369).
Are there any user-facing changes?
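New functionality (non-breaking):
- Table functions can be registered via register_udtf() on a SchemaProvider instance
- Table functions can be called by qualified name: SELECT * FROM myschema.my_function(args)
Backward compatibility:
- Global registration via ctx.register_udtf() continues to work unchanged
Example usage: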
Open Questions
1. SchemaProvider API Design
The current implementation adds 5 new methods directly to the SchemaProvider trait: udtf_names(), udtf(), register_udtf(), deregister_udtf(), udtf_exist()
Concern: As we add support for other UDF types (scalar UDFs, aggregate UDFs, window UDFs) to schemas, this approach will significantly expand the SchemaProvider interface, potentially making it noisy and harder to maintain.
Alternative approach: Introduce a UdfContainer or FunctionRegistry trait that manages all types of user-defined functions:
Benefits:
- A smaller SchemaProvider interface
Tradeoffs:
Question: Should we refactor to use a FunctionRegistry pattern before merging, or is the current direct-method approach acceptable?
2. Information Schema Exposure
Should schema-scoped table functions be exposed via information_schema.routines or a new information_schema.table_functions table?
Arguments for:
Arguments against:
Current state: Not implemented in this PR.
Question: Is information_schema support desired for the initial implementation, or can it be added in a follow-up PR?
Feedback on these design decisions would be appreciated before finalizing this PR.
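To make Open Question 1 easier to discuss, here is a minimal sketch of what a FunctionRegistry-style trait could look like. Every name and signature below (FunctionKind, Function, FunctionRegistry, MemoryFunctionRegistry, SchemaWithFunctions) is hypothetical and chosen for illustration only; it is not code from this PR and is not meant to match any existing trait of the same name in DataFusion. The idea shown is a single generic surface keyed by function kind, so a schema exposes one accessor instead of five methods per kind.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical function kinds a schema might hold.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FunctionKind {
    Scalar,
    Aggregate,
    Window,
    Table,
}

/// Hypothetical placeholder for any user-defined function object.
#[derive(Debug)]
pub struct Function {
    pub name: String,
    pub kind: FunctionKind,
}

/// Hypothetical registry trait: four methods cover every function kind.
pub trait FunctionRegistry {
    fn names(&self, kind: FunctionKind) -> Vec<String>;
    fn get(&self, kind: FunctionKind, name: &str) -> Option<Arc<Function>>;
    /// Returns the function previously registered under the same kind and name.
    fn register(&mut self, f: Arc<Function>) -> Option<Arc<Function>>;
    fn deregister(&mut self, kind: FunctionKind, name: &str) -> Option<Arc<Function>>;
}

/// In-memory implementation keyed by (kind, name), so a scalar and a table
/// function may share a name without colliding.
#[derive(Default)]
pub struct MemoryFunctionRegistry {
    functions: HashMap<(FunctionKind, String), Arc<Function>>,
}

impl FunctionRegistry for MemoryFunctionRegistry {
    fn names(&self, kind: FunctionKind) -> Vec<String> {
        let mut names: Vec<String> = self
            .functions
            .keys()
            .filter(|(k, _)| *k == kind)
            .map(|(_, n)| n.clone())
            .collect();
        names.sort();
        names
    }

    fn get(&self, kind: FunctionKind, name: &str) -> Option<Arc<Function>> {
        self.functions.get(&(kind, name.to_string())).cloned()
    }

    fn register(&mut self, f: Arc<Function>) -> Option<Arc<Function>> {
        self.functions.insert((f.kind, f.name.clone()), f)
    }

    fn deregister(&mut self, kind: FunctionKind, name: &str) -> Option<Arc<Function>> {
        self.functions.remove(&(kind, name.to_string()))
    }
}

/// Under this design a schema would expose a single accessor.
pub trait SchemaWithFunctions {
    fn functions(&self) -> &dyn FunctionRegistry;
}
```

One design consequence worth noting in this sketch: a single Function placeholder type erases the differences between function kinds, so a real version would need either an enum of concrete function types or per-kind typed accessors on top of the registry.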