@bubulalabu commented Oct 12, 2025

Which issue does this PR close?

Closes #18021.

Rationale for this change

Currently, table functions (UDTFs) can only be registered globally via SessionContext::register_udtf(). This means all table functions share a global namespace, which can lead to naming conflicts and makes it difficult to organize functions by schema.

This PR adds support for registering table functions at the schema level, allowing users to organize table functions by schema while maintaining full backward compatibility with existing global registration.

What changes are included in this PR?

  1. Extended the SchemaProvider trait with five new methods for table function management:

    • udtf_names() - list table functions in the schema
    • udtf() - retrieve a specific table function
    • register_udtf() - register a table function to the schema
    • deregister_udtf() - remove a table function from the schema
    • udtf_exist() - check if a table function exists
  2. Implemented table function storage in MemorySchemaProvider, using DashMap for thread-safe concurrent access

  3. Updated the ContextProvider lookup logic to support qualified table function names (e.g., schema.function_name or catalog.schema.function_name)

  4. Modified the SQL parser to correctly handle qualified table function names in SQL queries
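The trait extension and in-memory storage described above can be sketched with standard-library types only. This is a hypothetical model, not the PR's code: `SchemaUdtfs`, `MemorySchema`, `LegacySchema`, and the string-based `Result` are illustrative names, `RwLock<HashMap>` stands in for `DashMap`, and the default bodies (empty listing, registration rejected with an error) are an assumption about what "maintain existing behavior" means in practice.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins for DataFusion's `TableFunction` and `Result`.
#[derive(Debug)]
pub struct TableFunction {
    pub name: String,
}

pub type Result<T> = std::result::Result<T, String>;

/// The five UDTF methods, modelled as a standalone trait. The defaults
/// (assumed, not taken from the PR) expose no functions and reject
/// registration, so providers that ignore the feature keep compiling.
pub trait SchemaUdtfs {
    fn udtf_names(&self) -> Vec<String> {
        Vec::new()
    }
    fn udtf(&self, _name: &str) -> Result<Option<Arc<TableFunction>>> {
        Ok(None)
    }
    fn register_udtf(
        &self,
        _name: String,
        _function: Arc<TableFunction>,
    ) -> Result<Option<Arc<TableFunction>>> {
        Err("this schema does not support table functions".to_string())
    }
    fn deregister_udtf(&self, _name: &str) -> Result<Option<Arc<TableFunction>>> {
        Err("this schema does not support table functions".to_string())
    }
    fn udtf_exist(&self, name: &str) -> bool {
        self.udtf_names().iter().any(|n| n == name)
    }
}

/// A provider written before the feature existed: it relies on the defaults.
pub struct LegacySchema;
impl SchemaUdtfs for LegacySchema {}

/// In-memory storage; `RwLock<HashMap>` stands in for the PR's `DashMap`.
#[derive(Default)]
pub struct MemorySchema {
    udtfs: RwLock<HashMap<String, Arc<TableFunction>>>,
}

impl SchemaUdtfs for MemorySchema {
    fn udtf_names(&self) -> Vec<String> {
        let mut names: Vec<String> = self.udtfs.read().unwrap().keys().cloned().collect();
        names.sort();
        names
    }
    fn udtf(&self, name: &str) -> Result<Option<Arc<TableFunction>>> {
        Ok(self.udtfs.read().unwrap().get(name).cloned())
    }
    fn register_udtf(
        &self,
        name: String,
        function: Arc<TableFunction>,
    ) -> Result<Option<Arc<TableFunction>>> {
        // Returns the function previously registered under `name`, if any.
        Ok(self.udtfs.write().unwrap().insert(name, function))
    }
    fn deregister_udtf(&self, name: &str) -> Result<Option<Arc<TableFunction>>> {
        Ok(self.udtfs.write().unwrap().remove(name))
    }
    fn udtf_exist(&self, name: &str) -> bool {
        self.udtfs.read().unwrap().contains_key(name)
    }
}

fn main() {
    let schema = MemorySchema::default();
    let f = Arc::new(TableFunction { name: "my_func".to_string() });
    schema.register_udtf("my_func".to_string(), f).unwrap();
    let found = schema.udtf("my_func").unwrap().unwrap();
    println!("found {} in {:?}", found.name, schema.udtf_names());
    println!("legacy register: {:?}", LegacySchema.register_udtf("x".to_string(), found.clone()));
}
```

Overriding `udtf_exist` in the in-memory provider avoids building the full name list for a single membership check.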

Backward Compatibility:

  • Unqualified names (e.g., SELECT * FROM my_func()) continue to resolve from the global registry
  • Qualified names (e.g., SELECT * FROM public.my_func()) look up functions in the specified schema only
  • All new trait methods have default implementations that maintain existing behavior
  • No breaking changes to existing APIs
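The resolution rules listed above can be modelled as a small lookup function. This is a sketch under stated assumptions: `Resolver`, `scoped_lookup`, and `example_resolver` are hypothetical, function bodies are represented by strings, and names are split naively on `.` rather than parsed as SQL identifiers. By analogy with table references, it assumes a two-part name resolves against the session's default catalog (`datafusion` in a default DataFusion session).

```rust
use std::collections::HashMap;

/// Hypothetical model of UDTF name resolution. Values are plain strings that
/// stand in for registered table functions.
pub struct Resolver {
    default_catalog: String,
    /// Session-wide registry (what `SessionContext::register_udtf` fills).
    global: HashMap<String, String>,
    /// Schema-scoped functions keyed by (catalog, schema, function).
    scoped: HashMap<(String, String, String), String>,
}

impl Resolver {
    pub fn resolve(&self, name: &str) -> Result<&str, String> {
        // Naive split on '.'; the real planner works on parsed identifiers.
        let parts: Vec<&str> = name.split('.').collect();
        match parts.as_slice() {
            // Unqualified: global registry only (unchanged behavior).
            [func] => self
                .global
                .get(*func)
                .map(String::as_str)
                .ok_or_else(|| format!("table function '{func}' not found")),
            // schema.func: look in the default catalog.
            [schema, func] => self.scoped_lookup(&self.default_catalog, schema, func),
            // catalog.schema.func: fully qualified.
            [catalog, schema, func] => self.scoped_lookup(catalog, schema, func),
            _ => Err(format!("invalid table function name '{name}'")),
        }
    }

    fn scoped_lookup(&self, catalog: &str, schema: &str, func: &str) -> Result<&str, String> {
        // Qualified names check the named schema only; no global fallback.
        let key = (catalog.to_string(), schema.to_string(), func.to_string());
        self.scoped
            .get(&key)
            .map(String::as_str)
            .ok_or_else(|| format!("table function '{catalog}.{schema}.{func}' not found"))
    }
}

pub fn example_resolver() -> Resolver {
    let mut global = HashMap::new();
    global.insert("my_func".to_string(), "global my_func".to_string());
    global.insert("only_global".to_string(), "global only_global".to_string());
    let mut scoped = HashMap::new();
    for (c, s) in [("datafusion", "public"), ("datacatalog", "myschema")] {
        scoped.insert(
            (c.to_string(), s.to_string(), "my_func".to_string()),
            format!("{c}.{s}.my_func"),
        );
    }
    Resolver { default_catalog: "datafusion".to_string(), global, scoped }
}

fn main() {
    let r = example_resolver();
    let names = [
        "my_func",
        "public.my_func",
        "myschema.my_func",
        "datacatalog.myschema.my_func",
        "public.only_global",
    ];
    for name in names {
        println!("{:<30} -> {:?}", name, r.resolve(name));
    }
}
```

In this model `public.only_global` fails even though `only_global` exists globally, because qualified names never fall back to the global registry; `myschema.my_func` fails because `myschema` lives in a non-default catalog.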

Are these changes tested?

Yes. The following tests have been added:

  • 5 unit tests for MemorySchemaProvider covering registration, deregistration, retrieval, and error cases
  • 3 Rust integration tests verifying:
    • Schema-qualified UDTF resolution
    • Global registry backward compatibility
    • Proper error handling for non-existent functions
  • 5 SQL logic tests validating:
    • Unqualified names resolve from global registry
    • Qualified names check schema only
    • Proper error messages for both cases

All existing tests continue to pass (369/369).

Are there any user-facing changes?

New functionality (non-breaking):

  • Users can now register table functions to specific schemas by calling register_udtf() on a SchemaProvider instance
  • Users can use qualified names in SQL to call schema-scoped table functions: SELECT * FROM myschema.my_function(args)

Backward compatibility:

  • Existing code using ctx.register_udtf() continues to work unchanged
  • Unqualified table function names in SQL continue to resolve from the global registry
  • No API changes that would break existing code

Example usage:

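use std::sync::Arc;
use datafusion::catalog::{MemorySchemaProvider, TableFunction};

// Get the schema and register the UDTF on it.
// `MyFunc` is your own `TableFunctionImpl`.
let schema = ctx
    .catalog("datacatalog")
    .expect("catalog `datacatalog` is registered")
    .schema("myschema")
    .expect("schema `myschema` exists");

let memory_schema = schema
    .as_any()
    .downcast_ref::<MemorySchemaProvider>()
    .expect("schema is a MemorySchemaProvider");

// register_udtf() is the SchemaProvider method added in this PR
memory_schema.register_udtf(
    "my_func".to_string(),
    Arc::new(TableFunction::new("my_func".to_string(), Arc::new(MyFunc))),
)?;

// Use it in SQL with a qualified name. Unless the session's default
// catalog is "datacatalog", the name must include the catalog.
ctx.sql("SELECT * FROM datacatalog.myschema.my_func(1, 2, 3)")
    .await?
    .show()
    .await?;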

Open Questions

1. SchemaProvider API Design

The current implementation adds 5 new methods directly to the SchemaProvider trait:

  • udtf_names(), udtf(), register_udtf(), deregister_udtf(), udtf_exist()

Concern: if scalar, aggregate, and window UDFs are later added to schemas in the same way, each function type would bring a similar set of methods (roughly 20 across all four types), making the SchemaProvider interface noisy and harder to maintain.

Alternative approach: Introduce a UdfContainer or FunctionRegistry trait that manages all types of user-defined functions:

use std::fmt::Debug;
use std::sync::Arc;
use datafusion::catalog::TableFunction;
use datafusion::common::Result;

pub trait SchemaProvider: Debug + Sync + Send {
    // ... existing methods ...

    /// Returns a function registry for managing UDFs in this schema
    fn function_registry(&self) -> Option<Arc<dyn FunctionRegistry>> {
        None
    }
}

// `Send + Sync` so the registry can be shared as `Arc<dyn FunctionRegistry>`
pub trait FunctionRegistry: Send + Sync {
    // Table functions
    fn udtf_names(&self) -> Vec<String>;
    fn udtf(&self, name: &str) -> Result<Option<Arc<TableFunction>>>;
    fn register_udtf(
        &self,
        name: String,
        function: Arc<TableFunction>,
    ) -> Result<Option<Arc<TableFunction>>>;
    fn deregister_udtf(&self, name: &str) -> Result<Option<Arc<TableFunction>>>;

    // Future: scalar UDFs, aggregate UDFs, window UDFs
    // fn udf_names(&self) -> Vec<String>;
    // fn register_udf(&self, name: String, function: Arc<ScalarUDF>) -> Result<...>;
    // ...
}

Benefits:

  • Cleaner SchemaProvider interface
  • Extensible design for future UDF types
  • Single point of management for all schema-scoped functions

Tradeoffs:

  • Additional abstraction layer
  • More complex for simple use cases
  • Requires more boilerplate for implementations
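A minimal sketch of the alternative, assuming a single optional hook on the schema provider. To avoid confusion with DataFusion's existing `FunctionRegistry` trait, the container is called `SchemaFunctions` here; it, `InMemoryFunctions`, `SchemaWithFunctions`, `LegacySchema`, and `lookup_udtf` are hypothetical names, and the `SchemaProvider` shown is a reduced stand-in for the real trait.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for DataFusion's `TableFunction`.
#[derive(Debug)]
pub struct TableFunction {
    pub name: String,
}

/// Hypothetical per-schema function container. Only table functions are
/// modelled; scalar, aggregate, and window UDFs would sit alongside them.
pub trait SchemaFunctions: Send + Sync {
    fn udtf_names(&self) -> Vec<String>;
    fn udtf(&self, name: &str) -> Option<Arc<TableFunction>>;
    fn register_udtf(&self, name: String, f: Arc<TableFunction>) -> Option<Arc<TableFunction>>;
}

/// Reduced stand-in for `SchemaProvider`: one optional hook instead of
/// one method group per function type.
pub trait SchemaProvider {
    fn function_registry(&self) -> Option<Arc<dyn SchemaFunctions>> {
        None
    }
}

#[derive(Default)]
pub struct InMemoryFunctions {
    udtfs: RwLock<HashMap<String, Arc<TableFunction>>>,
}

impl SchemaFunctions for InMemoryFunctions {
    fn udtf_names(&self) -> Vec<String> {
        let mut names: Vec<String> = self.udtfs.read().unwrap().keys().cloned().collect();
        names.sort();
        names
    }
    fn udtf(&self, name: &str) -> Option<Arc<TableFunction>> {
        self.udtfs.read().unwrap().get(name).cloned()
    }
    fn register_udtf(&self, name: String, f: Arc<TableFunction>) -> Option<Arc<TableFunction>> {
        self.udtfs.write().unwrap().insert(name, f)
    }
}

/// Existing providers compile unchanged and simply report no registry.
pub struct LegacySchema;
impl SchemaProvider for LegacySchema {}

/// A provider that opts in by returning its function container.
pub struct SchemaWithFunctions {
    functions: Arc<InMemoryFunctions>,
}

impl SchemaProvider for SchemaWithFunctions {
    fn function_registry(&self) -> Option<Arc<dyn SchemaFunctions>> {
        let registry: Arc<dyn SchemaFunctions> = self.functions.clone();
        Some(registry)
    }
}

/// Looks up a table function in a schema, going through the optional hook.
pub fn lookup_udtf(schema: &dyn SchemaProvider, name: &str) -> Option<Arc<TableFunction>> {
    schema.function_registry()?.udtf(name)
}

fn main() {
    let functions = Arc::new(InMemoryFunctions::default());
    functions.register_udtf(
        "my_func".to_string(),
        Arc::new(TableFunction { name: "my_func".to_string() }),
    );
    let schema = SchemaWithFunctions { functions };
    println!("{:?}", lookup_udtf(&schema, "my_func").map(|f| f.name.clone()));
    println!("{:?}", lookup_udtf(&LegacySchema, "my_func").map(|f| f.name.clone()));
    println!("{:?}", schema.function_registry().map(|r| r.udtf_names()));
}
```

The tradeoffs above show up directly in this shape: existing providers need no changes, but every lookup passes through an extra `Option` and an extra trait object before reaching the function.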

Question: Should we refactor to use a FunctionRegistry pattern before merging, or is the current direct-method approach acceptable?

2. Information Schema Exposure

Should schema-scoped table functions be exposed via information_schema.routines or a new information_schema.table_functions table?

Arguments for:

  • Provides discoverability of available functions
  • Consistent with how tables are exposed via information_schema
  • Useful for tooling and introspection

Arguments against:

  • Additional implementation complexity
  • Uncertain user demand
  • Global table functions would not appear (they're not in schemas)
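To make the last point concrete, here is a hypothetical sketch of how rows for an `information_schema.table_functions` view could be produced by walking catalogs and schemas. `CatalogTree`, `TableFunctionRow`, `table_function_rows`, and `example_tree` are illustrative names over a plain `BTreeMap`, not DataFusion's catalog API; the function name `global_func` stands for a UDTF registered only through the global registry.

```rust
use std::collections::BTreeMap;

/// Hypothetical catalog tree: catalog -> schema -> schema-scoped UDTF names.
/// Globally registered UDTFs are kept outside this tree, as in the PR.
pub type CatalogTree = BTreeMap<String, BTreeMap<String, Vec<String>>>;

/// One row of a hypothetical `information_schema.table_functions` view.
#[derive(Debug, PartialEq)]
pub struct TableFunctionRow {
    pub catalog: String,
    pub schema: String,
    pub function_name: String,
}

/// Walks every catalog and schema, emitting one row per schema-scoped UDTF.
/// Global UDTFs have no (catalog, schema) pair, so they never produce a row.
pub fn table_function_rows(tree: &CatalogTree) -> Vec<TableFunctionRow> {
    let mut rows = Vec::new();
    for (catalog, schemas) in tree {
        for (schema, functions) in schemas {
            for function_name in functions {
                rows.push(TableFunctionRow {
                    catalog: catalog.clone(),
                    schema: schema.clone(),
                    function_name: function_name.clone(),
                });
            }
        }
    }
    rows
}

pub fn example_tree() -> CatalogTree {
    let mut tree = CatalogTree::new();
    tree.entry("datafusion".to_string())
        .or_default()
        .insert("public".to_string(), vec!["my_func".to_string()]);
    tree.entry("datacatalog".to_string())
        .or_default()
        .insert("myschema".to_string(), vec!["my_func".to_string(), "other_func".to_string()]);
    tree
}

fn main() {
    // Registered only via the session's global registry, so not in the tree.
    let global_only = "global_func";
    let rows = table_function_rows(&example_tree());
    for row in &rows {
        println!("{}.{}.{}", row.catalog, row.schema, row.function_name);
    }
    let listed = rows.iter().any(|r| r.function_name == global_only);
    println!("{} listed: {}", global_only, listed);
}
```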

Current state: Not implemented in this PR.

Question: Is information_schema support desired for the initial implementation, or can it be added in a follow-up PR?


Feedback on these design decisions would be appreciated before finalizing this PR.

This commit adds the ability to register and use table functions (UDTFs)
at the schema level, in addition to the existing global registration.

Changes:
- Extended SchemaProvider trait with 5 new methods for table function
  management: udtf_names(), udtf(), register_udtf(), deregister_udtf(),
  and udtf_exist()
- Implemented table function storage in MemorySchemaProvider using DashMap
- Updated ContextProvider lookup logic to support qualified names
  (e.g., schema.function_name)
- Modified SQL parser to handle qualified table function names

Backward Compatibility:
- Unqualified names continue to resolve from global registry
- Qualified names look up functions in the specified schema only
- All new trait methods have default implementations
- Zero breaking changes to existing APIs

Testing:
- Added 5 unit tests for MemorySchemaProvider
- Added 3 Rust integration tests verifying qualified/unqualified behavior
- Added 5 SQL logic tests
- All 13 catalog tests passing

@github-actions bot added the labels sql (SQL Planner), core (Core DataFusion crate), sqllogictest (SQL Logic Tests, .slt), and catalog (Related to the catalog crate) on Oct 12, 2025
@Omega359 (Contributor) commented:

FYI - the FunctionRegistry trait already exists

@bubulalabu (Author) commented:

oh, amazing! thanks for pointing me to it, I'll have a thorough look at it soon


Successfully merging this pull request may close these issues: Support for adding table functions to schemas
