feat: add user defined table function support #1113

Open
wants to merge 9 commits into
base: main
from
4 changes: 2 additions & 2 deletions .github/workflows/test.yaml
@@ -91,9 +91,9 @@ jobs:

       - name: FFI unit tests
         run: |
-          cd examples/ffi-table-provider
+          cd examples/datafusion-ffi-example
           uv run --no-project maturin develop --uv
-          uv run --no-project pytest python/tests/_test_table_provider.py
+          uv run --no-project pytest python/tests/_test*.py

       - name: Cache the generated dataset
         id: cache-tpch-dataset
32 changes: 32 additions & 0 deletions docs/source/user-guide/common-operations/udf-and-udfa.rst
@@ -242,3 +242,35 @@ determine which evaluate functions are called.
})

df.select("a", exp_smooth(col("a")).alias("smooth_a")).show()

Table Functions
---------------

User Defined Table Functions (UDTFs) are slightly different from the other
functions described here. These functions take any number of ``Expr``
arguments, but only literal expressions are supported. Table functions must
return a Table Provider, as described on the :ref:`io_custom_table_provider`
page.

Once you have a table function, you can register it with the session context
by using :py:func:`datafusion.context.SessionContext.register_udtf`.

There are examples of both Rust-backed and Python-based table functions in the
examples folder of the repository. If you have a Rust-backed table function
that you wish to expose via PyO3, you need to expose it as a ``PyCapsule``.

.. code-block:: rust

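    use std::sync::Arc;

    use datafusion_ffi::udtf::FFI_TableFunction;
    use pyo3::prelude::*;
    use pyo3::types::PyCapsule;

    // MyTableFunction is assumed to implement Clone and TableFunctionImpl.
    #[pymethods]
    impl MyTableFunction {
        fn __datafusion_table_function__<'py>(
            &self,
            py: Python<'py>,
        ) -> PyResult<Bound<'py, PyCapsule>> {
            // Name the capsule so the Python side can identify its contents.
            let name = cr"datafusion_table_function".into();

            // Wrap a clone of this function in the FFI struct.
            let func = self.clone();
            let provider = FFI_TableFunction::new(Arc::new(func), None);

            PyCapsule::new(py, provider, Some(name))
        }
    }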
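
Seen from Python, a Rust-backed table function is any object that offers a
``__datafusion_table_function__`` method, while a Python-based one is an
ordinary callable. The sketch below illustrates that distinction using only
the standard library. ``exposes_table_function``, ``FakeTableFunction`` and
``PlainCallable`` are hypothetical names invented for this illustration, the
fake class returns a placeholder rather than a real ``PyCapsule``, and the
check is an assumption about how a consumer could tell the two apart, not a
copy of what ``datafusion`` does internally.

```python
from typing import Any

# Method name taken from the PyO3 snippet above.
CAPSULE_METHOD = "__datafusion_table_function__"


def exposes_table_function(obj: Any) -> bool:
    """Return True if ``obj`` has a callable capsule-export method."""
    return callable(getattr(obj, CAPSULE_METHOD, None))


class FakeTableFunction:
    """Hypothetical stand-in for a PyO3 class exporting a table function."""

    def __datafusion_table_function__(self) -> object:
        # A real implementation would return a PyCapsule; this is a placeholder.
        return object()


class PlainCallable:
    """Hypothetical Python-based table function: callable, no capsule method."""

    def __call__(self, *args: Any) -> None:
        return None


if __name__ == "__main__":
    print(exposes_table_function(FakeTableFunction()))
    print(exposes_table_function(PlainCallable()))
```

Looking the method up by name keeps the check independent of any particular
Rust extension module, which is the point of exchanging the function through
a ``PyCapsule``.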