feat: ducklake destination #3015
base: devel
Conversation
✅ Deploy Preview for dlt-hub-docs canceled.
this looks really good! here's a summary of my suggestions:
- simplify the ducklake credentials class (i.e. remove __init__, implement _conn_str())
- load extensions in borrow_conn
- we'll need to tweak how connections are opened in the ibis handover (but that's easy)
@@ -202,6 +204,7 @@ def is_partial(self) -> bool:
        return self.database == ":pipeline:"


    def on_resolved(self) -> None:
        # TODO Why don't we support `:memory:` string?
we support it. you can pass a duckdb instance instead of credentials and the destination factory will use it:
https://dlthub.com/docs/dlt-ecosystem/destinations/duckdb#destination-configuration (those docs will benefit from better section titles)
a :memory: database is wiped out when the connection is closed. during loading the connection will be opened and closed several times, e.g. to migrate schemas, and at the end all the data would be lost because we close all connections when the loader exits
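a minimal sketch of that workaround, assuming the duckdb destination factory accepts a live connection as described in the linked docs (pipeline, dataset and table names are illustrative):

import dlt
import duckdb

# keep one connection open for the whole run so the :memory: database
# survives the open/close cycles that happen during loading
db = duckdb.connect(":memory:")

pipeline = dlt.pipeline(
    pipeline_name="in_memory_demo",
    destination=dlt.destinations.duckdb(db),
    dataset_name="demo_data",
)
pipeline.run([{"id": 1}], table_name="items")

# the data is only reachable while `db` stays open
print(db.sql("SELECT * FROM demo_data.items"))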
@@ -19,6 +19,8 @@
DUCK_DB_NAME_PAT = "%s.duckdb"


# NOTE duckdb extensions are only loaded when using the dlt cursor. They are not
# loaded when using the native connection (e.g., when passing it to Ibis)
there's a mechanism to load extensions at start. it could be made easier for implementers but right now you can update extensions in on_resolved of DuckLakeCredentials(DuckDbBaseCredentials) (that you implement below).
some docs: https://dlthub.com/docs/dlt-ecosystem/destinations/duckdb#additional-configuration
another option you have is to subclass sql_client. see the base class:
class DuckDbSqlClient(SqlClientBase[duckdb.DuckDBPyConnection], DBTransaction):
    dbapi: ClassVar[DBApi] = duckdb

    def __init__(
        self,
        dataset_name: str,
        staging_dataset_name: str,
        credentials: DuckDbBaseCredentials,
        capabilities: DestinationCapabilitiesContext,
    ) -> None:
        super().__init__(None, dataset_name, staging_dataset_name, capabilities)
        self._conn: duckdb.DuckDBPyConnection = None
        self.credentials = credentials
        # set additional connection options so derived class can change it
        # TODO: move that to methods that can be overridden, include local_config
        self._pragmas = ["enable_checkpoint_on_shutdown"]
        self._global_config: Dict[str, Any] = {
            "TimeZone": "UTC",
            "checkpoint_threshold": "1gb",
        }

    @raise_open_connection_error
    def open_connection(self) -> duckdb.DuckDBPyConnection:
        self._conn = self.credentials.borrow_conn(
            pragmas=self._pragmas,
            global_config=self._global_config,
            local_config={
                "search_path": self.fully_qualified_dataset_name(),
            },
        )
        return self._conn
and inject extensions on init or when the connection is being opened
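for that second option, a rough sketch of what such a subclass could look like (the class name is illustrative, not part of this PR; install_extension/load_extension are the duckdb Python API):

class DuckLakeSqlClient(DuckDbSqlClient):
    # illustrative subclass: load the extension right after the connection
    # is borrowed so it is available to every statement dlt issues
    def open_connection(self) -> duckdb.DuckDBPyConnection:
        conn = super().open_connection()
        conn.install_extension("ducklake")  # no-op if already installed
        conn.load_extension("ducklake")
        return conn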
@@ -546,3 +558,39 @@ def __del__(self) -> None:
        if self.memory_db:
            self.memory_db.close()
            self.memory_db = None


def _install_extension(duckdb_sql_client: DuckDbSqlClient, extension_name: LiteralString) -> None:
mhmmm I think the code that adds extensions in borrow_conn will suffice. if not, we can move those utils there?
class DuckLakeCredentials(DuckDbCredentials):
    def __init__(
        self,
        # TODO how does duckdb resolve the name of the database to the name of the dataset / pipeline
here's something that I may not fully grasp. but DuckLakeCredentials will create a :memory: instance
- to which you attach the catalog below
- to which you attach the storage
- that gets configured with extensions and settings in DuckLakeCredentials (self)
- and this instance of DuckLakeCredentials is used to borrow_conn
so what should dataset_name assume here? the catalog database if it is duckdb? pls see below
For the default case, here's what I'm currently aiming for:
pipeline = dlt.pipeline("jaffle_shop", destination="ducklake")
pipeline.run(...)
- a duckdb instance is created in :memory:; we call it the ducklake_client
- the ducklake_client installs the ducklake extension for duckdb (needs to be done once per system)
- the ducklake_client uses the ATTACH command to load a catalog and storage
- the catalog is a duckdb instance on disk (with extension .ducklake instead of .duckdb by convention)
- the default storage is completely handled by DuckDB / DuckLake
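roughly, the ducklake_client would issue something along these lines (a sketch using plain duckdb and DuckLake's documented ATTACH syntax; the catalog file name is illustrative):

import duckdb

con = duckdb.connect(":memory:")   # the ducklake_client
con.install_extension("ducklake")  # once per system
con.load_extension("ducklake")
# attach catalog + storage; DuckLake derives the data path from the catalog
# file name unless DATA_PATH is given explicitly
con.execute("ATTACH 'ducklake:jaffle_shop.ducklake' AS jaffle_shop")
con.execute("USE jaffle_shop")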
The outcome is
|- pipeline.py
|- jaffle_shop.ducklake        # catalog file (if duckdb or sqlite)
|- jaffle_shop.ducklake.files/ # storage
   |- main/                    # schema level
      |- customers/            # table level
         |- data.parquet       # data
      |- orders/
Design
- The DuckLakeCredentials inherits from DuckDbCredentials and the "main" credentials are used to define the ducklake_client
- We always use an in-memory DuckDB connection for the ducklake_client
        # TODO how does duckdb resolve the name of the database to the name of the dataset / pipeline
        ducklake_name: str = "ducklake",
        *,
        catalog_database: Optional[Union[ConnectionStringCredentials, DuckDbCredentials]] = None,
postgres, mysql, duckdb, motherduck are all ConnectionStringCredentials so maybe that's enough to put here
you can use drivername to distinguish them
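e.g. a sketch of that dispatch (the helper and the postgres attach string are illustrative; only drivername and database come from ConnectionStringCredentials):

def _catalog_attach_target(credentials: ConnectionStringCredentials) -> str:
    # illustrative helper, not part of the PR: pick the ATTACH target
    # based on the driver encoded in the connection string
    if credentials.drivername == "duckdb":
        return f"ducklake:{credentials.database}.ducklake"
    if credentials.drivername in ("postgres", "postgresql"):
        return f"ducklake:postgres:dbname={credentials.database}"
    raise ValueError(f"unsupported catalog driver: {credentials.drivername}")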
        return caps


# TODO support connecting to a snapshot
that would be amazing but we can do that later. snapshots mean reproducible local environments that you can get with zero copy
attach_statement = f"ATTACH IF NOT EXISTS 'ducklake:{ducklake_name}.ducklake'" | ||
if storage: | ||
# TODO handle storage credentials by creating secrets | ||
attach_statement += f" (DATA_PATH {storage.bucket_url})" |
you should pass storage to create_secret before you attach (after you open the connection)
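i.e. something along these lines (a sketch using DuckDB's CREATE SECRET; the secret name and fields are illustrative and would come from the storage credentials):

# after open_connection(), before ATTACH: register the storage credentials
# as a duckdb secret so the DATA_PATH bucket can be resolved
conn.execute(
    """
    CREATE OR REPLACE SECRET ducklake_storage (
        TYPE S3,
        KEY_ID 'AKIA...',
        SECRET '...',
        REGION 'eu-central-1'
    )
    """
)
conn.execute(attach_statement)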
)


def test_native_duckdb_workflow(tmp_path):
makes sense to do a few "smoke tests". the next step would be to enable ducklake to be tested against exactly the same tests as duckdb, using e.g. a local duckdb as catalog and the local filesystem as storage.
let's do another iteration of this ticket and then I'll look at this. I was able to do the same with the iceberg destination so I'm pretty sure it will work
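such a smoke test could be as small as the sketch below (assumes the standard dlt pipeline API and pytest fixtures; dev_mode and the table name are illustrative):

def test_ducklake_smoke(tmp_path, monkeypatch):
    # run in an isolated working dir so the catalog + storage files land under tmp_path
    monkeypatch.chdir(tmp_path)
    pipeline = dlt.pipeline("ducklake_smoke", destination="ducklake", dev_mode=True)
    info = pipeline.run([{"id": 1}, {"id": 2}], table_name="items")
    info.raise_on_failed_jobs()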
# TODO add connection to a specific snapshot
# TODO does it make sense for ducklake to have a staging destination?
good point see here: #1692
        return DuckLakeClient


    def _raw_capabilities(self) -> DestinationCapabilitiesContext:
note: ducklake will support upsert (MERGE INTO) so we can enable this strategy to see if it works
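if that lands, it should mostly be a matter of advertising the strategy in the capabilities, e.g. (a sketch, assuming the supported_merge_strategies field on DestinationCapabilitiesContext):

# sketch: enable upsert once ducklake supports MERGE INTO
caps.supported_merge_strategies = ["delete-insert", "upsert"]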
Related Issues

Questions / tasks
- chess.duckdb from dlt.pipeline(..., destination="duckdb")