Ingest data from an API with Data Load Tool (DLT) via a Rust PyO3 plugin.
Data ingestion is the component of data engineering that involves receiving data from an outside source and loading it into one's own environment.
Common use cases for ingestion, in my experience in an enterprise setting, are threefold:
- Ingestion of data from a data supplier outside one's own organisation.
- Ingestion of data from an upstream team or environment in the data lifecycle.
- Migration of data between platforms or, less commonly, environments (dev/prod).
This definition lets us consider three components of complexity in ingestion:
One's own environment (the ingestion destination) is likely to differ significantly from the diverse ingestion sources.
Upstream data can arrive in many forms, including but not limited to: APIs (as sketched below), flat files of varied kinds (Excel, Parquet, .wav), databases, and message streams.
These sources can also vary widely in latency and schema.
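To make the API case concrete, here is a minimal sketch of pulling a paginated JSON API into a local DuckDB table with dlt. The endpoint URL and response fields (`results`, `next`) are hypothetical; the dlt calls themselves (`dlt.resource`, `dlt.pipeline`, `pipeline.run`) are the library's documented basics.

```python
import dlt
from dlt.sources.helpers import requests  # dlt's requests wrapper with built-in retries


@dlt.resource(name="events", write_disposition="append")
def events(api_url="https://api.example.com/v1/events"):  # hypothetical endpoint
    """Yield pages of records from a paginated JSON API."""
    url = api_url
    while url:
        response = requests.get(url)
        response.raise_for_status()
        payload = response.json()
        yield payload["results"]       # hypothetical response shape
        url = payload.get("next")      # follow pagination until the API is exhausted


pipeline = dlt.pipeline(
    pipeline_name="api_ingest",
    destination="duckdb",              # local destination, easy to swap later
    dataset_name="raw",
)

if __name__ == "__main__":
    print(pipeline.run(events()))
```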
One's own destination should be more consistent: within a data engineering team, it is industry best practice to store data in an open table format (Delta/Iceberg) in cloud file storage.
Data catalogs, which are effectively the previous pattern with more built-in metadata capabilities, are becoming more common but not universal.
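As a sketch of that destination pattern, recent dlt versions can write a resource as a Delta table through the filesystem destination. This assumes dlt's `table_format` parameter and the optional deltalake dependency; the bucket URL is hypothetical and credentials would live in secrets or environment variables.

```python
import dlt

BUCKET_URL = "s3://my-data-lake/raw"  # hypothetical bucket; credentials via secrets/env


@dlt.resource(name="orders", write_disposition="append", table_format="delta")
def orders():
    # Stand-in records; in practice these come from an upstream source.
    yield [
        {"order_id": 1, "status": "shipped"},
        {"order_id": 2, "status": "open"},
    ]


pipeline = dlt.pipeline(
    pipeline_name="lake_ingest",
    destination=dlt.destinations.filesystem(bucket_url=BUCKET_URL),
    dataset_name="sales",
)

if __name__ == "__main__":
    print(pipeline.run(orders()))
```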
Nonetheless, different teams can work with:
- Different clouds
- Different networking security
- Different data models
- Different standards of code, or inherited legacy cloud/code components
Even where best practice is followed, this diversity of systems means complexity remains.
Technical components are important, but team structures are often the largest source of complexity. In any organisation of reasonable size and geographic dispersal, ingestion between sources and destinations becomes increasingly complex. Team and communication interfaces that create complexity for ingestion include:
- Communication surrounding source/destination authentication (see the secrets sketch after this list).
- Communication about, and dependencies on, source availability.
- Verifying source data quality, and communicating with the source team to resolve issues.
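On the authentication point, dlt's approach is to externalise credentials into `.dlt/secrets.toml` or environment variables and inject them at runtime, so the team interface reduces to exchanging secret values rather than code changes. A minimal sketch, with a hypothetical endpoint and secret name:

```python
import dlt
from dlt.sources.helpers import requests


@dlt.resource(name="accounts")
def accounts(api_key: str = dlt.secrets.value):
    # dlt resolves `api_key` from .dlt/secrets.toml or the environment, e.g.
    # (section naming is indicative):
    #   [sources.accounts]
    #   api_key = "..."
    response = requests.get(
        "https://api.example.com/v1/accounts",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {api_key}"},
    )
    response.raise_for_status()
    yield response.json()
```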
---
Data ingestion can sound simple: move data from one place to another. However, the components above produce complex patterns; add a large number of diverse data sources, and data ingestion becomes a hard problem in need of common patterns for simplification.
"Frameworks"
- YAML engineering (declarative, configuration-driven pipelines; see the sketch after the criteria list below):
- No code:
  - Fivetran
  - Matillion
  - Databricks(?)
Criteria for comparing these frameworks:
- Ease of use
- Flexibility
- High in-built feature support
- Plugins/extensibility
- High performance
- Low/no cost
- In-built metadata/modelling
- (Nominal) DQ checks
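To illustrate the "YAML engineering" style, dlt itself ships a declarative REST API source where the pipeline is mostly configuration. This is a sketch assuming dlt's `rest_api_source` helper (present in recent releases); the base URL and resource names are hypothetical, and real configs usually add pagination and auth settings.

```python
import dlt
from dlt.sources.rest_api import rest_api_source

# The whole source is data, not code: one table per listed endpoint.
source = rest_api_source({
    "client": {"base_url": "https://api.example.com/v1/"},
    "resources": ["events", "users"],
})

pipeline = dlt.pipeline(
    pipeline_name="declarative_ingest",
    destination="duckdb",
    dataset_name="raw",
)

if __name__ == "__main__":
    print(pipeline.run(source))
```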
Data Load Tool (DLT)
DLT has great potential beyond simple ingestion. Within a data platform's total cost of ownership, storage is often the most cost-effective component, which makes it affordable to land and keep raw data in full.
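A minimal sketch of why that matters: if storage is the cheap part, one can afford to land every extracted batch append-only and defer deduplication downstream. The table and fields here are hypothetical; the append disposition and the `_dlt_load_id` column dlt stamps on each row are standard library behaviour.

```python
import dlt


@dlt.resource(name="sensor_readings", write_disposition="append")
def sensor_readings():
    # Hypothetical batch; in practice this comes from an API or file drop.
    yield [
        {"sensor": "a1", "reading": 21.4},
        {"sensor": "b7", "reading": 19.9},
    ]


# Every run appends a new batch; dlt stamps rows with _dlt_load_id,
# so full history is retained cheaply and deduplication can happen downstream.
pipeline = dlt.pipeline(
    pipeline_name="raw_history",
    destination="duckdb",
    dataset_name="raw",
)

if __name__ == "__main__":
    print(pipeline.run(sensor_readings()))
```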