minifusion

A tiny columnar analytical query engine written in Rust, built from scratch to learn how engines like Apache DataFusion work under the hood.

This is a learning project. The goal is not to compete with DataFusion, but to reproduce its key design decisions at small scale:

Apache Arrow as the internal columnar format.
A pull-based execution model built on async Streams — operators pull RecordBatches from their children on demand.
Separation between the logical plan and the physical plan.

Status: early stage / work in progress. The core execution abstraction, a CSV source, and a projection operator are implemented and tested. The remaining operators and the DataFrame / SQL frontends are on the roadmap below.

The central abstraction: `ExecutionPlan`

Every physical operator implements one trait and produces a stream of Arrow record batches:

#[async_trait]
pub trait ExecutionPlan: Send + Sync {
    fn schema(&self) -> SchemaRef;
    fn children(&self) -> Vec<Arc<dyn ExecutionPlan>>;
    fn execute(&self) -> Result<SendableRecordBatchStream>;
}

Operators compose by wrapping each other: ProjectionExec wraps CsvScan, a future filter operator wraps ProjectionExec, and so on. It is the same idea as chained Iterators, but async and columnar. Calling execute() on the outermost plan only builds the stream; no batch is read until that stream is polled, and each poll pulls data through the whole pipeline on demand.
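To make the wrapping idea concrete, here is a minimal synchronous sketch that uses only the standard library. It is not the crate's code: `Plan`, `Batch`, `BatchIter`, `MemoryScan`, `Projection`, and `demo` are hypothetical stand-ins, a boxed `Iterator` replaces the async stream, and plain `Vec<i64>` columns replace Arrow arrays.

```rust
use std::sync::Arc;

/// Stand-in for an Arrow RecordBatch: named columns of i64 values.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub names: Vec<String>,
    pub columns: Vec<Vec<i64>>,
}

/// Synchronous analogue of a record-batch stream.
pub type BatchIter = Box<dyn Iterator<Item = Batch>>;

/// Simplified analogue of the ExecutionPlan trait (sync, std only).
pub trait Plan {
    fn schema(&self) -> Vec<String>;
    fn children(&self) -> Vec<Arc<dyn Plan>>;
    fn execute(&self) -> BatchIter;
}

/// Leaf operator: yields in-memory rows in batches of `batch_size`.
pub struct MemoryScan {
    pub names: Vec<String>,
    pub columns: Vec<Vec<i64>>,
    pub batch_size: usize,
}

impl Plan for MemoryScan {
    fn schema(&self) -> Vec<String> { self.names.clone() }
    fn children(&self) -> Vec<Arc<dyn Plan>> { Vec::new() }
    fn execute(&self) -> BatchIter {
        let names = self.names.clone();
        let columns = self.columns.clone();
        let size = self.batch_size.max(1);
        let rows = columns.first().map_or(0, |c| c.len());
        // Each batch is sliced only when the consumer asks for the next item.
        Box::new((0..rows).step_by(size).map(move |start| {
            let end = (start + size).min(rows);
            Batch {
                names: names.clone(),
                columns: columns.iter().map(|c| c[start..end].to_vec()).collect(),
            }
        }))
    }
}

/// Wrapping operator: keeps only the named columns, in the requested order.
pub struct Projection {
    pub input: Arc<dyn Plan>,
    pub keep: Vec<String>,
}

impl Plan for Projection {
    fn schema(&self) -> Vec<String> { self.keep.clone() }
    fn children(&self) -> Vec<Arc<dyn Plan>> { vec![Arc::clone(&self.input)] }
    fn execute(&self) -> BatchIter {
        let keep = self.keep.clone();
        // The child is pulled only when the caller pulls from this iterator.
        Box::new(self.input.execute().map(move |batch| {
            let columns = keep
                .iter()
                .map(|name| {
                    // Panics on an unknown name; a real operator would return an error.
                    let idx = batch.names.iter().position(|n| n == name).expect("unknown column");
                    batch.columns[idx].clone()
                })
                .collect();
            Batch { names: keep.clone(), columns }
        }))
    }
}

/// Builds Projection(MemoryScan) over made-up data and drains the pipeline.
pub fn demo() -> Vec<Batch> {
    let scan = Arc::new(MemoryScan {
        names: vec!["id".into(), "age".into(), "score".into()],
        columns: vec![vec![1, 2, 3], vec![30, 41, 25], vec![7, 8, 9]],
        batch_size: 2,
    });
    let plan = Projection { input: scan, keep: vec!["score".into(), "id".into()] };
    plan.execute().collect()
}
```

With three rows and a batch size of 2, `demo()` yields two batches, each holding only `score` and `id`. Nothing is sliced or projected until `collect()` starts pulling.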

What works today

CsvScan — reads a CSV file into batched Arrow RecordBatches, inferring the schema from the header and first rows.
ProjectionExec — selects a subset of columns by name, projecting both the schema and each batch.
MiniFusionError — a single typed error (thiserror) with Io, Arrow, Schema, and NotImplemented variants, plus a Result<T> alias.
An end-to-end integration test that scans a CSV fixture and asserts batching, schema, and row counts.
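Inferring a schema "from the header and first rows" can be pictured roughly as follows. This is an illustrative, std-only sketch and not how CsvScan is actually implemented: `DataType`, `infer_cell`, `widen`, and `infer_schema` are hypothetical names, the sample data is made up, and the naive comma split ignores quoted fields.

```rust
/// Hypothetical, reduced set of column types for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
    Utf8,
}

/// Narrowest type that can hold a single CSV cell.
fn infer_cell(cell: &str) -> DataType {
    if cell.parse::<i64>().is_ok() {
        DataType::Int64
    } else if cell.parse::<f64>().is_ok() {
        DataType::Float64
    } else {
        DataType::Utf8
    }
}

/// Widen two candidate types along Int64 -> Float64 -> Utf8.
fn widen(a: DataType, b: DataType) -> DataType {
    match (a, b) {
        (DataType::Utf8, _) | (_, DataType::Utf8) => DataType::Utf8,
        (DataType::Float64, _) | (_, DataType::Float64) => DataType::Float64,
        _ => DataType::Int64,
    }
}

/// Read column names from the header, then look at up to `sample_rows`
/// data rows to pick a type per column.
pub fn infer_schema(csv: &str, sample_rows: usize) -> Vec<(String, DataType)> {
    let mut lines = csv.lines();
    let header: Vec<String> = match lines.next() {
        Some(h) => h.split(',').map(|s| s.trim().to_string()).collect(),
        None => return Vec::new(),
    };
    let mut types: Vec<Option<DataType>> = vec![None; header.len()];
    for line in lines.take(sample_rows) {
        for (slot, cell) in types.iter_mut().zip(line.split(',')) {
            let cell = cell.trim();
            if cell.is_empty() {
                continue; // empty cells carry no type information
            }
            let seen = infer_cell(cell);
            *slot = Some(match *slot {
                Some(prev) => widen(prev, seen),
                None => seen,
            });
        }
    }
    // Columns with no sampled value fall back to Utf8.
    header
        .into_iter()
        .zip(types)
        .map(|(name, ty)| (name, ty.unwrap_or(DataType::Utf8)))
        .collect()
}
```

Sampling only the first rows keeps inference cheap, at the cost of misjudging a column whose later rows need a wider type.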

The main.rs CLI, the dataframe DSL builder, and the execution (SessionContext) module are scaffolded but not yet implemented.
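For readers unfamiliar with the single-error-plus-alias pattern used by MiniFusionError, the sketch below shows a hand-written, std-only equivalent of what a thiserror derive roughly produces. The variant names come from the list above; the payload types, the message wording, and the helpers `column_index` and `read_header` are assumptions made for illustration (in particular, the Arrow variant holds a `String` here instead of a real Arrow error).

```rust
pub mod error {
    use std::error::Error;
    use std::fmt;
    use std::io;

    #[derive(Debug)]
    pub enum MiniFusionError {
        Io(io::Error),
        /// Assumption: a String stands in for the wrapped Arrow error.
        Arrow(String),
        Schema(String),
        NotImplemented(String),
    }

    impl fmt::Display for MiniFusionError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                MiniFusionError::Io(e) => write!(f, "I/O error: {}", e),
                MiniFusionError::Arrow(msg) => write!(f, "Arrow error: {}", msg),
                MiniFusionError::Schema(msg) => write!(f, "schema error: {}", msg),
                MiniFusionError::NotImplemented(what) => write!(f, "not implemented: {}", what),
            }
        }
    }

    impl Error for MiniFusionError {
        fn source(&self) -> Option<&(dyn Error + 'static)> {
            match self {
                MiniFusionError::Io(e) => Some(e),
                _ => None,
            }
        }
    }

    /// Lets `?` convert io::Error into MiniFusionError automatically.
    impl From<io::Error> for MiniFusionError {
        fn from(e: io::Error) -> Self {
            MiniFusionError::Io(e)
        }
    }

    /// Crate-wide alias so signatures can say `Result<T>`.
    pub type Result<T> = std::result::Result<T, MiniFusionError>;
}

/// Hypothetical helper: resolve a column name or fail with a Schema error.
pub fn column_index(schema: &[&str], name: &str) -> error::Result<usize> {
    schema
        .iter()
        .position(|c| *c == name)
        .ok_or_else(|| error::MiniFusionError::Schema(format!("unknown column '{}'", name)))
}

/// Hypothetical helper: `?` turns a failed read into MiniFusionError::Io.
pub fn read_header(path: &str) -> error::Result<String> {
    let text = std::fs::read_to_string(path)?;
    Ok(text.lines().next().unwrap_or("").to_string())
}
```

Keeping one error enum means every operator can return the same `Result<T>`, and callers can still match on the variant when they need to distinguish, say, a missing file from a bad column name.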

Running the tests

cargo test

The integration test lives in tests/csv_scan.rs and uses the fixture at tests/fixtures/people.csv.

Architecture

src/
├── lib.rs              # public re-exports
├── main.rs             # CLI entry point (stub)
├── error.rs            # MiniFusionError + Result alias
├── datasource/         # data readers (CSV today, Parquet planned)
│   └── csv.rs          # CsvScan
├── physical_plan/      # ExecutionPlan trait + physical operators
│   └── projection.rs   # ProjectionExec
├── execution/          # SessionContext / runtime (planned)
└── dataframe.rs        # DataFrame DSL builder (planned)

Modules are added progressively, one roadmap level at a time, rather than all at once.

Roadmap

The project is organized into levels, each closing with an integration test.

Level 1 — Basics: CSV scan ✅, projection ✅, limit, DataFrame DSL, SessionContext, and a minifusion run CLI.
Level 2 — Filters: row filtering (=, !=, <, >, <=, >=) with a minimal Expr tree and vectorized evaluation over Arrow arrays.
Level 3 — Aggregations: COUNT, SUM, AVG, MIN, MAX and GROUP BY via hash aggregation with incremental accumulators.
Level 4 — Parquet: a Parquet data source, with projection pushdown to the reader as a bonus.
Level 5 — Logical / physical plans: introduce a LogicalPlan tree and a planner that lowers it to Arc<dyn ExecutionPlan>, plus a simple optimizer rule (projection pushdown) and an optional minimal SQL frontend.
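As a rough picture of what the Level 2 item could look like, the sketch below evaluates a tiny expression tree over whole columns at once and then applies the resulting boolean mask. Since this level is not implemented yet, everything here is hypothetical: `CmpOp`, `Expr`, `ColumnValue`, `evaluate`, and `filter_batch` are illustrative names, and `Vec<i64>` columns stand in for Arrow arrays and their compute kernels.

```rust
/// The six comparison operators named in the roadmap.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CmpOp { Eq, NotEq, Lt, Gt, LtEq, GtEq }

/// Minimal expression tree: a column index, a literal, or a comparison.
#[derive(Debug, Clone)]
pub enum Expr {
    Column(usize),
    Literal(i64),
    Compare(Box<Expr>, CmpOp, Box<Expr>),
}

/// Result of evaluating an expression over a whole batch.
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnValue {
    Ints(Vec<i64>),
    Mask(Vec<bool>),
}

/// Evaluate `expr` column-at-a-time instead of row-at-a-time.
pub fn evaluate(expr: &Expr, batch: &[Vec<i64>]) -> Result<ColumnValue, String> {
    let rows = batch.first().map_or(0, |c| c.len());
    match expr {
        Expr::Column(i) => batch
            .get(*i)
            .cloned()
            .map(ColumnValue::Ints)
            .ok_or_else(|| format!("column {} out of range", i)),
        // Simplification: a literal is broadcast to a full column.
        Expr::Literal(v) => Ok(ColumnValue::Ints(vec![*v; rows])),
        Expr::Compare(left, op, right) => {
            match (evaluate(left, batch)?, evaluate(right, batch)?) {
                (ColumnValue::Ints(a), ColumnValue::Ints(b)) => {
                    let mask = a
                        .iter()
                        .zip(b.iter())
                        .map(|(x, y)| match op {
                            CmpOp::Eq => x == y,
                            CmpOp::NotEq => x != y,
                            CmpOp::Lt => x < y,
                            CmpOp::Gt => x > y,
                            CmpOp::LtEq => x <= y,
                            CmpOp::GtEq => x >= y,
                        })
                        .collect();
                    Ok(ColumnValue::Mask(mask))
                }
                _ => Err("comparison operands must be integer columns".to_string()),
            }
        }
    }
}

/// Keep only the rows whose mask entry is true, column by column.
pub fn filter_batch(batch: &[Vec<i64>], mask: &[bool]) -> Vec<Vec<i64>> {
    batch
        .iter()
        .map(|col| {
            col.iter()
                .zip(mask)
                .filter(|(_, keep)| **keep)
                .map(|(v, _)| *v)
                .collect()
        })
        .collect()
}
```

For example, `Compare(Column(1), Gt, Literal(28))` over columns `[1, 2, 3]` and `[30, 41, 25]` produces a mask that `filter_batch` uses to drop the third row from every column.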

Acknowledgements

Heavily inspired by Apache DataFusion and the broader Arrow ecosystem. Any good design idea here is theirs; any rough edge is mine.

License

MIT — see LICENSE.
