This document summarizes the benchmarking approach for the MIND compiler and runtime. The goal is to track regressions and communicate baseline performance characteristics.
- Target Hardware – Default runs target x86-64 AVX2 hosts with 32 GB RAM; GPU runs target CUDA-capable cards when `mlir-exec` is enabled.
- Datasets – Synthetic workloads (matrix multiplications, convolutions) plus representative ML kernels sourced from `benchmarks/`.
- Execution Modes – Interpreter (`cpu-exec`), ahead-of-time MLIR (`mlir-build`), and JIT (`mlir-exec`).
- Warmup & Repetitions – Each benchmark performs 3 warmup runs followed by 10 measured iterations; results report the median and 95th percentile.
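The warmup-and-measure protocol above can be sketched as a small harness. This is an illustrative sketch, not the project's actual harness; the `bench` function and its signature are hypothetical.

```rust
use std::time::Instant;

// Hypothetical harness sketch: run `warmup` discarded iterations, then
// `iters` measured iterations, and report (median, p95) in milliseconds.
fn bench<F: FnMut()>(mut work: F, warmup: usize, iters: usize) -> (f64, f64) {
    for _ in 0..warmup {
        work(); // warmup runs execute but are not recorded
    }
    let mut samples: Vec<f64> = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        work();
        samples.push(start.elapsed().as_secs_f64() * 1e3); // ms
    }
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = samples[samples.len() / 2];
    let p95 = samples[((samples.len() as f64) * 0.95).ceil() as usize - 1];
    (median, p95)
}

fn main() {
    // A trivial workload stands in for a compiled MIND kernel.
    let (median_ms, p95_ms) =
        bench(|| { std::hint::black_box((0..1_000).sum::<u64>()); }, 3, 10);
    println!("median = {:.3} ms, p95 = {:.3} ms", median_ms, p95_ms);
}
```

With 10 measured iterations, the p95 reported here is simply the slowest sample; `black_box` keeps the optimizer from eliding the workload.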
The following baselines were collected on reference hardware (Intel Core i7-5930K @ 3.50 GHz, 64 GB DDR4, RTX 3080 10 GB, Ubuntu 24.04 LTS) using Rust 1.84 stable.
| Operation | Input Size | Time (median) | Memory |
|---|---|---|---|
| Parse | 1K LOC | 2.1 ms | 12 MB |
| Parse | 10K LOC | 18 ms | 45 MB |
| Type check | 1K LOC | 4.3 ms | 18 MB |
| Type check | 10K LOC | 38 ms | 85 MB |
| IR lower | 1K LOC | 1.8 ms | 8 MB |
| IR lower | 10K LOC | 15 ms | 42 MB |
| MLIR emit | 1K ops | 3.2 ms | 15 MB |
| MLIR emit | 10K ops | 28 ms | 95 MB |
Broadcast overhead by tensor rank and number of broadcast dimensions:

| Tensor Rank | Broadcast Dims | Time |
|---|---|---|
| 2D | 0 | 0.8 μs |
| 2D | 2 | 1.2 μs |
| 4D | 0 | 1.5 μs |
| 4D | 4 | 2.8 μs |
| 8D | 4 | 5.1 μs |
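To make the "Broadcast Dims" column concrete, the shape-compatibility rule can be sketched with NumPy-style broadcasting: shapes align from the trailing dimension, and size-1 dimensions stretch to match. This is a generic illustration, not the runtime's actual API.

```rust
// Compute the broadcast result shape of two tensor shapes, or None if
// they are incompatible. Illustrative only; names are hypothetical.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Align from the trailing dimension; missing dims act as size 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        match (da, db) {
            (x, y) if x == y => out.push(x),
            (1, y) => out.push(y),
            (x, 1) => out.push(x),
            _ => return None, // incompatible shapes
        }
    }
    out.reverse();
    Some(out)
}

fn main() {
    // A 4D shape with 2 broadcast (size-1) dimensions:
    assert_eq!(broadcast_shape(&[8, 1, 4, 1], &[8, 3, 4, 5]),
               Some(vec![8, 3, 4, 5]));
    // Incompatible leading dimensions:
    assert_eq!(broadcast_shape(&[2, 3], &[4, 3]), None);
    println!("ok");
}
```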
Gradient generation time by forward-graph size:

| Function Complexity | Forward Ops | Gradient Generation Time |
|---|---|---|
| Simple (add/mul) | 10 | 0.4 ms |
| Medium (matmul chain) | 100 | 3.2 ms |
| Complex (conv + reduce) | 1000 | 28 ms |
Test-suite runtimes:

| Category | Test Count | Total Time |
|---|---|---|
| Unit tests | 80 | ~0.2 s |
| Integration tests | 89 | ~0.5 s |
| Full suite | 169+ | ~1 s |
Each benchmark run records the following metrics:

| Metric | Description |
|---|---|
| Latency | Execution time per run (ms) |
| Throughput | Ops or samples per second |
| Memory usage | Peak RSS collected via procfs helpers |
| Compile time | IR → MLIR → executable duration |
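On Linux, peak RSS of the kind the procfs helpers collect can be read from the process's own status file. This sketch assumes a Linux host and reads the `VmHWM` (high-water mark) line; the runtime's actual helpers may differ.

```rust
use std::fs;

// Read peak resident set size (KiB) from /proc/self/status.
// Returns None on non-Linux hosts or if the field is absent.
fn peak_rss_kib() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|l| l.starts_with("VmHWM:"))       // e.g. "VmHWM:  12345 kB"
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|v| v.parse().ok())
}

fn main() {
    match peak_rss_kib() {
        Some(kib) => println!("peak RSS: {} KiB", kib),
        None => println!("peak RSS unavailable (non-Linux host?)"),
    }
}
```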
Results are exported as JSON into `benchmarks/results/*.json` and visualized with the CLI (`mind bench report`).
Continuous integration runs a smoke subset on every pull request. Nightly jobs execute the full suite and compare against the rolling baseline stored in `benchmarks/baselines/`.
When a regression exceeds the configured thresholds:
- CI marks the run unstable and attaches artifacts.
- Engineers inspect IR/MLIR dumps to identify passes responsible for the change.
- A follow-up issue documents the root cause and mitigation plan.
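The baseline comparison that triggers this process can be sketched as a relative-threshold check. The 10% threshold below is an example value, not the project's configured one.

```rust
// A run regresses when its median exceeds the stored baseline median
// by more than the relative threshold. Illustrative sketch only.
fn is_regression(baseline_ms: f64, current_ms: f64, threshold: f64) -> bool {
    current_ms > baseline_ms * (1.0 + threshold)
}

fn main() {
    // e.g. baseline 18 ms, current 20 ms, 10% threshold: 20 > 19.8 → regression
    assert!(is_regression(18.0, 20.0, 0.10));
    // 19 ms stays within the 10% budget
    assert!(!is_regression(18.0, 19.0, 0.10));
    println!("ok");
}
```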
Planned next steps:

- GPU benchmark coverage for the runtime plugin API
- Automated comparison against PyTorch/XLA baselines
- Visualization dashboards for long-term trends
See the roadmap for scheduling details.