sparse_convolution

Sparse 2D convolution in Python via Toeplitz matrix methods.

Fast when the kernel is small, the input is sparse, and/or many arrays share the same kernel.
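The Toeplitz idea fits in a few lines of NumPy/SciPy. This is a minimal sketch and not the library's implementation. It covers only 'full' mode, and the helper name `toeplitz_conv_matrix` is hypothetical. Each input pixel (m, n) and kernel tap (a, b) contributes k[a, b] to output pixel (m + a, n + b). The whole convolution is therefore one sparse matrix T (block Toeplitz with Toeplitz blocks), which can be built once and reused for every image that shares the kernel.

```python
import numpy as np
import scipy.signal
import scipy.sparse


def toeplitz_conv_matrix(x_shape, k):
    """Hypothetical helper: sparse (H*W, H_out*W_out) matrix T such that
    x.reshape(1, -1) @ T is the flattened 'full' 2D convolution of x with k."""
    H, W = x_shape
    kh, kw = k.shape
    H_out, W_out = H + kh - 1, W + kw - 1
    # Every (input row m, input col n, kernel row a, kernel col b) combination
    m, n, a, b = np.meshgrid(
        np.arange(H), np.arange(W), np.arange(kh), np.arange(kw), indexing="ij"
    )
    rows = (m * W + n).ravel()                  # flattened input pixel index
    cols = ((m + a) * W_out + (n + b)).ravel()  # flattened output pixel index
    vals = np.broadcast_to(k, m.shape).ravel()  # weight k[a, b] for that pair
    return scipy.sparse.csr_matrix((vals, (rows, cols)), shape=(H * W, H_out * W_out))


rng = np.random.default_rng(0)
k = rng.random((5, 5))
dense = rng.random((8, 40, 30)) * (rng.random((8, 40, 30)) < 0.02)  # 8 sparse images

# Build T once, then reuse it for every image that shares the kernel
T = toeplitz_conv_matrix((40, 30), k)
x_batch = scipy.sparse.csr_matrix(dense.reshape(8, -1))  # one flattened image per row
out = x_batch @ T                                        # shape (8, 44 * 34)

# Dense reference for the first image
ref = scipy.signal.convolve2d(dense[0], k, mode="full")
```

Building T costs one pass over all (pixel, tap) pairs. After that, each batch is a single sparse matrix product, which is why sharing one kernel across many arrays pays off.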

Install

pip install sparse_convolution

Or from source:

git clone https://github.com/RichieHakim/sparse_convolution
cd sparse_convolution
pip install -e .

Usage

Single image

import numpy as np
import scipy.sparse
import sparse_convolution as sc

x = scipy.sparse.random(100, 100, density=0.01)
k = np.random.rand(5, 5)

conv = sc.Toeplitz_convolution2d(x_shape=x.shape, k=k, mode='same')
out = conv(x=x, batching=False).toarray()
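A dense reference is useful for sanity-checking results. For odd kernel sizes, the mode='same' output of scipy.signal.convolve2d is the centered H × W crop of its mode='full' output. The sketch below shows that crop and the dense reference. It assumes sparse_convolution follows the same orientation and centering convention as SciPy, and the helper `same_from_full` is hypothetical.

```python
import numpy as np
import scipy.signal


def same_from_full(full, x_shape, k_shape):
    """Hypothetical helper: centered crop of a 'full' convolution to the input's shape."""
    H, W = x_shape
    r0 = (k_shape[0] - 1) // 2  # rows dropped at the top
    c0 = (k_shape[1] - 1) // 2  # columns dropped at the left
    return full[r0:r0 + H, c0:c0 + W]


rng = np.random.default_rng(0)
x_dense = rng.random((100, 100)) * (rng.random((100, 100)) < 0.01)
k = rng.random((5, 5))

full = scipy.signal.convolve2d(x_dense, k, mode="full")  # shape (104, 104)
same = same_from_full(full, x_dense.shape, k.shape)      # shape (100, 100)
ref = scipy.signal.convolve2d(x_dense, k, mode="same")   # dense reference
```

With the single-image example, `np.allclose(out, scipy.signal.convolve2d(x.toarray(), k, mode='same'))` is a quick check, under the same convention assumption.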

Batched

Input: an (n_images, H * W) sparse matrix with one flattened image per row. Output: (n_images, H_out * W_out), one flattened output image per row.

x_batch = scipy.sparse.vstack([
    scipy.sparse.random(100, 100, density=0.01).reshape(1, -1)
    for _ in range(50)
]).tocsr()
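Each row of the batched input is one image flattened in row-major (C) order, which is what `reshape(1, -1)` does on a SciPy sparse matrix. The sketch below shows the round trip with a hypothetical helper, `unflatten_row`, which returns image i as an H × W array. Reshaping rows of `out` to (H_out, W_out) works the same way.

```python
import numpy as np
import scipy.sparse


def unflatten_row(batch, i, shape):
    """Hypothetical helper: row i of an (n_images, H*W) sparse batch as a dense (H, W) image."""
    return batch[i].toarray().reshape(shape)


rng = np.random.default_rng(0)
images = [
    scipy.sparse.csr_matrix(rng.random((100, 100)) * (rng.random((100, 100)) < 0.01))
    for _ in range(50)
]

# Flatten each image into one row, stack into a batch, then recover image 7
x_batch = scipy.sparse.vstack([img.reshape(1, -1) for img in images]).tocsr()
img_7 = unflatten_row(x_batch, 7, (100, 100))
```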

conv = sc.Toeplitz_convolution2d(x_shape=(100, 100), k=k, mode='same')
out = conv(x=x_batch, batching=True)

Methods and backends

Four methods, each with selectable backends:

Method	numpy	numba	torch
`direct`	n/a	yes	n/a
`precomputed`	yes	yes	yes
`lazy`	yes	n/a	yes
`gather_scatter`	yes	yes	yes

direct: Batch-parallel scatter convolution with thread-local dense buffers (numba only). Images are processed in parallel. For each image, kernel-weighted input values are scattered into an L2-cache-sized accumulator buffer, and the nonzeros are then extracted into CSR format. Two phases keep allocation exact: first, a lightweight counting pass over 1-byte boolean flags (no floating-point arithmetic) determines each output's size; second, the scatter pass writes directly into right-sized arrays. Interior pixels, whose kernel footprint lies fully in bounds (~92-100% of pixels, depending on image size, kernel size, and mode), skip bounds checking entirely via precomputed safe regions. Cost is O(nnz × K) per image, where K is the number of kernel elements, with no init overhead. It is the fastest method in nearly all benchmarked configurations.
precomputed: Builds a sparse Toeplitz matrix at init, so each call reduces to a single sparse matrix multiplication over the whole batch. Best for large batches that share the same kernel when numba is not available.
lazy: Computes the convolution on the fly by broadcasting over the input's COO entries, with no init cost. Best for very sparse inputs with small batches.
gather_scatter: Per-kernel-position scatter into a dense accumulator. General-purpose method for sparse batched inputs. Uses numba automatically when available, and falls back to numpy otherwise.
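The scatter-style strategies above can be illustrated in plain NumPy/SciPy. These are minimal sketches for 'full' mode only and not the library's code. The function names are hypothetical. The sketches omit numba parallelism, thread-local buffers, and the interior/border split, and each accumulator is the whole dense output rather than a cache-sized buffer.

- `two_phase_scatter` mimics the direct method's boolean counting pass followed by a scatter pass.
- `coo_broadcast` forms every (nonzero, kernel tap) product at once, in the spirit of lazy.
- `per_tap_scatter` loops over kernel positions for a whole batch, in the spirit of gather_scatter.

```python
import numpy as np
import scipy.signal
import scipy.sparse


def two_phase_scatter(x, k):
    """Hypothetical single-image sketch: boolean counting pass, then scatter + CSR extraction."""
    x = scipy.sparse.coo_matrix(x)
    kh, kw = k.shape
    out_shape = (x.shape[0] + kh - 1, x.shape[1] + kw - 1)
    k_mask = k != 0
    # Phase 1: 1-byte flags mark every output pixel reachable by a nonzero kernel tap
    touched = np.zeros(out_shape, dtype=bool)
    for r, c in zip(x.row, x.col):
        touched[r:r + kh, c:c + kw] |= k_mask
    rows, cols = np.nonzero(touched)  # exact output size is len(rows)
    # Phase 2: scatter kernel-weighted input values into a dense accumulator
    acc = np.zeros(out_shape)
    for r, c, v in zip(x.row, x.col, x.data):
        acc[r:r + kh, c:c + kw] += v * k
    # Extract only the flagged pixels into right-sized CSR arrays
    return scipy.sparse.csr_matrix((acc[rows, cols], (rows, cols)), shape=out_shape)


def coo_broadcast(x, k):
    """Hypothetical single-image sketch: broadcast nonzeros against kernel taps."""
    x = scipy.sparse.coo_matrix(x)
    kh, kw = k.shape
    a, b = np.nonzero(k)  # offsets of nonzero kernel taps
    rows = (x.row[:, None] + a[None, :]).ravel()
    cols = (x.col[:, None] + b[None, :]).ravel()
    data = (x.data[:, None] * k[a, b][None, :]).ravel()
    out_shape = (x.shape[0] + kh - 1, x.shape[1] + kw - 1)
    # Converting to CSR sums contributions that land on the same output pixel
    return scipy.sparse.coo_matrix((data, (rows, cols)), shape=out_shape).tocsr()


def per_tap_scatter(x_batch, x_shape, k):
    """Hypothetical batched sketch: one vectorized scatter per kernel position."""
    H, W = x_shape
    kh, kw = k.shape
    n_images = x_batch.shape[0]
    coo = scipy.sparse.coo_matrix(x_batch)  # row = image index, col = flattened pixel
    r, c = np.divmod(coo.col, W)            # pixel coordinates inside each image
    acc = np.zeros((n_images, H + kh - 1, W + kw - 1))
    for a in range(kh):
        for b in range(kw):
            if k[a, b] != 0:
                # np.add.at accumulates correctly even if indices repeat
                np.add.at(acc, (coo.row, r + a, c + b), coo.data * k[a, b])
    return scipy.sparse.csr_matrix(acc.reshape(n_images, -1))


rng = np.random.default_rng(0)
k = rng.random((5, 5))
dense = rng.random((6, 30, 40)) * (rng.random((6, 30, 40)) < 0.02)

ref0 = scipy.signal.convolve2d(dense[0], k, mode="full")  # dense reference, shape (34, 44)
out_direct = two_phase_scatter(dense[0], k)
out_lazy = coo_broadcast(dense[0], k)
out_batch = per_tap_scatter(scipy.sparse.csr_matrix(dense.reshape(6, -1)), (30, 40), k)
```

All three produce the same values. They differ in memory traffic and in where the work is vectorized: per nonzero, per (nonzero, tap) pair, or per kernel tap across the batch.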

Backend selection:

numpy: scipy/numpy ops. Always available.
numba: JIT-compiled parallel loops. Fastest on CPU for batched inputs. Requires numba.
torch: PyTorch ops with optional GPU. Requires torch.

conv = sc.Toeplitz_convolution2d(
    x_shape=(100, 100),
    k=k,
    mode='same',
    method='direct',  # default
    backend=None,     # numba for the default direct method
)

If backend=None (default), direct uses numba. For environments without numba, choose a numpy-capable method explicitly, such as method='gather_scatter', backend='numpy' or method='precomputed', backend='numpy'.
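To choose a configuration that works with or without numba, you can check for the package at runtime. The helper below is a hypothetical sketch, not part of sparse_convolution. It returns only the method/backend keyword arguments described above, and it uses `importlib.util.find_spec` to test whether numba is importable.

```python
import importlib.util


def pick_method_and_backend():
    """Hypothetical helper: prefer direct+numba, otherwise fall back to a numpy method."""
    if importlib.util.find_spec("numba") is not None:
        return {"method": "direct", "backend": "numba"}
    # Without numba, gather_scatter and precomputed both offer a numpy backend
    return {"method": "gather_scatter", "backend": "numpy"}


kwargs = pick_method_and_backend()
# e.g. conv = sc.Toeplitz_convolution2d(x_shape=(100, 100), k=k, mode='same', **kwargs)
```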

References

Toeplitz convolution: stackoverflow.com/a/51865516, alisaaalehi/convolution_as_multiplication
1D convolution matrix: scipy.linalg.convolution_matrix

Benchmarks

All benchmarks were run on CPU with a minimum measurement time of 1 s per configuration (median reported). Nine method+backend combinations are compared across six scaling sweeps.
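The measurement rule (repeat each configuration for at least 1 s and report the median) can be sketched as follows. This is a hypothetical helper, not the repository's benchmark script, and `min_time` is an assumed parameter name. It relies only on `time.perf_counter` and `statistics.median` from the standard library.

```python
import statistics
import time


def median_runtime(fn, min_time=1.0):
    """Hypothetical helper: call fn repeatedly for at least min_time seconds; return the median call time."""
    times = []
    start = time.perf_counter()
    # Always run at least once, then keep going until min_time has elapsed
    while not times or time.perf_counter() - start < min_time:
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return statistics.median(times)


# Example: time a trivial workload for at least 0.1 s
t = median_runtime(lambda: sum(range(10_000)), min_time=0.1)
```

To time init + call as in the grid search, `fn` would both construct the convolution object and call it.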

Scaling overview

Six scaling sweeps varying batch size, density, image size, and kernel size. direct+numba (brown stars) is the fastest method in nearly all regimes.

Grid search: fastest method per configuration

Each cell shows the winning method and total time (init + call) for that batch size × density combination. direct+numba wins 28 of 36 configurations with an average 4.75× speedup over the second-fastest method.

Individual scaling curves

Batch size scaling — 100×100, 5×5, density=0.01

Density scaling — 100×100, 5×5, batch=100

Image size scaling — 5×5, density=0.01, batch=50

Kernel size scaling — 100×100, density=0.01, batch=50

Batch scaling — high density (0.1)

Batch scaling — very sparse (density=0.001, 200×200)
