GPU kernel implementations in CUDA and Triton, plus operations assembled from PyTorch primitives. This is a learning project exploring different optimization techniques for common GPU operations.
popcorn/
├── cuda/ # CUDA kernels
├── tl/ # Triton kernels
├── torch_op/ # PyTorch implementations
└── validation/ # Kernel correctness validation scripts
CUDA kernels (in cuda/kernels/):
- Vector addition
- Matrix multiplication (+ SGEMM)
- 1D convolution
- 2D convolution
- Sum reduction
- Softmax
- Fused QKV projection
- RoPE
Each operation has multiple implementations demonstrating different optimization techniques: a naive baseline, shared memory usage, memory coalescing, warp-level primitives, cooperative groups, and so on.
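As a concrete illustration of two of these techniques, here is a minimal, self-contained sketch (not taken from this repo; kernel and variable names are illustrative) that contrasts a shared-memory tree reduction with a variant that reduces each warp in registers via the warp-level primitive `__shfl_down_sync`:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Baseline: tree reduction done entirely in shared memory.
__global__ void reduce_shared(const float* in, float* out, int n) {
    extern __shared__ float smem[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    smem[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) smem[tid] += smem[tid + s];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(out, smem[0]);
}

// Warp-level variant: each warp reduces in registers with __shfl_down_sync,
// then the first warp combines the per-warp partial sums.
__global__ void reduce_warp_shuffle(const float* in, float* out, int n) {
    extern __shared__ float warp_sums[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    float val = (i < n) ? in[i] : 0.0f;
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    if ((tid & 31) == 0) warp_sums[tid >> 5] = val;  // lane 0 stores its warp's sum
    __syncthreads();
    if (tid < 32) {  // first warp reduces the per-warp partials
        val = (tid < blockDim.x / 32) ? warp_sums[tid] : 0.0f;
        for (int offset = 16; offset > 0; offset >>= 1)
            val += __shfl_down_sync(0xffffffff, val, offset);
        if (tid == 0) atomicAdd(out, val);
    }
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;  // must be a power of two for the tree reduction
    const int blocks = (n + threads - 1) / threads;

    std::vector<float> host(n, 1.0f);  // all ones, so the expected sum is n
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    float result = 0.0f;

    cudaMemset(d_out, 0, sizeof(float));
    reduce_shared<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("shared-memory reduction: %.0f (expected %d)\n", result, n);

    cudaMemset(d_out, 0, sizeof(float));
    reduce_warp_shuffle<<<blocks, threads, (threads / 32) * sizeof(float)>>>(d_in, d_out, n);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("warp-shuffle reduction:  %.0f (expected %d)\n", result, n);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The shuffle variant keeps most of the reduction in registers and needs far fewer `__syncthreads()` barriers, which is the usual motivation for warp-level primitives. Compile with something like `nvcc -o reduce_sketch reduce_sketch.cu` (hypothetical filename); the actual kernels in cuda/kernels/ may differ.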
Triton kernels (in tl/kernels/):
- Vector addition
- Softmax
- Layer normalization
- Matrix multiplication
PyTorch implementations (in torch_op/):
- Conv1d
- Conv2d
- Self-Attention
- Layer Normalization
- RMS Normalization
- RoPE
See cuda/README.md for detailed instructions on building and running CUDA benchmarks.
Quick start:
cd cuda
make # compile all benchmarks
./benchmarks/bench_matmul 2 1024 # run tiled matmul on 1024x1024 matrices
./benchmarks/bench_reduction 7 1048576 # run cooperative groups reduction
See tl/README.md for detailed instructions on building and running Triton benchmarks.
Quick start:
cd tl
python -m benchmarks.bench_softmax
To run tests:
cd torch_op
python -m pytest __tests__/test_rope.py # run RoPE tests
Project goals:
- Learn GPU programming and optimization techniques
- Compare custom implementations against optimized libraries (cuBLAS, cuDNN); see the baseline sketch after this list
- Implement the same operations in different frameworks (CUDA, Triton, PyTorch)
- Document performance characteristics and optimization strategies
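As a sketch of how such a comparison against cuBLAS might look (this is not part of the repo; matrix size, iteration count, and file name are assumptions), a baseline SGEMM can be timed with CUDA events and set against a custom matmul kernel's timing:

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const int N = 1024;                  // assumed square problem size
    const int iters = 10;                // assumed iteration count
    const float alpha = 1.0f, beta = 0.0f;

    std::vector<float> hA(N * N, 1.0f), hB(N * N, 2.0f);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, N * N * sizeof(float));
    cudaMalloc(&dB, N * N * sizeof(float));
    cudaMalloc(&dC, N * N * sizeof(float));
    cudaMemcpy(dA, hA.data(), N * N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), N * N * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Warm-up call so the timed loop excludes one-time setup costs.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, N, N, &alpha, dA, N, dB, N, &beta, dC, N);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) {
        // Column-major C = alpha * A * B + beta * C
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    N, N, N, &alpha, dA, N, dB, N, &beta, dC, N);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("cuBLAS SGEMM average: %.3f ms over %d runs\n", ms / iters, iters);

    cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return 0;
}
```

Build with something like `nvcc baseline_sgemm.cu -lcublas` (hypothetical file name) and compare the reported time against `./benchmarks/bench_matmul` at the same size.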