Problem
Most existing work on GPU acceleration for client-side proving reinvents the wheel: each project rebuilds the whole stack from scratch, from the base layer (BigInt and finite-field arithmetic) up to the higher-level primitives and protocols (MSM, NTT, STARKs).
As a result, the code is fragmented across projects, which makes implementations costly to discover and state-of-the-art (SoTA) work hard to track.
Details
Explore best practices for cryptographic libraries on Metal so that they can serve as the foundation for state-of-the-art NTT and MSM implementations and for accelerating client-side proving. Possible directions include:
- SIMD optimizations
- Features of the latest Metal APIs (Metal 4)
- Optimized Barrett Reduction
- Optimized Montgomery Multiplication
- Batch Inversions
- Sparse Matrix-Vector Ops
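
To make the arithmetic items in this list concrete, here is a minimal reference sketch of Barrett reduction, Montgomery multiplication, and batch inversion. It is written in Python rather than the Metal Shading Language so that it can double as a test oracle for GPU kernels. It is not a proposed implementation. The modulus (the 31-bit Mersenne prime), the radix `R = 2**32`, and every function name are assumptions chosen for illustration, and plain Python integers stand in for the multi-limb BigInt type a kernel would use.

```python
# Reference models for three of the listed building blocks.
# Assumption: P is an example modulus (2**31 - 1), not one fixed by this proposal.
P = 2**31 - 1

# --- Montgomery multiplication ------------------------------------------
K = 32                           # radix R = 2**K, with R > P and gcd(R, P) = 1
R = 1 << K
MASK = R - 1
N_PRIME = (-pow(P, -1, R)) % R   # -P^{-1} mod R (needs Python 3.8+)
R2 = (R * R) % P                 # R^2 mod P, used to enter Montgomery form


def mont_reduce(t):
    """REDC: return t * R^{-1} mod P for 0 <= t < P * R."""
    m = ((t & MASK) * N_PRIME) & MASK   # m = t * N_PRIME mod R
    u = (t + m * P) >> K                # exact division by R
    return u - P if u >= P else u       # u < 2P, so one subtraction suffices


def mont_mul(a, b):
    """Multiply two values that are already in Montgomery form."""
    return mont_reduce(a * b)


def to_mont(a):
    """Map a (0 <= a < P) to a * R mod P."""
    return mont_reduce(a * R2)


def from_mont(a):
    """Map a * R mod P back to a."""
    return mont_reduce(a)


# --- Barrett reduction --------------------------------------------------
BK = P.bit_length()
MU = (1 << (2 * BK)) // P        # floor(4**BK / P), precomputed once


def barrett_reduce(x):
    """Return x mod P for 0 <= x < 4**BK, without dividing by P."""
    q = (x * MU) >> (2 * BK)     # quotient estimate, at most 1 too small
    r = x - q * P
    return r - P if r >= P else r


# --- Batch inversion (Montgomery's trick) -------------------------------
def batch_inverse(xs):
    """Invert every element of xs mod P using one modular inversion.

    All elements must be nonzero mod P.
    """
    prefix = [1] * (len(xs) + 1)         # prefix[i] = xs[0] * ... * xs[i-1]
    for i, x in enumerate(xs):
        prefix[i + 1] = prefix[i] * x % P
    inv = pow(prefix[-1], -1, P)         # the single inversion
    out = [0] * len(xs)
    for i in range(len(xs) - 1, -1, -1):
        out[i] = prefix[i] * inv % P     # equals xs[i]^{-1}
        inv = inv * xs[i] % P            # drop xs[i] from the running inverse
    return out
```

A GPU port would replace the Python integers with fixed-width limbs and keep the same control flow. The single conditional subtraction in `mont_reduce` and `barrett_reduce` is the part that usually needs care to stay branch-free on SIMD hardware.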