Thanks to the discussions and fixes in #3640 and follow-up work by @lesteve and @ogrisel, we now have a build of OpenBLAS with Emscripten for WebAssembly in Pyodide. It works quite well when used via scipy.
I recently ran some benchmarks for square matrix multiplications (DGEMM) to get an idea of the performance, which can be found here. The good news is that the scipy build with OpenBLAS is around 2-3x faster for DGEMM than with the reference BLAS. The less good news is that it's still around 10x slower than almost the same OpenBLAS version built natively for a modern x86-64 CPU (single-threaded).
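For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a DGEMM-style timing harness. It is not the benchmark script referred to above; the helper names (`gemm_gflops`, `best_time`, `naive_matmul`) are made up for illustration, and the pure-Python multiply is only a stand-in so the snippet runs without dependencies.

```python
import time


def gemm_gflops(n, seconds):
    """GFLOP/s for an n x n by n x n multiply, counted as 2*n**3 operations."""
    return 2.0 * n**3 / seconds / 1e9


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time of fn() over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def naive_matmul(a, b):
    """Pure-Python stand-in for DGEMM (hypothetical helper, not a BLAS call)."""
    columns = list(zip(*b))  # columns of b, so each dot product is row * column
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


n = 64
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]

elapsed = best_time(lambda: naive_matmul(a, b))
print(f"n={n}: {elapsed:.4f} s, {gemm_gflops(n, elapsed):.4f} GFLOP/s")
```

To time the actual BLAS routine, the callable passed to `best_time` could instead be something like `lambda: scipy.linalg.blas.dgemm(1.0, A, B)` with `A` and `B` being float64 NumPy arrays, assuming scipy is available in the runtime being measured.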
For now, that runtime is constrained to a single thread and no SIMD. (Though we should investigate whether it would be possible to optionally build with SIMD and use some browser feature detection.)
I was wondering whether there is anything else we could try to improve the performance of OpenBLAS on the WebAssembly platform?
It's currently built with Emscripten using the following options:
make libs shared CC=emcc HOSTCC=gcc TARGET=RISCV64_GENERIC NOFORTRAN=1 NO_LAPACKE=1 \
USE_THREAD=0 -O2
Thank you!