Improving performance with WebAssembly #4023

@rth

Thanks to the discussions and fixes in #3640 and follow-up work by @lesteve and @ogrisel, we now have a build of OpenBLAS with Emscripten for WebAssembly in Pyodide. It works quite well when used via scipy.

I recently ran some benchmarks for square matrix multiplication (DGEMM) to get an idea of the performance; the results can be found here. The good news is that the scipy build with OpenBLAS is around 2-3x faster for DGEMM than with the reference BLAS. The less good news is that it's still around 10x slower than almost the same OpenBLAS version built natively for a modern x86-64 CPU (single-threaded).
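For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a DGEMM timing loop. It is not the benchmark script linked above; the function names (`gflops`, `best_time`, `bench_dgemm`) are made up for illustration, and the rate assumes the usual count of about 2*n^3 floating-point operations for an n x n product.

```python
import time


def gflops(n, seconds):
    """GFLOP/s for an n x n matrix product, counting ~2*n**3 operations."""
    return 2 * n**3 / seconds / 1e9


def best_time(fn, repeats=5):
    """Smallest wall-clock time (in seconds) over `repeats` calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def bench_dgemm(n, repeats=5):
    """Time scipy's BLAS dgemm on random n x n float64 matrices."""
    # Imported here so the helpers above stay usable without NumPy/SciPy.
    import numpy as np
    from scipy.linalg.blas import dgemm

    rng = np.random.default_rng(0)
    # Fortran order avoids an extra copy when the arrays are passed to BLAS.
    a = np.asfortranarray(rng.standard_normal((n, n)))
    b = np.asfortranarray(rng.standard_normal((n, n)))
    seconds = best_time(lambda: dgemm(1.0, a, b), repeats)
    return gflops(n, seconds)
```

Calling `bench_dgemm(n)` for a few sizes, once in Pyodide and once in a native CPython with a comparable OpenBLAS, should give numbers that can be compared directly. Taking the minimum over several repeats is only one reasonable choice for reducing timing noise.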

For now, the constraints of that runtime are that it is single-threaded and without SIMD. (Though we should investigate whether it would be possible to optionally build with SIMD and use some browser feature detection.)

I was wondering if there is anything else we could try to improve the performance of OpenBLAS on the WebAssembly platform?

It's currently built with Emscripten using the following options:

make libs shared CC=emcc HOSTCC=gcc TARGET=RISCV64_GENERIC NOFORTRAN=1 NO_LAPACKE=1 \
        USE_THREAD=0 -O2

Thank you!
