A comparison of the relative speedup achieved by branchless programming techniques in a variety of languages.
For each language and test pair, the ratio t_branchless/t_branched
is computed. Here t_branchless
is the CPU time taken by the branchless implementation, and t_branched
is the time taken by the branched implementation.
Numbers less than 1.0 imply branchless techniques result in a speedup, while numbers greater than one imply they result in a slowdown. For example, 0.23 implies the branchless version ran in 23% of the time the branched version took.
Results are organized by programming language, under which the relative speedup is given for each test (see more about the tests below). Based on the fluctuations observed, these results have a relative uncertainty of about 10% (rough estimate).
C
num_thresh
: 0.99
C++
num_thresh
: 1.01
Chapel
num_thresh
: 1.11
Fortran
num_thresh
: 1.01
Go
num_thresh
: 1.11
Java
num_thresh
: 0.48
JavaScript
num_thresh
: 1.03
Python
num_thresh
: 11.46
num_thresh
: given an array of random numbers, how many of them are above some threshold?
This project uses GNU Make to run the tests.
A Python script is used to collect all testing results and automatically update the README.md
.
Compiler optimizations are generally set to the maximum level, but look in the Makefile
for the exact flags.
The level of optimizations can change the results drastically. For example, without compiler optimizations the branchless to branched time ratio for num_thresh
is about 0.25 (compared to about 1.00 for -O3
optimization).
If you would like to contribute tests for an open source language or suggest new kinds of tests, feel free to open an issue. Instructions for adding a new language are in CONTRIBUTING.md
.
If you want to contribute and are looking for language suggestions please consider:
- Rust
- C#
- R
- Julia
- Pascal
- Lisp
- Perl