OpenMP overhead

I was recently profiling a colleague's internal `Rcpp`-powered package (using [`gperftools`](https://github.com/gperftools/gperftools) and [`pprof`](https://github.com/google/pprof)), and `omp_get_num_procs()` showed up as a major bottleneck, taking 50% of the runtime. Apparently, though, `omp_get_num_procs()` itself was not the problem: it seemed to be masking the general OpenMP overhead of frequently jumping in and out of parallel regions. I suspect `fbseqOpenMP` suffers from the same bottleneck, which could explain why we did not see a real speed gain over running single-threaded chains simultaneously. To work around this without CUDA, it might be necessary to micromanage a pool of persistent POSIX threads.
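For illustration, here is a minimal sketch of the persistent-pool idea. It assumes C++11 `std::thread` (which on POSIX systems is typically built on pthreads) rather than raw `pthread_*` calls. `PersistentPool` and `sum_after_steps` are hypothetical names, not part of any package. Threads are created once. Each `parallel_for` call only wakes them, lets each one process a contiguous chunk, and waits on a completion counter, so many short parallel steps in a row never create or destroy threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical persistent worker pool (not from any package). Threads are
// created once in the constructor and reused by every parallel_for call.
class PersistentPool {
 public:
  using Body = std::function<void(std::size_t)>;

  explicit PersistentPool(std::size_t n_threads) {
    for (std::size_t t = 0; t < n_threads; ++t)
      workers_.emplace_back([this, t, n_threads] { worker_loop(t, n_threads); });
  }

  ~PersistentPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
      ++generation_;  // wake every worker so it can observe stop_
    }
    start_cv_.notify_all();
    for (auto& w : workers_) w.join();
  }

  // Runs body(i) for every i in [0, n), one contiguous chunk per worker, and
  // blocks until all chunks are done. Runs serially if there are no workers.
  void parallel_for(std::size_t n, const Body& body) {
    if (workers_.empty()) {
      for (std::size_t i = 0; i < n; ++i) body(i);
      return;
    }
    std::unique_lock<std::mutex> lock(mu_);
    body_ = &body;
    n_ = n;
    pending_ = workers_.size();
    ++generation_;  // publish a new batch of work
    start_cv_.notify_all();
    done_cv_.wait(lock, [this] { return pending_ == 0; });
    body_ = nullptr;
  }

 private:
  void worker_loop(std::size_t id, std::size_t n_threads) {
    std::size_t seen = 0;  // last generation this worker picked up
    for (;;) {
      const Body* body = nullptr;
      std::size_t n = 0;
      {
        // Block (no spinning) until a new generation is published.
        std::unique_lock<std::mutex> lock(mu_);
        start_cv_.wait(lock, [&] { return generation_ != seen; });
        seen = generation_;
        if (stop_) return;
        body = body_;
        n = n_;
      }
      const std::size_t begin = n * id / n_threads;
      const std::size_t end = n * (id + 1) / n_threads;
      for (std::size_t i = begin; i < end; ++i) (*body)(i);
      {
        std::lock_guard<std::mutex> lock(mu_);
        if (--pending_ == 0) done_cv_.notify_one();
      }
    }
  }

  std::vector<std::thread> workers_;
  std::mutex mu_;
  std::condition_variable start_cv_;
  std::condition_variable done_cv_;
  const Body* body_ = nullptr;
  std::size_t n_ = 0;
  std::size_t pending_ = 0;
  std::size_t generation_ = 0;
  bool stop_ = false;
};

// Hypothetical stand-in for an MCMC loop: many short parallel steps in a row,
// each of which would otherwise enter and leave a parallel region.
double sum_after_steps(std::size_t n_threads, int n_steps, std::size_t n) {
  std::vector<double> x(n, 0.0);
  {
    PersistentPool pool(n_threads);
    for (int step = 0; step < n_steps; ++step)
      pool.parallel_for(n, [&x](std::size_t i) { x[i] += 1.0; });
  }  // pool joins its threads here
  double total = 0.0;
  for (double v : x) total += v;
  return total;
}
```

For example, `sum_after_steps(4, 1000, 1000)` runs 1000 back-to-back parallel steps on the same four threads and returns `1000000.0`. The design trades thread creation for a condition-variable round trip per step. Whether that actually beats the OpenMP runtime for `fbseqOpenMP` would need profiling. When steps are very short, spin-waiting instead of sleeping on a condition variable is a common alternative.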
