Description
I was recently profiling a colleague's internal Rcpp-powered package (using gperftools and pprof), and omp_get_num_procs() showed up as a major bottleneck, accounting for 50% of the runtime. However, the omp_get_num_procs() function itself was apparently not the problem: it seemed to be masking the general OpenMP overhead of frequently entering and leaving parallel regions. I suspect fbseqOpenMP suffers from the same bottleneck, which could explain why we did not see a real speed gain relative to running simultaneous single-threaded chains. To work around this without CUDA, it might be necessary to micromanage a pool of persistent POSIX threads.