[Minuit2] Cache transformed parameter values in MnHesse #17817

Open · wants to merge 4 commits into master
Conversation

@guitargeek (Contributor) commented Feb 25, 2025

Profiling with `perf` and `flamegraph.pl` showed that the main
performance bottleneck when running Minuit's Hesse on RooFit likelihoods is
the transformation to internal Minuit parameters in MnHesse, which is a
relatively expensive trigonometric operation. It is performed for all
parameters at every function evaluation, even if only one parameter has
changed. Computing the Hessian requires on the order of n-squared function
calls. The total runtime of evaluating the RooFit function itself scales
only linearly with the number of parameters, thanks to the caching in
RooFit. The parameter transformation in MnHesse, however, implements no
caching and therefore has quadratic cost.

This PR implements caching for the transformed parameter values, which
eliminates this bottleneck entirely.
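The caching idea can be sketched as follows (a hypothetical illustration; the class and member names are not from the actual patch): remember the last seen internal parameter vector together with its transformed values, and re-run the expensive transform only for entries that actually changed. Since Hesse varies one or two parameters per function call, this drops the per-call transformation cost from O(n) to O(1).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the caching scheme, not the real MnHesse code.
class TransformCache {
public:
   // Return the transformed parameters, recomputing only changed entries.
   const std::vector<double> &transformed(const std::vector<double> &internal)
   {
      if (lastInternal_.size() != internal.size()) {
         // NaN compares unequal to everything, forcing a full first pass.
         lastInternal_.assign(internal.size(), std::nan(""));
         external_.resize(internal.size());
      }
      for (std::size_t i = 0; i < internal.size(); ++i) {
         if (internal[i] != lastInternal_[i]) {
            external_[i] = std::sin(internal[i]); // stand-in for the real transform
            lastInternal_[i] = internal[i];
            ++nTransforms_;
         }
      }
      return external_;
   }

   long nTransforms() const { return nTransforms_; }

private:
   std::vector<double> lastInternal_;
   std::vector<double> external_;
   long nTransforms_ = 0; // counts how many individual transforms ran
};
```

With this scheme, a Hesse-style sweep that perturbs one parameter at a time triggers only one transform per function call after the initial full pass, instead of n.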

With the changes in this PR, the plan-of-work item "Speedup the
computation of the Hessian for big Higgs combinations at least by factor
of 2" is completed. Our ATLAS benchmark now does the Hesse step in
100 s instead of 120 s. Since the addressed bottleneck grows with the
square of the number of fit parameters, this optimization will have a much
stronger impact on some reported user workflows, where computing the
Hessian currently takes hours. In any case, taking into account the
performance improvements from the other PRs in this development cycle, we
get a 2× speedup in our benchmark as well (see #17816).

This change in Minuit indirectly affects all RooFit users. I saw that
this parameter transformation is also the main bottleneck in evaluating
Hessians with likelihoods from CMS combine.

ATLAS Higgs combination benchmark

With ROOT master

Total runtime of minimize() and hesse(): 160 s.

[flamegraph screenshot: atlas_old]

With this PR and #17816

Total runtime of minimize() and hesse(): 105 s (34 % faster).

[flamegraph screenshot: atlas_new]

The new bottlenecks are again in RooFit:

  • Dirty-flag propagation via RooAbsArg::setValueDirty() (about 10 s of runtime; we can easily get rid of it because the new CPU evaluation backend doesn't use the dirty-flag information anyway)
  • Other overhead in RooFit::Evaluator::run() that is not related to the actual computation, another roughly 10 s. I don't know yet what to do about it.

In particular, setValueDirty() is responsible for most of the runtime in the line search. If we eliminate it, the line search will be much less of a bottleneck for fits with AD, where the gradient step is very fast and the line search dominates the overall minimization.


github-actions bot commented Feb 25, 2025

Test Results

    18 files      18 suites   4d 12h 47m 45s ⏱️
 2 719 tests  2 719 ✅ 0 💤 0 ❌
47 253 runs  47 253 ✅ 0 💤 0 ❌

Results for commit 813e363.

♻️ This comment has been updated with latest results.

Some functions that were already mentioned as "internal" in the docs are
removed from the public interface.