[Minuit2] Cache transformed parameter values in MnHesse #17817
Profiling with `perf` and `flamegraph.pl` showed that the main performance bottleneck when using Minuit's Hesse on RooFit likelihoods is the parameter transformation between Minuit's internal and external representations in `MnHesse`, which is a relatively expensive trigonometric operation. It is done for all parameters at every function evaluation, even if only one parameter has changed. The Hessian requires a number of function calls of order n², where n is the number of parameters.
The total runtime of calling the RooFit function itself scales only
linearly with the number of parameters, thanks to the caching in RooFit.
However, the parameter transformation in MnHesse implements no caching
and therefore has quadratic cost.
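
To illustrate where this cost comes from, here is a minimal sketch of the uncached behaviour. It assumes that every parameter has both a lower and an upper limit, so the sine transformation from the MINUIT documentation applies (ext = lower + (upper - lower)/2 * (sin(int) + 1)). The `Param` struct and the `int2ext()` and `transformAll()` helpers are hypothetical and were written only for this illustration. They are not the Minuit2 API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical parameter with both a lower and an upper limit.
struct Param {
   double lower;
   double upper;
};

// MINUIT-style sine transformation from the internal (unbounded) value
// to the external (bounded) value: ext = lower + (upper - lower)/2 * (sin(int) + 1).
inline double int2ext(double internal, const Param &p)
{
   return p.lower + 0.5 * (p.upper - p.lower) * (std::sin(internal) + 1.0);
}

// Uncached behaviour: every function evaluation maps *all* parameters,
// so each evaluation performs one sin() call per parameter.
// `sinCalls` is incremented once per transformed parameter (for illustration).
std::vector<double> transformAll(const std::vector<double> &internal, const std::vector<Param> &params,
                                 std::size_t &sinCalls)
{
   assert(internal.size() == params.size());
   std::vector<double> external(internal.size());
   for (std::size_t i = 0; i < internal.size(); ++i) {
      external[i] = int2ext(internal[i], params[i]);
      ++sinCalls;
   }
   return external;
}
```

A finite-difference step of the Hessian nudges only one internal value, but in this scheme it still costs one `sin()` call for every parameter.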
This PR implements caching for the transformed parameters to make
this bottleneck go away completely.
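
The caching idea can be sketched as follows. This is not the code from this PR. It is a minimal illustration that assumes every parameter has both limits, so that the sine transformation applies. `Param` and `CachedTransform` are hypothetical types. The cache stores the last internal value of each parameter together with its external image, and it recomputes the sine only for entries whose internal value actually changed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical parameter with both a lower and an upper limit.
struct Param {
   double lower;
   double upper;
};

// Sketch of a cache for transformed parameters: only entries whose internal
// value differs from the cached one are re-transformed.
class CachedTransform {
public:
   CachedTransform(std::vector<Param> params, std::vector<double> internal)
      : fParams(std::move(params)), fInternal(std::move(internal)), fExternal(fParams.size())
   {
      assert(fInternal.size() == fParams.size());
      // Fill the cache once: one sin() call per parameter.
      for (std::size_t i = 0; i < fParams.size(); ++i)
         fExternal[i] = int2ext(fInternal[i], fParams[i]);
   }

   // Returns the external values for `internal`, reusing unchanged cache entries.
   const std::vector<double> &operator()(const std::vector<double> &internal)
   {
      assert(internal.size() == fInternal.size());
      for (std::size_t i = 0; i < internal.size(); ++i) {
         if (internal[i] != fInternal[i]) { // exact comparison: only identical inputs are reused
            fInternal[i] = internal[i];
            fExternal[i] = int2ext(internal[i], fParams[i]);
         }
      }
      return fExternal;
   }

   // Number of sine transformations performed so far (for illustration).
   std::size_t sinCalls() const { return fSinCalls; }

private:
   double int2ext(double internal, const Param &p)
   {
      ++fSinCalls;
      return p.lower + 0.5 * (p.upper - p.lower) * (std::sin(internal) + 1.0);
   }

   std::vector<Param> fParams;
   std::vector<double> fInternal; // last internal values seen
   std::vector<double> fExternal; // cached external values
   std::size_t fSinCalls = 0;
};
```

A Hesse step moves only one or two internal parameters at a time, so with such a cache each evaluation costs a handful of `sin()` calls instead of one per parameter. The cache uses an exact `!=` comparison, so it only reuses entries for identical inputs and never returns a stale value.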
With the changes in this PR, the plan-of-work item "Speed up the computation of the Hessian for big Higgs combinations by at least a factor of 2" is completed. Our ATLAS benchmark now runs the Hesse step in 100 s instead of 120 s. Because the addressed bottleneck grows with the square of the number of fit parameters, this optimization will have a much stronger impact on some reported user workflows, where computing the Hessian currently takes hours. Taking into account the performance improvements from other PRs in this development cycle as well, we also get a 2x speedup in our benchmark (see #17816).
This change in Minuit indirectly affects all RooFit users. I saw that
this parameter transformation is also the main bottleneck in evaluating
Hessians with likelihoods from CMS combine.
ATLAS Higgs combination benchmark

- With ROOT master: total runtime of `minimize()` and `hesse()` is 160 s.
- With this PR and #17816: total runtime of `minimize()` and `hesse()` is 105 s (34 % faster).

The new bottlenecks are again in RooFit:

- `RooAbsArg::setValueDirty()`: about 10 s of runtime. We can get rid of it easily, because the new CPU evaluation backend doesn't use the dirty-flag information anyway.
- Overhead in `RooFit::Evaluator::run()` that is not related to actual computation: about 10 more seconds. I don't know what to do about it.

In particular, `setValueDirty()` is responsible for most of the runtime in the line search. If we get rid of it, the line search will be much less of a bottleneck for fits with AD. In those fits, the gradient step is very fast, so the line search is the bottleneck of the overall minimization.
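
A toy model of dirty-flag propagation shows why such bookkeeping costs time even when the evaluation backend never reads the flags. The `Node` type and `setValueDirtyToy()` below are hypothetical and are not RooFit's `RooAbsArg` implementation. The sketch only assumes that marking a value dirty also marks everything downstream of it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy computation-graph node (hypothetical, not RooFit's RooAbsArg).
struct Node {
   std::vector<Node *> clients; // nodes whose value depends on this one
   bool dirty = false;
};

// Marks `node` and, recursively, every downstream client as dirty, stopping at
// clients that are already flagged. Returns the number of nodes marked, as a
// rough proxy for the bookkeeping cost of one call.
std::size_t setValueDirtyToy(Node &node)
{
   std::size_t marked = 1;
   node.dirty = true;
   for (Node *client : node.clients) {
      assert(client != nullptr);
      if (!client->dirty)
         marked += setValueDirtyToy(*client);
   }
   return marked;
}
```

In a line search, all parameters move at every trial point. If the flags are reset after each evaluation, the propagation above is repeated for every trial point, even though a backend that ignores the flags gains nothing from it.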