Commit 2e3d66a

New blog post by Galin Bistrev
1 parent a3c3050 commit 2e3d66a

1 file changed, 154 additions and 0 deletions

@@ -0,0 +1,154 @@
---
title: "Results from CERN Summer school 2025: Supporting Automatic
Differentiation in CMS Combine profile likelihood scans"
layout: post
excerpt: "A CERN Summer Student 2025 project aiming at the support of
automatic differentiation (AD) for likelihood scans in the CMS Combine
tool to accelerate statistical inference by leveraging RooFit's
AD support and LLVM-based gradient generation."
sitemap: false
author: Galin Bistrev
permalink: blogs/2025_galin_bistrev_results_blog/
banner_image: /images/blog/banner-cern.jpg
date: 2025-09-25
tags: cern cms root combine c++ rooFit automatic-differentiation
---

### **Introduction**
Greetings! I’m Galin Bistrev, a fourth-year student specializing in Nuclear
and Particle Physics at the University of Sofia "St. Kliment Ohridski".
As part of the CERN Summer Student Programme 2025, I worked on a project
to add support for Automatic Differentiation (AD) to profile likelihood
scans in the CMS Combine tool.

Mentors: Jonas Rembser, Vassil Vassilev, David Lange

### **Description of the Project**

This project aims to enhance support for Automatic Differentiation (AD)
in likelihood scans within the CMS Combine framework, the primary
statistical analysis tool of the CMS experiment at CERN.
Combine is built on top of RooFit, which has recently introduced AD to
improve minimization techniques. By providing computationally efficient
gradients through AD, RooFit achieves substantial performance
improvements. RooFit converts its internal likelihood representation
into standalone C++ code, from which Clad generates the gradient
routines. This strategy not only speeds up the fitting process but
also increases the portability and shareability of likelihood models,
making them usable even by those without detailed knowledge
of RooFit or Combine internals.
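
To illustrate why exact gradients matter to a minimizer, here is a minimal Python sketch. It is not RooFit or Clad code. It compares a hand-written analytic gradient, standing in for the kind of routine an AD tool would derive automatically, with central finite differences on a toy Gaussian negative log-likelihood. The function names, data, and evaluation point are all hypothetical.

```python
import math

def gaussian_nll(params, data):
    """Negative log-likelihood of `data` under a Gaussian with (mu, sigma)."""
    mu, sigma = params
    n = len(data)
    return (n * math.log(sigma)
            + sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)
            + 0.5 * n * math.log(2.0 * math.pi))

def gaussian_nll_grad(params, data):
    """Exact gradient, written by hand here (assumption: stands in for AD output)."""
    mu, sigma = params
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)
    d_mu = -sum(x - mu for x in data) / sigma ** 2
    d_sigma = n / sigma - sum_sq / sigma ** 3
    return [d_mu, d_sigma]

def numerical_grad(f, params, data, eps=1e-6):
    """Central finite differences: two function evaluations per parameter."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grad.append((f(up, data) - f(down, data)) / (2.0 * eps))
    return grad

# Hypothetical toy data and evaluation point.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
point = [5.0, 0.5]
exact = gaussian_nll_grad(point, data)
approx = numerical_grad(gaussian_nll, point, data)
```

The two gradients agree closely, but the finite-difference version needs two likelihood evaluations per parameter, so its cost grows with the number of parameters. That scaling is the motivation for generating gradient code instead.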

### **Brief overview of the CMS Combine engine**
Combine is a statistical analysis framework that compares models
of expected observations with real data. It is widely used for tasks
such as searching for new particles or processes, setting limits on
potential new physics, and measuring physical quantities like cross sections.
Although developed with High Energy Physics (HEP) applications in mind,
Combine contains no intrinsic physics assumptions, making it fully general
and independent of any specific analysis. This flexibility allows it
to be applied across a broad range of statistical problems.

Roughly, Combine performs three main functions:

- Builds a statistical model of expected observations.

- Runs statistical tests comparing the model with observed data.

- Provides tools for validating, inspecting, and understanding both the
model and the results of the statistical tests.

### **Project Goals**

Supporting AD in Combine likelihood scans required achieving several goals:

- Refactor some of Combine's logic into RooFit, so that Combine can
reuse the AD-enabled minimization algorithm already present there.

- Integrate gradient computation into likelihood scans, ensuring that
derivatives are correctly propagated for efficient and accurate minimization.

- Validate correctness and performance, confirming that the AD-based scans
produce results consistent with traditional
methods while offering improved performance.
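
As a rough illustration of what a profile likelihood scan does with a gradient, the Python sketch below scans a parameter of interest `mu` over a grid and, at each point, re-minimizes a toy negative log-likelihood over one nuisance parameter `theta`. The model, the parameter names, and the plain gradient-descent minimizer are assumptions made for this example; Combine and RooFit use their own models and minimizers.

```python
def nll(mu, theta, observed=1.5, sigma=1.0):
    """Toy NLL: one Gaussian measurement of mu + theta plus a unit
    Gaussian constraint on the nuisance parameter theta."""
    return 0.5 * ((observed - mu - theta) / sigma) ** 2 + 0.5 * theta ** 2

def dnll_dtheta(mu, theta, observed=1.5, sigma=1.0):
    """Exact derivative with respect to theta (the role AD would play)."""
    return -(observed - mu - theta) / sigma ** 2 + theta

def profile_theta(mu, theta_start=0.0, learning_rate=0.4, steps=100):
    """Minimize the NLL over theta at fixed mu with plain gradient descent."""
    theta = theta_start
    for _ in range(steps):
        theta -= learning_rate * dnll_dtheta(mu, theta)
    return theta

def profile_scan(mu_values):
    """Return (mu, profiled NLL) pairs, re-minimizing theta at every point."""
    scan = []
    for mu in mu_values:
        theta_hat = profile_theta(mu)
        scan.append((mu, nll(mu, theta_hat)))
    return scan

mu_grid = [0.5 + 0.25 * i for i in range(9)]  # scan points from 0.5 to 2.5
scan = profile_scan(mu_grid)
nll_min = min(value for _, value in scan)
delta_nll = [(mu, value - nll_min) for mu, value in scan]
```

For this toy model the profiled NLL is `(observed - mu)**2 / 4`, which gives a simple cross-check of the scan. Every scan point triggers a fresh minimization, so the cost of each gradient evaluation is multiplied by the number of scan points and minimizer steps.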

## **Overview of Completed Work**
Over the course of the project, several major tasks were completed to
achieve the stated objectives:

- Imported the `RooMultiPdf` class from Combine into RooFit, enabling
switching between multiple PDFs, applying statistical penalties, and
supporting code generation for AD.

- Added `codegen` support for the new class in RooFit by adding a new
function in `MathFunc.h` and extending `CodegenImpl.cxx` to generate
code for models that use it.

- Imported three pieces of code from Combine that handle its minimization
procedures into RooFit's `RooMinimizer.cxx`.
The first is the class `FreezeDisconnectedParametersRAII`,
which automatically freezes and unfreezes parameters disconnected from
the likelihood graph. The second is the function `generateOrthogonalCombinations`,
which generates a list of index combinations by initializing a base configuration
with all indices set to zero and then varying one category at a time.
The third is the function `reorderCombinations`, which takes the
set of indices produced by `generateOrthogonalCombinations` and reorders
them so that combinations differing least from the current best
configuration are evaluated first.

- Using the functions above, imported the discrete profiling algorithm,
which is the main minimization algorithm in Combine, into
`RooMinimizer.cxx`.

- Created a [tutorial](https://root.cern/doc/master/rf619__discrete__profiling_8py.html)
and a [benchmark](https://github.com/vgvassilev/clad/issues/1521),
demonstrating discrete profiling with `RooMultiPdf` objects and evaluating
the performance of AD in the likelihood scans.
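
The two combination helpers described above can be sketched in Python as follows. This follows only the prose description in this post and is not a port of the C++ code in `RooMinimizer.cxx`. The Python function names, the reading of "differing least" as the number of differing indices, and the toy cost table are assumptions made for illustration.

```python
def generate_orthogonal_combinations(n_states):
    """Start from all-zero indices, then vary one category at a time.

    `n_states[i]` is the number of states of discrete category i.
    """
    base = [0] * len(n_states)
    combos = [list(base)]
    for cat, count in enumerate(n_states):
        for state in range(1, count):
            combo = list(base)
            combo[cat] = state
            combos.append(combo)
    return combos

def reorder_combinations(combos, best):
    """Sort so that combinations differing least from `best` come first.

    Assumption: "differing least" is taken as the number of indices
    that differ from the current best configuration.
    """
    def n_differences(combo):
        return sum(1 for a, b in zip(combo, best) if a != b)
    return sorted(combos, key=n_differences)

# Hypothetical toy cost per (category, state). In a real fit, each value
# would come from minimizing the likelihood over the continuous parameters.
TOY_COSTS = [[2.0, 0.5, 1.0], [0.3, 0.8]]

def toy_nll(combo):
    """Sum the toy costs of the chosen state in each category."""
    return sum(TOY_COSTS[cat][state] for cat, state in enumerate(combo))

combos = generate_orthogonal_combinations([3, 2])
ordered = reorder_combinations(combos, best=[1, 0])
best_combo = min(ordered, key=toy_nll)
best_nll = toy_nll(best_combo)
```

With three states in the first category and two in the second, only four of the six possible combinations are generated. Varying one category at a time keeps the number of fits linear in the number of states, not exponential in the number of categories.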

## **Results**
With those objectives accomplished, RooFit now provides AD support for discrete profiling.
However, the benchmark indicates that AD does not currently
improve efficiency, because the gradient code generated by Clad introduces overhead.
Further optimization in Clad is needed to achieve the potential performance
gains for RooFit likelihood scans. More information on the issue can be
found at [#1521](https://github.com/vgvassilev/clad/issues/1521).

## **Conclusions**
Thanks to this project, RooFit now supports AD for discrete profiling in Combine.
Once the current overhead in Clad is addressed, this should allow
significantly faster and more efficient likelihood scans while maintaining
accurate optimization of both discrete and continuous parameters.

## **Future Work**
- Further benchmarking is required to quantify the potential performance
gains from automatic differentiation.

- Additional optimization of Clad is needed to eliminate unnecessary overhead
in the generated gradient code.

- The discrete profiling logic implemented in `RooMinimizer` should be tested across
different models to evaluate the minimizer’s behavior and robustness.

## **Acknowledgements**
I would like to express my sincere gratitude to the CERN Summer Student
Programme for the opportunity to participate in such an inspiring project.
I extend special thanks to Jonas Rembser, Vassil Vassilev, and David Lange for
their invaluable guidance and for providing continuous
learning opportunities throughout this journey. I am also grateful to
the ROOT team for welcoming me and supporting me throughout my stay at
CERN.

## **Related Links**
- [CMS Combine GitHub page](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/)
- [ROOT official repository](https://github.com/root-project/root)
- [My GitHub profile](https://github.com/GalinBistrev2)
