CMS-HIG-18-027 ; CERN-EP-2019-261
A deep neural network for simultaneous estimation of b jet energy and resolution
CMS Collaboration
12 December 2019
Computing and Software for Big Science 4 (2020) 10
Abstract: We describe a method to obtain point and dispersion estimates for the energies of jets arising from b quarks produced in proton-proton collisions at a center-of-mass energy of $\sqrt{s} = 13$ TeV at the CERN LHC. The algorithm is trained on a large simulated sample of b jets and validated on data recorded by the CMS detector in 2017, corresponding to an integrated luminosity of 41 fb$^{-1}$. A multivariate regression algorithm based on a deep feed-forward neural network employs jet composition and shape information, together with the properties of reconstructed secondary vertices associated with the jet. The results of the algorithm are used to improve the sensitivity of analyses that make use of b jets in the final state, such as the observation of Higgs boson decay to $\mathrm{b\bar{b}}$.
Links: e-print arXiv:1912.06046 [hep-ex]
Figures
Figure 1:
(left) The $ {{p_{\mathrm {T}}} ^\text {reco}}$ distribution for reconstructed b jets in an MC ${\mathrm{t} {}\mathrm{\bar{t}}}$ sample. (right) Distribution of the regression target for the MC ${\mathrm{t} {}\mathrm{\bar{t}}}$ training sample. |
Figure 1-a:
The $ {{p_{\mathrm {T}}} ^\text {reco}}$ distribution for reconstructed b jets in an MC ${\mathrm{t} {}\mathrm{\bar{t}}}$ sample. |
Figure 1-b:
Distribution of the regression target for the MC ${\mathrm{t} {}\mathrm{\bar{t}}}$ training sample. |
Figure 2:
The 25, 40, 50, and 75% quantiles are shown for the b jet energy scale $ {{p_{\mathrm {T}}} ^{\text {gen}}}/ {{p_{\mathrm {T}}} ^\text {reco}}$ distribution before (blue dashdot) and after (red solid) applying the regression correction as a function of jet ${p_{\mathrm {T}}}$ (left), $\eta $ (center), and $\rho $ (right). |
Figure 2-a:
The 25, 40, 50, and 75% quantiles are shown for the b jet energy scale $ {{p_{\mathrm {T}}} ^{\text {gen}}}/ {{p_{\mathrm {T}}} ^\text {reco}}$ distribution before (blue dashdot) and after (red solid) applying the regression correction as a function of jet ${p_{\mathrm {T}}}$. |
Figure 2-b:
The 25, 40, 50, and 75% quantiles are shown for the b jet energy scale $ {{p_{\mathrm {T}}} ^{\text {gen}}}/ {{p_{\mathrm {T}}} ^\text {reco}}$ distribution before (blue dashdot) and after (red solid) applying the regression correction as a function of $\eta $. |
Figure 2-c:
The 25, 40, 50, and 75% quantiles are shown for the b jet energy scale $ {{p_{\mathrm {T}}} ^{\text {gen}}}/ {{p_{\mathrm {T}}} ^\text {reco}}$ distribution before (blue dashdot) and after (red solid) applying the regression correction as a function of $\rho $. |
Figure 3:
Relative jet energy resolution, ${\overline {\mathrm {s}}}$, as a function of generator-level jet $ {{p_{\mathrm {T}}} ^{\text {gen}}}$ (left), $\eta $ (center), and $\rho $ (right) for b jets from ${\mathrm{t} {}\mathrm{\bar{t}}}$ MC events. The average ${p_{\mathrm {T}}}$ of these b jets is 80 GeV. The blue stars and red squares represent ${\overline {\mathrm {s}}}$ before and after the DNN correction, respectively. The relative difference $\Delta {\overline {\mathrm {s}}} / {\overline {\mathrm {s}}} _{\text {baseline}}$ between the ${\overline {\mathrm {s}}}$ values before and after DNN corrections is shown in the lower panels. |
Figure 3-a:
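Relative jet energy resolution, ${\overline {\mathrm {s}}}$, as a function of generator-level jet $ {{p_{\mathrm {T}}} ^{\text {gen}}}$ for b jets from ${\mathrm{t} {}\mathrm{\bar{t}}}$ MC events. The average ${p_{\mathrm {T}}}$ of these b jets is 80 GeV. The blue stars and red squares represent ${\overline {\mathrm {s}}}$ before and after the DNN correction, respectively. The relative difference $\Delta {\overline {\mathrm {s}}} / {\overline {\mathrm {s}}} _{\text {baseline}}$ between the ${\overline {\mathrm {s}}}$ values before and after DNN corrections is shown in the lower panel. |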
Figure 3-b:
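Relative jet energy resolution, ${\overline {\mathrm {s}}}$, as a function of $\eta $ for b jets from ${\mathrm{t} {}\mathrm{\bar{t}}}$ MC events. The average ${p_{\mathrm {T}}}$ of these b jets is 80 GeV. The blue stars and red squares represent ${\overline {\mathrm {s}}}$ before and after the DNN correction, respectively. The relative difference $\Delta {\overline {\mathrm {s}}} / {\overline {\mathrm {s}}} _{\text {baseline}}$ between the ${\overline {\mathrm {s}}}$ values before and after DNN corrections is shown in the lower panel. |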
Figure 3-c:
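Relative jet energy resolution, ${\overline {\mathrm {s}}}$, as a function of $\rho $ for b jets from ${\mathrm{t} {}\mathrm{\bar{t}}}$ MC events. The average ${p_{\mathrm {T}}}$ of these b jets is 80 GeV. The blue stars and red squares represent ${\overline {\mathrm {s}}}$ before and after the DNN correction, respectively. The relative difference $\Delta {\overline {\mathrm {s}}} / {\overline {\mathrm {s}}} _{\text {baseline}}$ between the ${\overline {\mathrm {s}}}$ values before and after DNN corrections is shown in the lower panel. |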
Figure 4:
Correlation between jet energy resolution $\mathrm {s}$ and the average jet energy resolution estimator $\langle \hat{\mathrm {s}} \rangle $ for b jets from ${\mathrm{t} {}\mathrm{\bar{t}}}$ MC events. The blue circles correspond to the inclusive ${p_{\mathrm {T}}}$ spectrum, while the blue band represents 20% up and down variations of the fitted $\langle \hat{\mathrm {s}} \rangle $ trend for the inclusive ${p_{\mathrm {T}}}$ spectrum. The red stars correspond to jets with ${p_{\mathrm {T}}}$ $\in $ [30, 50] GeV, orange diamonds to ${p_{\mathrm {T}}}$ $\in $ [50, 70] GeV, and green crosses to ${p_{\mathrm {T}}}$ $\in $ [110, 120] GeV. |
Figure 5:
Dijet invariant mass distributions for simulated samples of ${\mathrm{Z} (\to \ell ^+\ell ^-)\mathrm{H} (\to \mathrm{b} \mathrm{\bar{b}})}$ events, where two jets and two leptons were selected. Distributions are shown before (dotted blue) and after (solid red) applying the b jet energy corrections. A Bukin function [40] was used to fit each distribution. The fitted mean and width of the core of each distribution are displayed in the figure. |
Figure 6:
Distribution of the ratio between the transverse momentum of the leading b-tagged jet and that of the dilepton system from the decay of the Z boson. Distributions are shown before (left) and after (right) applying the b jet energy corrections. The ${\overline {\mathrm {s}}}$ values of the core distributions are included in the figures. The black points and histogram show the distributions for data and simulated events, respectively. |
Figure 6-a:
Distribution of the ratio between the transverse momentum of the leading b-tagged jet and that of the dilepton system from the decay of the Z boson. Distributions are shown before applying the b jet energy corrections. The ${\overline {\mathrm {s}}}$ values of the core distributions are included in the figures. The black points and histogram show the distributions for data and simulated events, respectively. |
Figure 6-b:
Distribution of the ratio between the transverse momentum of the leading b-tagged jet and that of the dilepton system from the decay of the Z boson. Distributions are shown after applying the b jet energy corrections. The ${\overline {\mathrm {s}}}$ values of the core distributions are included in the figures. The black points and histogram show the distributions for data and simulated events, respectively. |
Tables
Table 1:
Relative differences $\Delta {\overline {\mathrm {s}}} / {\overline {\mathrm {s}}} _\text {baseline}$ between the ${\overline {\mathrm {s}}}$ values obtained before and after applying the DNN energy correction for b jets produced in the different physics processes indicated. |
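The relative difference quoted in Table 1 can be illustrated with a short Python sketch. This is not the CMS code: the function names are hypothetical, and the definition of ${\overline {\mathrm {s}}}$ used here (half the 25%-75% interquantile range divided by the 40% quantile) is an assumption made for illustration, since the captions on this page do not spell out the normalization. The sign convention (corrected minus baseline) is likewise an assumption.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


def relative_resolution(values):
    """Assumed definition: half the 25%-75% range over the 40% quantile."""
    s = 0.5 * (quantile(values, 0.75) - quantile(values, 0.25))
    return s / quantile(values, 0.40)


def relative_difference(baseline_values, corrected_values):
    """(s_corrected - s_baseline) / s_baseline; negative means improvement."""
    s_base = relative_resolution(baseline_values)
    s_corr = relative_resolution(corrected_values)
    return (s_corr - s_base) / s_base


# Toy response values, invented purely to exercise the functions.
baseline = [0.8, 0.9, 1.0, 1.1, 1.2]
corrected = [0.9, 0.95, 1.0, 1.05, 1.1]
improvement = relative_difference(baseline, corrected)
```

With these toy inputs the corrected sample is narrower, so `improvement` comes out negative.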
Summary
We have described an algorithm that makes it possible to obtain point and dispersion estimates of the energy of jets arising from b quarks in proton-proton collisions. We trained a deep, feed-forward neural network on a sample of simulated b jets arising from the decays of top quark-antiquark pairs, with inputs based on jet composition and shape information and on properties of the associated reconstructed secondary vertex. The neural network simultaneously provides a robust mean estimator and 25% and 75% quantile estimators for the energy of a b jet. The mean estimator is based on the Huber loss function and is used as an energy correction, while the 25% and 75% quantile estimators are used to build a jet-by-jet resolution estimator, defined as half the difference between these quantiles. The DNN-based algorithm leverages the information contained in a large training data set consisting of nearly 100 million simulated b jets, and improves the resolution of the b jet energy by 12-15% relative to that obtained after baseline corrections. An improvement of about 20% is observed in the resolution of the invariant mass of b jet pairs resulting from the decay of a Higgs boson produced in association with a Z boson. Events containing a dilepton decay of a Z boson produced in association with a b jet are used to validate the performance of the algorithm on proton-proton collision data recorded with the CMS detector. The jet energy resolution improvement observed in data is consistent with that found in simulation. The resolution estimator is further shown to predict the resolution of b jets with an accuracy of 20% over a ${p_{\mathrm{T}}}$ range between 30 and 350 GeV. The results described here are being used by the CMS Collaboration in several physics analyses targeting final states containing b jets, including the observation of the Higgs boson decay to $\mathrm{b\bar{b}}$ [13]. |
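The loss structure described in the summary (a Huber term for the mean estimator plus quantile terms for the 25% and 75% estimators) can be sketched in plain Python. This is a minimal illustration, not the CMS implementation: the function names, the Huber threshold `delta`, and the equal weighting of the three terms are all assumptions made here. The quantile term is the standard "pinball" loss of quantile regression.

```python
def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def quantile_loss(residual, tau):
    """Pinball loss, minimised when the prediction is the tau-quantile.

    residual = target - prediction.
    """
    if residual >= 0:
        return tau * residual
    return (tau - 1.0) * residual


def combined_loss(target, mean_pred, q25_pred, q75_pred, delta=1.0):
    """Per-jet sum of the three terms (equal weights are an assumption)."""
    return (huber_loss(target - mean_pred, delta)
            + quantile_loss(target - q25_pred, 0.25)
            + quantile_loss(target - q75_pred, 0.75))


def resolution_estimate(q25_pred, q75_pred):
    """Jet-by-jet resolution estimator: half the 25%-75% quantile difference."""
    return 0.5 * (q75_pred - q25_pred)
```

For a jet whose three network outputs are 1.0, 0.9, and 1.1, `resolution_estimate(0.9, 1.1)` gives a per-jet resolution of about 0.1 in units of the regression target.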
References
1 | ATLAS Collaboration | Observation of a new particle in the search for the standard model Higgs boson with the ATLAS detector at the LHC | PLB 716 (2012) 1 | 1207.7214 |
2 | CMS Collaboration | Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC | PLB 716 (2012) 30 | CMS-HIG-12-028 1207.7235 |
3 | CMS Collaboration | A new boson with a mass of 125 GeV observed with the CMS experiment at the Large Hadron Collider | Science 338 (2012) 1569 | |
4 | ATLAS Collaboration | Measurements of Higgs boson production and couplings in the four-lepton channel in pp collisions at center-of-mass energies of 7 and 8 TeV with the ATLAS detector | PRD 91 (2015) 012006 | 1408.5191 |
5 | ATLAS Collaboration | Observation and measurement of Higgs boson decays to WW$ ^* $ with the ATLAS detector | PRD 92 (2015) 012006 | 1412.2641 |
6 | ATLAS Collaboration | Measurement of Higgs boson production in the diphoton decay channel in pp collisions at center-of-mass energies of 7 and 8 TeV with the ATLAS detector | PRD 90 (2014) 112015 | 1408.7084 |
7 | CMS Collaboration | Measurement of the properties of a Higgs boson in the four-lepton final state | PRD 89 (2014) 092007 | CMS-HIG-13-002 1312.5353 |
8 | CMS Collaboration | Measurement of Higgs boson production and properties in the WW decay channel with leptonic final states | JHEP 01 (2014) 096 | CMS-HIG-13-023 1312.1129 |
9 | CMS Collaboration | Observation of the diphoton decay of the Higgs boson and measurement of its properties | EPJC 74 (2014) 3076 | CMS-HIG-13-001 1407.0558 |
10 | CMS Collaboration | Observation of the Higgs boson decay to a pair of $ \tau $ leptons with the CMS detector | PLB 779 (2018) 283 | CMS-HIG-16-043 1708.00373 |
11 | ATLAS Collaboration | Observation of Higgs boson production in association with a top quark pair at the LHC with the ATLAS detector | PLB 784 (2018) 173 | 1806.00425 |
12 | CMS Collaboration | Observation of $ \mathrm{t\overline{t}} $H production | PRL 120 (2018) 231801 | CMS-HIG-17-035 1804.02610 |
13 | CMS Collaboration | Observation of Higgs boson decay to bottom quarks | PRL 121 (2018) 121801 | CMS-HIG-18-016 1808.08242 |
14 | ATLAS Collaboration | Observation of $ \mathrm{H \rightarrow \mathrm{b\bar{b}}} $ decays and VH production with the ATLAS detector | PLB 786 (2018) 59 | 1808.08238 |
15 | CDF Collaboration | Search for the standard model Higgs boson decaying to a $ \mathrm{b\bar{b}} $ pair in events with one charged lepton and large missing transverse energy using the full CDF data set | PRL 109 (2012) 111804 | 1207.1703 |
16 | CMS Collaboration | Search for the standard model Higgs boson produced through vector boson fusion and decaying to $ \mathrm{b\bar{b}} $ | PRD 92 (2015) 032008 | CMS-HIG-14-004 1506.01010 |
17 | P. J. Huber | Robust estimation of a location parameter | Ann. Math. Statist. 35 (1964) 73 | |
18 | R. W. Koenker and G. Bassett | Regression quantiles | Econometrica 46 (1978) 33 | |
19 | CMS Collaboration | The CMS experiment at the CERN LHC | JINST 3 (2008) S08004 | CMS-00-001 |
20 | CMS Collaboration | Particle-flow reconstruction and global event description with the CMS detector | JINST 12 (2017) P10003 | CMS-PRF-14-001 1706.04965 |
21 | M. Cacciari, G. P. Salam, and G. Soyez | The anti-$ {k_{\mathrm{T}}} $ jet clustering algorithm | JHEP 04 (2008) 063 | 0802.1189 |
22 | M. Cacciari, G. P. Salam, and G. Soyez | FastJet user manual | EPJC 72 (2012) 1896 | 1111.6097 |
23 | CMS Collaboration | Jet energy scale and resolution in the CMS experiment in pp collisions at 8 TeV | JINST 12 (2017) P02014 | CMS-JME-13-004 1607.03663 |
24 | CMS Collaboration | Determination of jet energy calibration and transverse momentum resolution in CMS | JINST 6 (2011) P11002 | CMS-JME-10-011 1107.4277 |
25 | J. M. Campbell, R. K. Ellis, P. Nason, and E. Re | Top-Pair production and decay at NLO matched with parton showers | JHEP 04 (2015) 114 | 1412.1828 |
26 | J. Alwall et al. | The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations | JHEP 07 (2014) 079 | 1405.0301 |
27 | CMS Collaboration | The CMS trigger system | JINST 12 (2017) P01020 | CMS-TRG-12-001 1609.02366 |
28 | T. Sjostrand et al. | An introduction to PYTHIA 8.2 | CPC 191 (2015) 159 | 1410.3012 |
29 | CMS Collaboration | Event generator tunes obtained from underlying event and multiparton scattering measurements | EPJC 76 (2016) 155 | CMS-GEN-14-001 1512.00815 |
30 | GEANT4 Collaboration | GEANT4---a simulation toolkit | NIMA 506 (2003) 250 | |
31 | M. Cacciari and G. P. Salam | Pileup subtraction using jet areas | PLB 659 (2008) 119 | 0707.1378 |
32 | CMS Collaboration | Description and performance of track and primary-vertex reconstruction with the CMS tracker | JINST 9 (2014) P10009 | CMS-TRK-11-001 1405.6569 |
33 | CMS Collaboration | Identification of heavy-flavour jets with the CMS detector in pp collisions at 13 TeV | JINST 13 (2018) P05011 | CMS-BTV-16-002 1712.07158 |
34 | S. Ioffe and C. Szegedy | Batch normalization: accelerating deep network training by reducing internal covariate shift | in Proceedings of Machine Learning Research, vol. 37, 2015 | 1502.03167 |
35 | A. L. Maas et al. | Rectifier nonlinearities improve neural network acoustic models | ||
36 | F. Chollet et al. | Keras | Software available from keras.io (2015) | |
37 | M. Abadi et al. | TensorFlow: Large-scale machine learning on heterogeneous systems | Software available from tensorflow.org (2015) | |
38 | D. P. Kingma and J. Ba | Adam: A method for stochastic optimization | 1412.6980 | |
39 | T. Hastie, R. Tibshirani, and J. Friedman | The Elements of Statistical Learning | Springer-Verlag New York, 2nd edition | |
40 | A. D. Bukin | Fitting function for asymmetric peaks | 0711.4449 | |
41 | CMS Collaboration | Performance of the CMS missing transverse momentum reconstruction in pp data at $ \sqrt{s} = $ 8 TeV | JINST 10 (2015) P02006 | CMS-JME-13-003 1411.0511 |