CMS-MUO-24-001 ; CERN-EP-2024-325
Identification of low-momentum muons in the CMS detector using multivariate techniques in proton-proton collisions at $ \sqrt{s} = $ 13.6 TeV
CMS Collaboration
23 December 2024
Submitted to J. Instrum.
Abstract: ``Soft'' muons with a transverse momentum below 10 GeV are featured in many processes studied by the CMS experiment, such as decays of heavy-flavor hadrons or rare tau lepton decays. Maximizing the selection efficiency for these muons, while simultaneously suppressing backgrounds from long-lived light-flavor hadron decays, is therefore important for the success of the CMS physics program. Multivariate techniques have been shown to deliver better muon identification performance than traditional selection techniques. To take full advantage of the large data set currently being collected during Run 3 of the CERN LHC, a new multivariate classifier based on a gradient-boosted decision tree has been developed. It offers a significantly improved separation of signal and background muons compared to a similar classifier used for the analysis of the Run 2 data. The performance of the new classifier is evaluated on a data set collected with the CMS detector in 2022 and 2023, corresponding to an integrated luminosity of 62 fb$ ^{-1} $.
Links: e-print arXiv:2412.17590 [hep-ex]
Figures
Figure 1:
The distributions of the fraction of valid hits on the tracker track (left) and of $ -\ln(p_{\text{global}}) $ (right), where $ p_{\text{global}} $ is the global track fit probability, for signal muons (solid blue line) and background muons (dashed red line) used in the training.
Figure 2:
The ROC curve for the Run 3 soft-muon MVA (blue line) compared to those for the Run 2 soft-muon MVA (orange line) and the Muon MVA (green line). The working points defined in Table 1 are indicated with round, colored markers, with the very loose, loose, medium, and tight working points being represented by the grey, blue, red, and purple markers, respectively. For comparison, the performance of the cut-based IDs is indicated by the colored stars, with the loose, soft, and medium IDs being represented by the green, blue, and red markers, respectively. The upper plot shows the ROC curves for all muons, while the lower ones split the muon sample into those that fired the HLT (left) and those that did not (right).
Figure 3:
Distribution of the Run 3 soft-muon MVA score for muons of different origins.
Figure 4:
Cumulative distribution of the classifier score of the Run 3 soft-muon MVA, comparing data from early 2022 (black markers) with inclusive dilepton simulation (blue histogram) for muon pairs with 2.7 $ < m_{\mu\mu} < $ 3.5 GeV. The lower panel shows the ratio of data to simulation. Uncertainties represented by the vertical bars are statistical only.
Figure 5:
Efficiencies of the Run 2 (blue) and Run 3 (red) soft-muon MVA as functions of muon $ p_{\mathrm{T}} $ for muons with 0 $ < |\eta| < $ 0.9 (upper left), 0.9 $ < |\eta| < $ 1.2 (upper right), and 1.2 $ < |\eta| < $ 2.4 (lower). For the Run 3 soft-muon MVA, the medium working point is used. The vertical bars indicate the total uncertainty, including statistical and systematic uncertainties.
Figure 6:
Efficiencies of the Run 2 (blue) and Run 3 (red) soft-muon MVA as functions of muon $ \eta $ for muons with 3 $ < p_{\mathrm{T}} < $ 6 GeV (left) and $ p_{\mathrm{T}} > $ 6 GeV (right). For the Run 3 soft-muon MVA, the medium working point is used. The vertical bars indicate the total uncertainty, including statistical and systematic uncertainties.
Figure 7:
Efficiencies of the different working points of the Run 3 soft-muon MVA as functions of muon $ p_{\mathrm{T}} $ for muons with 0 $ < |\eta| < $ 0.9 (upper left), 0.9 $ < |\eta| < $ 1.2 (upper right), and 1.2 $ < |\eta| < $ 2.4 (lower). The vertical bars indicate the total uncertainty, including statistical and systematic uncertainties.
Figure 8:
Efficiencies of the different working points of the Run 3 soft-muon MVA as functions of muon $ \eta $ for muons with 3 $ < p_{\mathrm{T}} < $ 6 GeV (left) and $ p_{\mathrm{T}} > $ 6 GeV (right). The vertical bars indicate the total uncertainty, including statistical and systematic uncertainties.
Figure 9:
Background rates for muons from pions for the different working points of the Run 3 soft-muon MVA, in 2022--2023 data and simulation, as functions of the pion decay length $ L_{xy} $ in cm (left) and of muon $ p_{\mathrm{T}} $ (right). Muons are required to have $ p_{\mathrm{T}} > $ 4 GeV in the $ L_{xy} $ plot, whereas the misidentification rate for muons with 2 $ < p_{\mathrm{T}} < $ 4 GeV is measured only for $ |\eta| > $ 1, since central muons in this $ p_{\mathrm{T}} $ range do not reach the muon detector. The vertical bars indicate the statistical uncertainty.
Figure 10:
Background rates for muons from kaons for the different working points of the Run 3 soft-muon MVA, in 2022--2023 data and simulation, as a function of muon $ p_{\mathrm{T}} $. The misidentification rate for muons with 2 $ < p_{\mathrm{T}} < $ 4 GeV is measured only for $ |\eta| > $ 1, since central muons in this $ p_{\mathrm{T}} $ range do not reach the muon detector. The vertical bars indicate the statistical uncertainty.
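The efficiency and background-rate figures above all show a pass fraction in bins of a kinematic variable, with an uncertainty per bin. The sketch below illustrates that kind of binned rate under simplifying assumptions: it counts candidates directly and uses a normal-approximation binomial uncertainty, whereas the measurement procedure behind the figures is not described on this page and is likely more involved. The function name `binned_rate`, the bin edges, and the toy values are all hypothetical.

```python
import math


def binned_rate(values, passed, edges):
    """Pass fraction and statistical uncertainty in bins [low, high).

    values : kinematic variable per candidate (e.g. muon pT in GeV)
    passed : True if the candidate passes the selection under study
    edges  : increasing bin edges
    Returns a list of (low, high, rate, uncertainty); empty bins give None.
    """
    results = []
    for low, high in zip(edges, edges[1:]):
        in_bin = [p for v, p in zip(values, passed) if low <= v < high]
        n_total = len(in_bin)
        if n_total == 0:
            results.append((low, high, None, None))
            continue
        rate = sum(in_bin) / n_total
        # Normal-approximation binomial uncertainty (an assumption of this
        # sketch); it collapses to zero when the rate is exactly 0 or 1.
        uncertainty = math.sqrt(rate * (1.0 - rate) / n_total)
        results.append((low, high, rate, uncertainty))
    return results


# Toy inputs, not taken from the paper.
toy_pt = [2.5, 3.1, 3.6, 4.2, 4.8, 5.5, 6.3, 7.0, 8.1, 9.4]
toy_passed = [False, True, False, True, True, True, True, True, False, True]
toy_rates = binned_rate(toy_pt, toy_passed, [2.0, 4.0, 6.0, 10.0])
for low, high, rate, unc in toy_rates:
    print(f"{low:4.1f} <= pT < {high:4.1f} GeV: rate = {rate:.3f} +/- {unc:.3f}")
```

Because the normal approximation returns a zero uncertainty for a bin where every candidate passes, an interval method designed for binomial proportions would be a better choice in sparsely populated bins.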
Tables
Table 1:
Working points for the Run 3 soft-muon MVA classifier. For each working point, the efficiency of selecting muons from heavy-flavor decays, the purity of the selected muon sample, and the fraction of selected objects that are not associated with a real muon are given. These metrics are evaluated on the inclusive dilepton sample.
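The three quantities listed in the Table 1 caption can be illustrated with a short sketch. It assumes that a working point is a lower threshold on the classifier score, that purity means the fraction of selected objects that are heavy-flavor muons, and that each object carries a truth origin label. These definitions, the label strings, the thresholds, and the toy values are assumptions made for illustration and are not taken from the paper.

```python
def working_point_metrics(scores, origins, threshold):
    """Efficiency, purity, and non-muon fraction for one score threshold.

    scores  : classifier score per reconstructed object
    origins : truth label per object; this sketch uses the hypothetical
              labels "heavy_flavor", "light_flavor", and "not_muon"
    """
    selected = [o for s, o in zip(scores, origins) if s > threshold]
    n_signal_total = sum(1 for o in origins if o == "heavy_flavor")
    n_selected = len(selected)
    n_signal_selected = selected.count("heavy_flavor")
    n_not_muon_selected = selected.count("not_muon")

    efficiency = n_signal_selected / n_signal_total if n_signal_total else None
    purity = n_signal_selected / n_selected if n_selected else None
    not_muon_fraction = n_not_muon_selected / n_selected if n_selected else None
    return efficiency, purity, not_muon_fraction


# Toy inputs; the thresholds below are NOT the working points of Table 1.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
toy_origins = [
    "heavy_flavor", "heavy_flavor", "light_flavor", "heavy_flavor",
    "not_muon", "heavy_flavor", "light_flavor", "not_muon",
]
looser = working_point_metrics(toy_scores, toy_origins, 0.50)
tighter = working_point_metrics(toy_scores, toy_origins, 0.85)
print("threshold 0.50:", looser)
print("threshold 0.85:", tighter)
```

In this toy sample, raising the threshold trades efficiency for purity, which is the trade-off a set of working points is meant to expose.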
Summary
A multivariate (MVA) classifier based on gradient-boosted decision trees has been developed for the selection of soft muons from the decay of heavy-flavor hadrons and various rare decays for transverse muon momenta $ p_{\mathrm{T}} $ below 10 GeV. It is optimized for the analysis of data recorded by the CMS experiment during Run 3 of the LHC. The classifier was trained to separate these muons from background muons from pion and kaon decays. The training uses muons that pass a looser selection in a larger phase space, compared with the training of a similar classifier used in the analysis of Run 2 data, thus increasing the sensitivity to the processes outlined above. The new training also takes into account a larger set of input features and uses input samples matching the beam and detector conditions in Run 3. Consequently, the new Run 3 soft-muon MVA offers significantly improved selection efficiency for the same background rejection as the Run 2 version in both simulated samples and collision data recorded in 2022 and 2023. This improvement is most apparent for muons with $ p_{\mathrm{T}} < $ 4 GeV and muons reconstructed in the forward direction of the CMS detector, which were not included in the training of the Run 2 version of the classifier. The efficiency and background rate of the Run 3 soft-muon MVA measured in collision data are generally well described by the CMS simulation, with some significant differences observed for forward muons, especially for strict selections on the MVA score, which can be corrected at the analysis level.
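For readers unfamiliar with gradient boosting, the toy sketch below shows the core idea: each round fits a small tree to the residuals of the current prediction and adds a damped copy of it to the model. It is not the CMS classifier. It uses depth-one trees (stumps), two synthetic Gaussian features, and a logistic loss; the function names, sample sizes, number of rounds, and learning rate are all assumptions chosen to keep the example short.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, targets):
    """Least-squares depth-one tree: (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)  # midpoint between neighbouring values
            left = [t for row, t in zip(X, targets) if row[j] <= thr]
            right = [t for row, t in zip(X, targets) if row[j] > thr]
            l_mean = sum(left) / len(left)
            r_mean = sum(right) / len(right)
            sse = (sum((t - l_mean) ** 2 for t in left)
                   + sum((t - r_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, l_mean, r_mean)
    return best[1:]


def stump_value(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value


def train(X, y, n_rounds=10, learning_rate=0.3):
    """Boost stumps on the negative gradient of the logistic loss."""
    raw = [0.0] * len(X)  # raw (pre-sigmoid) model output per sample
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, raw)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        raw = [fi + learning_rate * stump_value(stump, row)
               for fi, row in zip(raw, X)]
    return stumps, learning_rate


def score(model, row):
    stumps, learning_rate = model
    return sigmoid(sum(learning_rate * stump_value(s, row) for s in stumps))


# Synthetic stand-ins for "signal" and "background" muons (toy data only).
rng = random.Random(0)
signal = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(60)]
background = [[rng.gauss(-2.0, 1.0), rng.gauss(-2.0, 1.0)] for _ in range(60)]
X = signal + background
y = [1] * len(signal) + [0] * len(background)

model = train(X, y)
accuracy = sum((score(model, row) > 0.5) == (yi == 1)
               for row, yi in zip(X, y)) / len(X)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

A production training would use deeper trees, many more input features, and a held-out sample for evaluation; accuracy on the training sample, as printed here, only shows that the boosting loop is working.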