Compact Muon Solenoid
LHC, CERN

CMS-JME-25-001 ; CERN-EP-2026-111
Particle transformers for identifying Lorentz-boosted Higgs bosons decaying to a pair of W bosons
Submitted to the Journal of High Energy Physics
Abstract: A novel deep neural network classifier, a ``particle transformer'' (PART), is introduced for the identification of highly Lorentz-boosted resonances reconstructed as single, multipronged jets in measurements and searches performed by the CMS Collaboration at the CERN LHC. Based on a self-attention mechanism that allows the model to weigh the importance of different particles, PART is trained on a wide variety of topologies, notably demonstrating strong performance for the first time on jets originating from boosted Higgs boson decays to W bosons. The PART algorithm achieves a tagging efficiency of more than 50% for such jets at a background efficiency of 1%, while maintaining decorrelation from the jet mass. A calibration is performed in proton-proton collision data collected by CMS at a center-of-mass energy of 13 TeV, with a data set corresponding to a total luminosity of 138 fb$ ^{-1} $. Data-to-simulation selection efficiency scale factors are measured to be in the 0.9--1.0 range, with relative uncertainties between 7 and 23%. The tagging capability of PART enhances the sensitivity of standard model measurements and searches for beyond-the-standard-model resonances decaying to hadronic diboson systems.
Figures

Figure 1:
Diagram of the PART model architecture. The model processes two different sets of input features per jet, from PF candidates and SVs. These features are embedded using separate MLPs into 128-dimensional representations before being concatenated and passed through eight PABs. Pairwise features are also calculated between each input element, which are embedded using a single one-dimensional convolutional layer and used as attention biases for each PAB. Two CABs then use the learned features to update a randomly initialized class token, which aggregates these features into a global representation of the jet. Their output is then finally passed through an MLP that outputs the class probabilities, which are normalized by a softmax function, as well as the jet mass.
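The role of the pairwise features as attention biases can be illustrated with a minimal sketch. It assumes single-head scaled dot-product attention in which the bias is added to the attention logits before the softmax; the function name `biased_attention` and the toy dimensions are hypothetical, and this is not the CMS implementation.

```python
import math

def biased_attention(queries, keys, values, pair_bias):
    """Single-head scaled dot-product attention with an additive pairwise bias.

    pair_bias[i][j] is added to the logit between element i and element j
    before the softmax (assumed form of the bias, for illustration only).
    """
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        logits = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) + pair_bias[i][j]
            for j, key in enumerate(keys)
        ]
        # Numerically stable softmax over the biased logits.
        peak = max(logits)
        weights = [math.exp(value - peak) for value in logits]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs
```

With all-zero queries the logits are set by the bias alone, so a large bias toward one element makes the output follow that element's value vector.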

Figure 2:
Full suite of AK8 jet topologies considered for the PART multiclass classification task. Jet types are first categorized by the number of quarks and leptons in the final state, and then further separated by flavor, as shown in the table on the left. The symbols $ \tau_\mathrm{e} $, $ \tau_\mu $, and $ \tau_\mathrm{h} $ refer to $ \tau $ lepton decays to electrons, muons, and hadrons, respectively. The total number of subclasses for each process, therefore, is given by the tensor product ($ \otimes $) between the different final states and flavors. Diagrams illustrating the corresponding jet topologies, which are not exhaustive, are shown on the right.

Figure 3:
Evolution of the loss function values for the PART model on the training and validation data sets over training epochs, shown separately for the classification (cross-entropy) and regression (log-cosh) terms.
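The two loss terms named in the caption can be written out as a short sketch. The cross-entropy is computed from unnormalized logits and the log-cosh term from a predicted and a target mass. How the two terms are combined is not stated here, so the `regression_weight` parameter and the function names are assumptions for illustration.

```python
import math

def cross_entropy(logits, true_index):
    """Cross-entropy of the softmax of `logits` against the true class index."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(value - peak) for value in logits))
    return log_norm - logits[true_index]

def log_cosh(predicted, target):
    """log(cosh(predicted - target)), written in an overflow-safe form."""
    diff = abs(predicted - target)
    # log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2)
    return diff + math.log1p(math.exp(-2.0 * diff)) - math.log(2.0)

def combined_loss(logits, true_index, predicted_mass, target_mass,
                  regression_weight=1.0):
    """Sum of both terms; `regression_weight` is a hypothetical knob."""
    return (cross_entropy(logits, true_index)
            + regression_weight * log_cosh(predicted_mass, target_mass))
```

The log-cosh term behaves quadratically for small residuals and linearly for large ones, which makes it less sensitive to outliers than a squared error.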

Figure 4:
Comparison of jet mass reconstruction ($ {m_\text{reco}} $) using the SD, PARTICLENET, and PART algorithms, for $ \mathrm{H}\to\mathrm{b}\overline{\mathrm{b}} $ (upper left), $ \mathrm{H}\to\mathrm{W}\mathrm{W} $ (upper right), $ \mathrm{t}\to\mathrm{b}\mathrm{q}\overline{\mathrm{q}} $(lower left), and QCD (lower right) jets with the SM values of $ {m_\mathrm{H}} $ and $ {m_\mathrm{t}} $. An offline selection is applied to the AK8 jets of $ p_{\mathrm{T}} > $ 400 GeV and $ |\eta| < $ 2.4. Statistical uncertainties in the bin yields originating from the limited number of simulated events are represented by vertical error bars. The mass at the peak ($ {m_\text{peak}} $) for each algorithm, calculated using Gaussian kernel density estimation, and the mass resolution, quantified by the FWHM of the resonance peak, are shown as well for H and t jets.
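As an illustration of the peak and resolution definitions in this caption, the following sketch evaluates a Gaussian kernel density estimate on a grid, takes the grid point of maximum density as the peak, and measures the FWHM from the half-maximum crossings. The bandwidth, the grid range, and the function names are assumptions, not values taken from the analysis.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a Gaussian KDE of `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def peak_and_fwhm(samples, bandwidth, low, high, steps=2000):
    """Peak position and FWHM of the KDE, scanned on a uniform grid."""
    density = gaussian_kde(samples, bandwidth)
    grid = [low + (high - low) * i / steps for i in range(steps + 1)]
    values = [density(x) for x in grid]
    i_peak = max(range(len(grid)), key=values.__getitem__)
    half = 0.5 * values[i_peak]
    # Walk outward from the peak until the density drops to half maximum.
    left = i_peak
    while left > 0 and values[left] > half:
        left -= 1
    right = i_peak
    while right < steps and values[right] > half:
        right += 1
    return grid[i_peak], grid[right] - grid[left]
```

The FWHM returned this way is accurate to about one grid spacing on each side, so the grid should be fine compared with the expected resolution.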

Figure 5:
Receiver operating characteristic curves for $ \mathrm{H}\to\mathrm{W}\mathrm{W} $ signal jets, with the SM values of $ {m_\mathrm{H}} $ and $ {m_\mathrm{t}} $, versus background jets from simulated QCD multijet (left) and $ \mathrm{t} \overline{\mathrm{t}} $ events (right), for the PART $T_{HWW}$ and the DEEPAK8-MD scores in the $ p_{\mathrm{T}} $ ranges 200--400, 400--600, and 600--1000 GeV. An offline selection is applied to the AK8 jets of $ p_{\mathrm{T}} > $ 200 GeV and $ |\eta| < $ 2.4. Signal jets are required to contain all four generator-level quarks from the W boson decays within $ \Delta R (\text{jet}, \mathrm{q}) < $ 0.8.
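A single point on such a curve, the signal efficiency at a fixed background efficiency, can be read off as in the sketch below. It assumes unweighted jets and a selection of the form score >= threshold; the function names are hypothetical, and event weights, which a real analysis would apply, are ignored.

```python
def threshold_at_background_efficiency(background_scores, target_efficiency):
    """Score threshold that keeps roughly `target_efficiency` of the background."""
    ordered = sorted(background_scores, reverse=True)
    # Number of background jets allowed to pass, at least one.
    n_pass = max(round(target_efficiency * len(ordered)), 1)
    return ordered[n_pass - 1]

def efficiency(scores, threshold):
    """Fraction of jets with score at or above the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)
```

Scanning the target background efficiency and recording the corresponding signal efficiency traces out the full curve. Tied scores at the threshold can make the achieved background efficiency slightly larger than the target.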

Figure 6:
Receiver operating characteristic curves for $ Y \to\mathrm{W}\mathrm{W}^* $ signal jets, with varying $ m_{Y} $ and SM $ {m_\mathrm{W}} $, versus background jets from simulated QCD multijet (left) and $ \mathrm{t} \overline{\mathrm{t}} $ events (right), for the PART $T_{HWW}$ score. An offline selection is applied to the AK8 jets of 600 $ < p_{\mathrm{T}} < $ 1000 GeV, $ |\eta| < $ 2.4, and $ {m_\mathrm{SD}} > $ 30 GeV.

Figure 7:
Confusion matrix with each row indicating the fraction of jets per category classified as the column category by PART. An offline selection is applied to the AK8 jets of $ p_{\mathrm{T}} > $ 200 GeV and $ |\eta| < $ 2.4.
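The row normalization described in this caption can be sketched as follows, assuming integer class labels and taking the predicted class from the tagger output; the function name is hypothetical and event weights are ignored.

```python
def row_normalized_confusion(true_labels, predicted_labels, n_classes):
    """Confusion matrix whose rows (true classes) each sum to one."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        counts[true][predicted] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Leave rows of classes with no jets at zero instead of dividing by zero.
        matrix.append([count / total if total else 0.0 for count in row])
    return matrix
```

Each entry in row i and column j is then the fraction of jets of true class i that are assigned to class j.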

Figure 8:
Distributions of $ {m_\mathrm{SD}} $ for jets from QCD multijet events, in the $ p_{\mathrm{T}} $ ranges 200--400 GeV (upper), 400--600 GeV (middle), and 600--1000 GeV (lower), after no selections (``inclusive'') on the PART $T_{HWW}$ score (left) and the DEEPAK8-MD score (right) as well as selections corresponding to QCD jet selection efficiencies ($ {\epsilon_\mathrm{B}} $) of 5.0%, 1.0%, and 0.5%. The error bars represent the statistical uncertainties originating from the limited number of simulated events. The lower panels display the ratio of the normalized $ {m_\mathrm{SD}} $ distributions for the different selection efficiencies ($ N_\mathrm{mistag} $) to the normalized inclusive $ {m_\mathrm{SD}} $ distribution ($ N_\mathrm{inclusive} $). An offline selection is applied to the AK8 jets of $ p_{\mathrm{T}} > $ 400 GeV, $ |\eta| < $ 2.4, and $ {m_\mathrm{SD}} > $ 30 GeV.

Figure 9:
The Jensen--Shannon distance (JSD) using base 2 between the $ {m_\mathrm{SD}} $ distribution of jets from QCD multijet events with and without a selection on the PART and DEEPAK8-MD tagger scores. On the left, the JSD is plotted for tagger selections corresponding to different QCD jet selection efficiencies ($ {\epsilon_\mathrm{B}} $), with an offline selection of 600 $ < p_{\mathrm{T}} < $ 1000 GeV, $ |\eta| < $ 2.4, and 30 $ < {m_\mathrm{SD}} < $ 250 GeV applied to the jets. On the right, the JSD is plotted for different jet $ p_{\mathrm{T}} $ bins, at a fixed $ {\epsilon_\mathrm{B}} $ of 1%.
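For reference, a base-2 Jensen--Shannon distance between two binned distributions can be computed as in this sketch. It assumes the standard definition, the square root of the Jensen--Shannon divergence with base-2 logarithms, applied to histograms with identical binning; the function name is hypothetical.

```python
import math

def jensen_shannon_distance(p_counts, q_counts):
    """Base-2 Jensen-Shannon distance between two histograms (same binning)."""
    p_total, q_total = sum(p_counts), sum(q_counts)
    p = [count / p_total for count in p_counts]
    q = [count / q_total for count in q_counts]
    mixture = [0.5 * (a + b) for a, b in zip(p, q)]

    def kl_divergence(a, b):
        # Bins with zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)
    return math.sqrt(max(divergence, 0.0))
```

With base-2 logarithms the distance lies between 0 for identical shapes and 1 for distributions with no overlapping bins, so smaller values indicate less sculpting of the mass distribution by the tagger selection.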

Figure 10:
Schematic of the LJP calibration method for $ \mathrm{H}\to\mathrm{W}\mathrm{W} $ tagging. Ratios of primary LJP densities in data and simulation are first measured per subjet in merged two-pronged W jets, with an example of such a ratio reproduced from Ref. [30]. These are then used to derive correction factors for $ \mathrm{H}\to\mathrm{W}\mathrm{W} $ signal jets per prong.

Figure 11:
Distributions of the PART $T_{HWW}^{\text{No top}}$ (left) and DEEPAK8-MD (No top) (right) discriminants with and without the LJP corrections for t-matched jets for data and individual simulated processes in the upper panels, and data versus simulation ratios in the lower panels. The combined uncertainties from LJP-based SFs per bin are shown in shaded gray, and the statistical uncertainty in the number of data events per bin is represented by vertical error bars in the upper and lower panels. The $ \chi^2 $ test statistic values between data and simulation, normalized to the number of degrees of freedom (ndof), are also shown for both discriminants with and without LJP corrections.

Tables

Table 1:
Summary of particle masses in the PART training samples.

Table 2:
The complete set of input features per AK8 jet used for the PART model training. Two types of inputs are considered: PF candidates and secondary vertices (SVs). The PF candidate features marked with a star $ (\star) $ apply only to charged PF candidates and a null value is used for neutral candidates.

Table 3:
Relative weights of each of the classes used for training the PART model. The four major processes ($ \mathrm{H}\to\mathrm{W}\mathrm{W} $, $ \mathrm{H}\to\text{2-pronged} $, $ \mathrm{t}\to\mathrm{b}\mathrm{W} $, and QCD jets) are weighted equally, and each has a dedicated row.

Table 4:
Signal efficiency SFs and uncertainties for the BDT selections on the PART HWW tagging outputs in the $ {\mathrm{H}\mathrm{H}\to\mathrm{b}\overline{\mathrm{b}}\mathrm{W}\mathrm{W}} $ search, measured using the LJP calibration method for different $ {\mathrm{H}\mathrm{H}} $ signals and analysis regions. Both the total combined uncertainty and the components defined in the text are shown.
Summary
The particle transformer (PART) deep neural network for classifying a wide variety of jets from decays of Lorentz-boosted resonances has been presented. In particular, PART enables effective identification of all-hadronic Higgs boson to W boson ($ \mathrm{H}\to\mathrm{W}\mathrm{W}^*\to4\mathrm{q} $) decays by the CMS experiment for the first time. A novel training strategy is used to address challenges pertaining to $ \mathrm{H}\to\mathrm{W}\mathrm{W} $ classification, through which PART achieves $ > $50% $ \mathrm{H}\to\mathrm{W}\mathrm{W}^*\to4\mathrm{q} $ selection efficiency with a multijet background efficiency of 1%, while maintaining decorrelation from the jet mass. The performance is calibrated on data using the primary Lund jet planes of individual subjets, with data-to-simulation scale factors measured in the 0.9--1.0 range, and relative uncertainties between 7 and 23%. The PART algorithm represents a significant advancement in the capability to identify multiprong jets from highly boosted resonances in CMS, illustrated by the first search for boosted Higgs boson pair production in the all-hadronic $ \mathrm{b}\overline{\mathrm{b}}\mathrm{W}\mathrm{W} $ channel.
References
1 CMS Collaboration Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC PLB 716 (2012) 30 CMS-HIG-12-028
1207.7235
2 CMS Collaboration Observation of a new boson with mass near 125 GeV in pp collisions at $ \sqrt{s} = $ 7 and 8 TeV JHEP 06 (2013) 081 CMS-HIG-12-036
1303.4571
3 ATLAS Collaboration Observation of a new particle in the search for the standard model Higgs boson with the ATLAS detector at the LHC PLB 716 (2012) 1 1207.7214
4 CMS Collaboration Identification of heavy, energetic, hadronically decaying particles using machine-learning techniques JINST 15 (2020) P06005 CMS-JME-18-002
2004.08262
5 CMS Collaboration Performance of the mass-decorrelated DeepDoubleX classifier for double-b and double-c large-radius jets with the CMS detector CMS Detector Performance Summary CMS-DP-2022-041, 2022
CDS
6 E. A. Moreno et al. JEDI-net: a jet identification algorithm based on interaction networks EPJC 80 (2020) 58 1908.05318
7 E. A. Moreno et al. Interaction networks for the identification of boosted $ \mathrm{H}\to\mathrm{b}\overline{\mathrm{b}} $ decays PRD 102 (2020) 012010 1909.12285
8 H. Qu and L. Gouskos PARTICLENET: Jet tagging via particle clouds PRD 101 (2020) 056019 1902.08570
9 CMS Collaboration Identification of highly Lorentz-boosted heavy particles using graph neural networks and new mass decorrelation techniques CMS Detector Performance Summary CMS-DP-2020-002, 2020
CDS
10 CMS Collaboration Mass regression of highly-boosted jets using graph neural networks CMS Detector Performance Summary CMS-DP-2021-017, 2021
CDS
11 CMS Collaboration Measurement of boosted Higgs bosons produced via vector boson fusion or gluon fusion in the $ \mathrm{H}\to\mathrm{b}\overline{\mathrm{b}} $ decay mode using LHC proton-proton collision data at $ \sqrt{s} = $ 13 TeV JHEP 12 (2024) 035 CMS-HIG-21-020
2407.08012
12 CMS Collaboration Search for Higgs boson decay to a charm quark-antiquark pair in proton-proton collisions at $ \sqrt{s}= $ 13 TeV PRL 131 (2023) 061801 CMS-HIG-21-008
2205.05550
13 CMS Collaboration Search for nonresonant pair production of highly energetic Higgs bosons decaying to bottom quarks PRL 131 (2023) 041803 2205.06667
14 CMS Collaboration Search for a massive scalar resonance decaying to a light scalar and a Higgs boson in the four b quarks final state with boosted topology PLB 842 (2023) 137392 2204.12413
15 CMS Collaboration Search for resonant pair production of Higgs bosons in the $ \mathrm{b}\overline{\mathrm{b}}\mathrm{b}\overline{\mathrm{b}} $ final state using large-area jets in proton-proton collisions at $ \sqrt{s} = $ 13 TeV JHEP 02 (2025) 040 2407.13872
16 CMS Collaboration Search for heavy resonances decaying to a pair of Lorentz-boosted Higgs bosons in final states with leptons and a bottom quark pair at $ \sqrt{s} = $ 13 TeV JHEP 05 (2022) 005 2112.03161
17 CMS Collaboration Search for resonances decaying to three W bosons in the hadronic final state in proton-proton collisions at $ \sqrt{s} = $ 13 TeV PRD 106 (2022) 012002 2112.13090
18 CMS Collaboration Search for resonances decaying to three W bosons in proton-proton collisions at $ \sqrt{s} = $ 13 TeV PRL 129 (2022) 021802 2201.08476
19 A. J. Larkoski, I. Moult, and B. Nachman Jet substructure at the Large Hadron Collider: A review of recent advances in theory and machine learning Phys. Rept. 841 (2020) 1 1709.04464
20 H. Qu, C. Li, and S. Qian Particle transformer for jet tagging in the Int. Conf. on Machine Learning, volume 162, 2022
Proc. 3 (2022) 18281
2202.03772
21 CMS Collaboration Search for a massive resonance decaying into a Higgs boson and a W or Z boson in hadronic final states in proton-proton collisions at $ \sqrt{s}= $ 8 TeV JHEP 02 (2016) 145 CMS-EXO-14-009
1506.01443
22 G. C. Branco et al. Theory and phenomenology of two-Higgs-doublet models Phys. Rept. 516 (2012) 1 1106.0034
23 N. Craig, J. Galloway, and S. Thomas Searching for signs of the second Higgs doublet 1305.2424
24 F. Domingo and S. Pa\ss ehr About the bosonic decays of heavy Higgs states in the (N)MSSM EPJC 82 (2022) 962 2207.05776
25 K. S. Agashe et al. LHC signals from cascade decays of warped vector resonances JHEP 05 (2017) 078 1612.00047
26 K. Agashe et al. Dedicated strategies for triboson signals from cascade decays of vector resonances PRD 99 (2019) 075016 1711.09920
27 H.-Y. Ren, L.-H. Xia, and Y.-P. Kuang Model-independent probe of anomalous heavy neutral Higgs bosons at the LHC PRD 90 (2014) 115002 1404.6367
28 Y.-P. Kuang, H.-Y. Ren, and L.-H. Xia Further investigation of the model-independent probe of heavy neutral Higgs bosons at LHC Run 2 Chin. Phys. C 40 (2016) 023101 1506.08007
29 F. A. Dreyer, G. P. Salam, and G. Soyez The Lund jet plane JHEP 12 (2018) 064 1807.04758
30 CMS Collaboration A method for correcting the substructure of multiprong jets using the Lund jet plane JHEP 11 (2025) 038 CMS-JME-23-001
2507.07775
31 CMS Collaboration Combination of searches for nonresonant Higgs boson pair production in proton-proton collisions at $ \sqrt{s}= $ 13 TeV Submitted to J. Phys. G CMS-HIG-20-011
2510.07527
32 CMS Collaboration The CMS experiment at the CERN LHC JINST 3 (2008) S08004
33 CMS Collaboration Development of the CMS detector for the CERN LHC Run 3 JINST 19 (2024) P05064 CMS-PRF-21-001
2309.05466
34 CMS Collaboration Performance of the CMS Level-1 trigger in proton-proton collisions at $ \sqrt{s} = $ 13 TeV JINST 15 (2020) P10017 CMS-TRG-17-001
2006.10165
35 CMS Collaboration The CMS trigger system JINST 12 (2017) P01020 CMS-TRG-12-001
1609.02366
36 CMS Collaboration Performance of the CMS high-level trigger during LHC Run 2 JINST 19 (2024) P11021 CMS-TRG-19-001
2410.17038
37 CMS Collaboration Electron and photon reconstruction and identification with the CMS experiment at the CERN LHC JINST 16 (2021) P05014 CMS-EGM-17-001
2012.06888
38 CMS Collaboration Performance of the CMS muon detector and muon reconstruction with proton-proton collisions at $ \sqrt{s}= $ 13 TeV JINST 13 (2018) P06015 CMS-MUO-16-001
1804.04528
39 CMS Collaboration Description and performance of track and primary-vertex reconstruction with the CMS tracker JINST 9 (2014) P10009 CMS-TRK-11-001
1405.6569
40 CMS Tracker Group The CMS phase-1 pixel detector upgrade JINST 16 (2021) P02027 2012.14304
41 CMS Collaboration Track impact parameter resolution for the full pseudo rapidity coverage in the 2017 dataset with the CMS phase-1 pixel detector CMS Detector Performance Summary CMS-DP-2020-049, 2020
CDS
42 CMS Collaboration 2017 tracking performance plots CMS Detector Performance Summary CMS-DP-2017-015, 2017
CDS
43 CMS Collaboration Particle-flow reconstruction and global event description with the CMS detector JINST 12 (2017) P10003 CMS-PRF-14-001
1706.04965
44 CMS Collaboration Technical proposal for the Phase-II upgrade of the Compact Muon Solenoid CMS Technical Proposal CERN-LHCC-2015-010, CMS-TDR-15-02, 2015
CDS
45 CMS Collaboration Offline secondary vertex reconstruction in the CMS detector PoS LHCP 236, 2025
link
46 M. Cacciari, G. P. Salam, and G. Soyez The anti-$ k_{\mathrm{T}} $ jet clustering algorithm JHEP 04 (2008) 063 0802.1189
47 M. Cacciari, G. P. Salam, and G. Soyez FastJet user manual EPJC 72 (2012) 1896 1111.6097
48 CMS Collaboration Pileup removal algorithms CMS Physics Analysis Summary, 2014
CMS-PAS-JME-14-001
49 D. Bertolini, P. Harris, M. Low, and N. Tran Pileup per particle identification JHEP 10 (2014) 059 1407.6013
50 CMS Collaboration Pileup mitigation at CMS in 13 TeV data JINST 15 (2020) P09018 CMS-JME-18-001
2003.00503
51 CMS Collaboration Jet energy scale and resolution in the CMS experiment in pp collisions at 8 TeV JINST 12 (2017) P02014 CMS-JME-13-004
1607.03663
52 S. Catani, Y. L. Dokshitzer, M. H. Seymour, and B. R. Webber Longitudinally invariant $ k_{\mathrm{T}} $ clustering algorithms for hadron hadron collisions NPB 406 (1993) 187
53 S. D. Ellis and D. E. Soper Successive combination jet algorithm for hadron collisions PRD 48 (1993) 3160 hep-ph/9305266
54 Y. L. Dokshitzer, G. D. Leder, S. Moretti, and B. R. Webber Better jet clustering algorithms JHEP 08 (1997) 001 hep-ph/9707323
55 M. Wobisch and T. Wengler Hadronization corrections to jet cross-sections in deep inelastic scattering in Proc. Workshop on Monte Carlo Generators for HERA Physics, 1998 hep-ph/9907280
56 A. J. Larkoski, S. Marzani, G. Soyez, and J. Thaler Soft drop JHEP 05 (2014) 146 1402.2657
57 E. Bols et al. Jet flavour classification using DeepJet JINST 15 (2020) P12012 2008.10519
58 CMS Collaboration Performance of missing transverse momentum reconstruction in proton-proton collisions at $ \sqrt{s} = $ 13 TeV using the CMS detector JINST 14 (2019) P07004 CMS-JME-17-001
1903.06078
59 J. Alwall et al. The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations JHEP 07 (2014) 079 1405.0301
60 S. Bolognesi et al. On the spin and parity of a single-produced resonance at the LHC PRD 86 (2012) 095031 1208.4018
61 Particle Data Group Review of particle physics PRD 110 (2024) 030001
62 T. Sjöstrand et al. An introduction to PYTHIA8.2 Comput. Phys. Commun. 191 (2015) 159 1410.3012
63 M. Cacciari and G. P. Salam Pileup subtraction using jet areas PLB 659 (2008) 119 0707.1378
64 P. Nason A new method for combining NLO QCD with shower Monte Carlo algorithms JHEP 11 (2004) 040 hep-ph/0409146
65 S. Frixione, P. Nason, and C. Oleari Matching NLO QCD computations with parton shower simulations: the POWHEG method JHEP 11 (2007) 070 0709.2092
66 S. Alioli, P. Nason, C. Oleari, and E. Re A general framework for implementing NLO calculations in shower Monte Carlo programs: the POWHEG box JHEP 06 (2010) 043 1002.2581
67 E. Bagnaschi, G. Degrassi, P. Slavich, and A. Vicini Higgs production via gluon fusion in the POWHEG approach in the SM and in the MSSM JHEP 02 (2012) 088 1111.2854
68 M. Grazzini et al. Higgs boson pair production at NNLO with top quark mass effects JHEP 05 (2018) 059 1803.02463
69 S. Dawson, S. Dittmaier, and M. Spira Neutral Higgs boson pair production at hadron colliders: QCD corrections PRD 58 (1998) 115012 hep-ph/9805244
70 D. de Florian and J. Mazzitelli Higgs boson pair production at next-to-next-to-leading order in QCD PRL 111 (2013) 201801 1309.6594
71 D. de Florian and J. Mazzitelli Higgs pair production at next-to-next-to-leading logarithmic accuracy at the LHC JHEP 09 (2015) 053 1505.07122
72 J. Baglio et al. Gluon fusion into Higgs pairs at NLO QCD and the top mass scheme EPJC 79 (2019) 459 1811.05692
73 S. Borowka et al. Higgs boson pair production in gluon fusion at next-to-leading order with full top-quark mass dependence PRL 117 (2016) 012001 [Erratum: doi:10.1103/PhysRevLett.117.079901]
1604.06447
74 D. Y. Shao, C. S. Li, H. T. Li, and J. Wang Threshold resummation effects in Higgs boson pair production at the LHC JHEP 07 (2013) 169 1301.1245
75 CMS Collaboration Extraction and validation of a new set of CMS PYTHIA8 tunes from underlying-event measurements EPJC 80 (2020) 4 CMS-GEN-17-001
1903.12179
76 NNPDF Collaboration Parton distributions for the LHC Run II JHEP 04 (2015) 040 1410.8849
77 NNPDF Collaboration Parton distributions from high-precision collider data EPJC 77 (2017) 663 1706.00428
78 GEANT4 Collaboration GEANT 4---a simulation toolkit NIM A 506 (2003) 250
79 A. Vaswani et al. Attention is all you need in Int. Conf. on Neural Information Processing Systems, NIPS'17, Curran Associates Inc., Red Hook, NY, USA, 2017
Proc. 3 (2017) 6000
1706.03762
80 H. Touvron et al. Going deeper with image transformers in Proc. IEEE/CVF Int. Conf. on Computer Vision (ICCV), 2021
2103.17239
81 J. Bridle Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters in Advances in Neural Information Processing Systems, D. Touretzky, ed., volume 2. Morgan-Kaufmann, 1989
82 F. A. Dreyer and H. Qu Jet tagging in the Lund plane with graph networks JHEP 03 (2021) 052 2012.08526
83 CMS Collaboration Identification of heavy-flavour jets with the CMS detector in pp collisions at 13 TeV JINST 13 (2018) P05011 CMS-BTV-16-002
1712.07158
84 H. Qu Weaver: A machine learning R&D framework for high energy physics applications https://github.com/hqucms/weaver-core
85 A. Paszke et al. PyTorch: An imperative style, high-performance deep learning library Advances in Neural Information Processing Systems 3 (2019) 8024 1912.01703
86 M. Zhang, J. Lucas, J. Ba, and G. E. Hinton Lookahead optimizer: $ k $ steps forward, 1 step back Advances in Neural Information Processing Systems 3 (2019) 2 1907.08610
87 L. Liu et al. On the variance of the adaptive learning rate and beyond in Proc. Int. Conf. on Learning Representations (ICLR), 2020
1908.03265
88 CMS Collaboration Search for heavy scalar resonances decaying to Lorentz-boosted Higgs and Higgs-like bosons in the $ \mathrm{b}\overline{\mathrm{b}} 4\mathrm{q} $ final state at $ \sqrt{s} = $ 13 TeV Submitted to JHEP 2602.00273
89 J. Dolen et al. Thinking outside the ROCs: Designing decorrelated taggers (DDT) for jet substructure JHEP 05 (2016) 156 1603.00027
90 J. Lin Divergence measures based on the Shannon entropy IEEE Trans. on Inf. Th. 37 (1991) 145
91 S. Kullback and R. A. Leibler On information and sufficiency Ann. Math. Statist. 22 (1951) 79
92 CMS Collaboration Performance of heavy-flavour jet identification in Lorentz-boosted topologies in proton-proton collisions at $ \sqrt{s} = $ 13 TeV JINST 20 (2025) P11006 CMS-BTV-22-001
2510.10228
93 ATLAS Collaboration Search for pair production of boosted Higgs bosons via vector-boson fusion in the $ \mathrm{b}\overline{\mathrm{b}}\mathrm{b}\overline{\mathrm{b}} $ final state using pp collisions at $ \sqrt{s} = $ 13 TeV with the ATLAS detector PLB 858 (2024) 139007 2404.17193