| CMS-TAU-24-001 ; CERN-EP-2025-233 | ||
| Identification of tau leptons using a convolutional neural network with domain adaptation | ||
| CMS Collaboration | ||
| 7 November 2025 | ||
| Accepted for publication in J. Instrum. | ||
| Abstract: A tau lepton identification algorithm, DEEPTAU, based on convolutional neural network techniques, has been developed in the CMS experiment to discriminate reconstructed hadronic decays of tau leptons ($ \tau_\mathrm{h} $) from quark or gluon jets and electrons and muons that are misreconstructed as $ \tau_\mathrm{h} $ candidates. The latest version of this algorithm, v2.5, includes domain adaptation by backpropagation, a technique that reduces discrepancies between collision data and simulation in the region with the highest purity of genuine $ \tau_\mathrm{h} $ candidates. Additionally, a refined training workflow improves classification performance with respect to the previous version of the algorithm, with a reduction of 30-50% in the probability for quark and gluon jets to be misidentified as $ \tau_\mathrm{h} $ candidates for given reconstruction and identification efficiencies. This paper presents the novel improvements introduced in the DEEPTAU algorithm and evaluates its performance in LHC proton-proton collision data at $ \sqrt{s}= $ 13 and 13.6 TeV collected in 2018 and 2022 with integrated luminosities of 60 and 35 fb$ ^{-1} $, respectively. Techniques to calibrate the performance of the $ \tau_\mathrm{h} $ identification algorithm in simulation with respect to its measured performance in real data are presented, together with a subset of results among those measured for use in CMS physics analyses. | ||
| Links: e-print arXiv:2511.05468 [hep-ex] (PDF) ; CDS record ; inSPIRE record ; CADI line (restricted) ; | ||
| Figures | |
|
png pdf |
Figure 1:
Schematic illustration of the signatures of the $ \mathrm{h}^{\pm} $, $ \mathrm{h}^{\pm}\pi^{0} $, $ \mathrm{h}^{\pm}\mathrm{h}^{\mp}\mathrm{h}^{\pm} $, and $ \mathrm{h}^{\pm}\mathrm{h}^{\mp}\mathrm{h}^{\pm}\pi^{0} $ decay modes of the tau lepton in the CMS detector. Charged hadrons are reconstructed by the PF algorithm by matching tracks with energy deposits in the ECAL and HCAL, whereas the HPS algorithm aims to reconstruct each $ \pi^{0}\to\gamma\gamma $ decay as a single ``strip'' of energy clusters in ECAL. |
|
png pdf |
Figure 2:
Inner and outer grid layout in $ \eta$-$\phi $ space [21]. The inner grid encapsulates the signal cone of maximal radius 0.1, which contains the $ \mathrm{h}^{\pm} $ and $ \pi^{0} $ candidates, and consists of 11 $ {\times} $ 11 cells with a size of 0.02 $ {\times} $ 0.02 each. The outer grid contains the isolation cone of radius 0.5, and consists of 21 $ {\times} $ 21 cells with a size of 0.05 $ {\times} $ 0.05 each. |
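The grid geometry in this caption can be sketched as a mapping from a particle's offset with respect to the $ \tau_\mathrm{h} $ axis to a cell index. The following is a minimal illustration only: the cell counts and sizes are taken from the caption, while the function names, the assumption that both grids are centred on the $ \tau_\mathrm{h} $ candidate axis, and the half-open cell boundaries are hypothetical choices, not the CMS implementation.

```python
import math

# Grid parameters taken from the Figure 2 caption.
INNER = {"n_cells": 11, "cell_size": 0.02}
OUTER = {"n_cells": 21, "cell_size": 0.05}


def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def cell_index(d_eta, d_phi, grid):
    """Return (i_eta, i_phi) of the cell containing the offset, or None if outside.

    Assumption (not from the source): the grid is centred on the tau_h axis,
    so the central cell spans [-cell_size/2, +cell_size/2) in both directions.
    """
    n, size = grid["n_cells"], grid["cell_size"]
    half_width = 0.5 * n * size
    i_eta = math.floor((d_eta + half_width) / size)
    i_phi = math.floor((d_phi + half_width) / size)
    if 0 <= i_eta < n and 0 <= i_phi < n:
        return i_eta, i_phi
    return None
```

With these assumptions, a particle on the axis falls in the central cell of either grid, while a particle at $ \Delta\eta = 0.3 $ lies outside the inner grid and is only picked up by the outer one.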
|
png pdf |
Figure 3:
The DEEPTAU architecture with the domain adaptation configuration [66]. A set of final domain adaptation layers was introduced for data-simulation discrimination, consisting of several dense layers followed by a softmax layer that yields an output $ y_\text{adv} $ between zero and one. The backpropagation is modified to include the ``adversarial loss'', as described in the text. |
|
png pdf |
Figure 4:
Distribution of the DEEPTAU discriminator against quark and gluon jets before (left) and after (right) domain adaptation, for the 2018 dataset used for domain adaptation training. There is significant improvement in data-simulation agreement in the control region, with the discrepancies in the final bin reduced to 0.9%. The vertical bars on the data points indicate the statistical uncertainty; on most points the bars are smaller than the marker size. |
|
png pdf |
Figure 5:
Distribution of the DEEPTAU discriminator against quark and gluon jets before (left) and after (right) domain adaptation, for the early 2022 dataset. While data-to-simulation differences remain, there is an appreciable improvement in the final bins with the inclusion of domain adaptation, despite DEEPTAU being trained on 2018 data and simulation. The vertical bars on the data points indicate the statistical uncertainty; on most points the bars are smaller than the marker size. |
|
png pdf |
Figure 6:
Jet misidentification probability versus genuine $ \tau_\mathrm{h} $ identification efficiency for low-$ p_{\mathrm{T}} $ (left) and high-$ p_{\mathrm{T}} $ (right) $ \tau_\mathrm{h} $ candidates, evaluated on 2018 simulated datasets. The genuine $ \tau_\mathrm{h} $ identification efficiency is estimated from $ \mathrm{H}\to \tau \tau $ simulations using reconstructed $ \tau_\mathrm{h} $ candidates that match generator-level $ \tau_\mathrm{h} $ objects. The jet misidentification probability is estimated from $ \mathrm{t} \overline{\mathrm{t}} $ simulations using reconstructed $ \tau_\mathrm{h} $ candidates that do not match prompt electrons, muons, or products of $ \tau_\mathrm{h} $ decays at the generator level. The defined working points of the discriminator are indicated as filled circles. |
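The two quantities plotted against each other here are both pass fractions at a common discriminator threshold, one computed on genuine $ \tau_\mathrm{h} $ candidates and one on misidentified jets. A minimal sketch of that bookkeeping follows; the function names and the toy scores are hypothetical and only illustrate the definition, not the CMS evaluation code.

```python
def pass_fraction(scores, threshold):
    """Fraction of candidates whose discriminator score exceeds the threshold."""
    if not scores:
        raise ValueError("need at least one candidate")
    return sum(1 for s in scores if s > threshold) / len(scores)


def roc_points(genuine_scores, jet_scores, thresholds):
    """Return (efficiency, misidentification probability) pairs, one per threshold."""
    return [
        (pass_fraction(genuine_scores, t), pass_fraction(jet_scores, t))
        for t in thresholds
    ]


# Toy scores, invented purely for illustration.
toy_genuine = [0.95, 0.9, 0.8, 0.6]
toy_jets = [0.7, 0.3, 0.2, 0.1]
toy_curve = roc_points(toy_genuine, toy_jets, [0.5, 0.75])
```

Tightening the threshold moves a point down and to the left on the curve: both the efficiency and the misidentification probability decrease.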
|
png pdf |
Figure 7:
Electron misidentification probability versus genuine $ \tau_\mathrm{h} $ identification efficiency for low-$ p_{\mathrm{T}} $ (left) and high-$ p_{\mathrm{T}} $ (right) $ \tau_\mathrm{h} $ candidates, evaluated on 2018 simulated datasets. The genuine $ \tau_\mathrm{h} $ identification efficiency is estimated from $ \mathrm{H} \to \tau\tau $ simulations using reconstructed $ \tau_\mathrm{h} $ candidates that match generator-level $ \tau_\mathrm{h} $ objects. The electron misidentification probability is estimated from $ \mathrm{Z}/\gamma^{*}+$jets simulation using reconstructed $ \tau_\mathrm{h} $ candidates that match electrons at the generator level. The defined working points of the discriminator are indicated as filled circles. |
|
png pdf |
Figure 8:
Muon misidentification probability versus $ \tau_\mathrm{h} $ identification efficiency for low-$ p_{\mathrm{T}} $ (left) and high-$ p_{\mathrm{T}} $ (right) $ \tau_\mathrm{h} $ candidates, evaluated on simulated 2018 datasets. The $ \tau_\mathrm{h} $ identification efficiency is estimated from $ \mathrm{H} \to \tau\tau $ simulations using reconstructed $ \tau_\mathrm{h} $ candidates that match generator-level $ \tau_\mathrm{h} $ objects. The muon misidentification probability is estimated from $ \mathrm{Z}/\gamma^{*}+$jets simulation using reconstructed $ \tau_\mathrm{h} $ candidates that match muons at the generator level. The defined working points of the discriminator are indicated as filled circles. |
|
png pdf |
Figure 9:
Distribution of the invariant mass of the reconstructed $ \mu\,\tau_\mathrm{h} $ system when using DEEPTAU v2.1 (left) and v2.5 (right) for discrimination in the 2018 dataset. The DEEPTAU working points used are: Medium for $ D_\text{jet} $, VVLoose for $ D_\mathrm{e} $, and Tight for $ D_\mu $ (see Table 2). The correction factors are applied in both cases. The vertical bars correspond to the statistical uncertainties in the observed event yields. |
|
png pdf |
Figure 10:
The data-to-simulation scale factors of the $ \tau_\mathrm{h} $ identification efficiency as functions of $ p_{\mathrm{T}} $ in the 2018 (left) and 2022 (right) data-taking periods, including all $ \tau_\mathrm{h} $ decay modes, and requiring the $ D_\text{jet} $ Medium working point (see Table 2) and $ m_{\mathrm{T}}(p_{\mathrm{T}}^\mu,p_{\mathrm{T}}^\text{miss}) < $ 65 GeV. The vertical bars correspond to the combined statistical and systematic uncertainties in the individual scale factors. For a fair scale factor comparison in 2022, the tau energy scale has been fixed to the one measured for DEEPTAU v2.5, which shows a higher genuine $ \tau_\mathrm{h} $ purity. |
|
png pdf |
Figure 11:
The $ m_\text{vis} $ distribution in the $ \mathrm{Z}\to\tau_\mu\tau_\mathrm{h} $ channel for the 2022 dataset before (left) and after (right) the full calibration. The DEEPTAU working points used are: Medium for $ D_\text{jet} $, VVLoose for $ D_\mathrm{e} $, and Tight for $ D_\mu $ (see Table 2). The application of correction factors improves the agreement between data and simulation. |
|
png pdf |
Figure 12:
The $ m_\text{vis} $ distribution in the $ \mathrm{Z}\to\tau_\mathrm{e}\tau_\mathrm{h} $ channel for the 2022 dataset before (left) and after (right) the full calibration. The DEEPTAU working points used are: Medium for $ D_\text{jet} $, Tight for $ D_\mathrm{e} $, and Tight for $ D_\mu $ (see Table 2). Specific 2022 detector conditions that affected electron reconstruction are not perfectly modelled in the simulation. As a result, the number of electrons misidentified as $ \tau_\mathrm{h} $ candidates is larger in data than in simulated events. The application of correction factors improves the agreement between data and simulation. |
|
png pdf |
Figure 13:
Summary of $ \tau_\mathrm{h} $ identification efficiency (left) and $ \tau_\mathrm{h} $ energy scale corrections (right) across the $ \tau_\mathrm{h} $ decay modes and $ p_{\mathrm{T}} $ regions for 2018 with $ m_{\mathrm{T}}(p_{\mathrm{T}}^\mu,p_{\mathrm{T}}^\text{miss}) < $ 65 GeV and the $ D_\text{jet} $ Medium working point (see Table 2). The horizontal bars represent the total uncertainty on the measurements, combining both statistical and systematic contributions. |
|
png pdf |
Figure 14:
Summary of $ \tau_\mathrm{h} $ identification efficiency (left) and $ \tau_\mathrm{h} $ energy scale corrections (right) across the $ \tau_\mathrm{h} $ decay modes for 2022 with $ m_{\mathrm{T}}(p_{\mathrm{T}}^\mu,p_{\mathrm{T}}^\text{miss}) < $ 40 GeV and the $ D_\text{jet} $ Medium working point (see Table 2). The horizontal bars represent the total uncertainty on the measurements, combining both statistical and systematic contributions. |
|
png pdf |
Figure 15:
Muon misidentification rate scale factors in bins of $ \tau_\mathrm{h} |\eta| $ for the Medium $ D_\mu $ working point (see Table 2). The measurement for the 2018 dataset is shown on the left and that for the 2022 dataset on the right. The dashed lines indicate the boundaries of the $ \tau_\mathrm{h} |\eta| $ bins. The vertical bars represent the total uncertainty on the measurements, combining both statistical and systematic contributions. |
|
png pdf |
Figure 16:
Summary of the electron misidentification rate scale factors, split by decay mode and $ \eta $ region, for the VVLoose $ D_\mathrm{e} $ working point (see Table 2). The corrections are shown for the 2018 (left) and 2022 (right) datasets. The horizontal bars represent the total uncertainty on the measurements, combining both statistical and systematic contributions. |
|
png pdf |
Figure 17:
The high-$ p_{\mathrm{T}} \tau_\mathrm{h} $ identification efficiency scale factors as a function of $ \tau_\mathrm{h} p_{\mathrm{T}} $ for $ D_\text{jet} $ Medium, $ D_\mu $ Tight and $ D_\mathrm{e} $ VVLoose (left) and Tight (right) discriminators (see Table 2). The scale factors are measured for the 2018 (top) and 2022 (bottom) datasets. |
|
png pdf |
Figure 18:
Prefit (left plots) and postfit (right plots) distributions of $ m_{\mathrm{T}}(p_{\mathrm{T}}^{\tau_\mathrm{h}}, p_{\mathrm{T}}^\text{miss}) $ for $ p_{\mathrm{T}} $ bins of 100 $ < p_{\mathrm{T}} < $ 200 GeV (upper plots) and $ p_{\mathrm{T}} > $ 200 GeV (lower plots) in the 2022 dataset. The distributions are obtained for a combination of the $ D_\text{jet} $ Medium, $ D_\mu $ Tight, and $ D_\mathrm{e} $ Tight discriminators (see Table 2). |
| Tables | |
|
png pdf |
Table 1:
Selection requirements for the domain adaptation dataset. The impact parameters for the muon (or $ \tau_\mathrm{h} $ candidate), $ d_z $ and $ d_{xy} $, are defined as the distances between the muon track (or leading charged-hadron track) and the PV. The medium muon identification is defined in Ref. [27]. The previous DEEPTAU discriminator scores described in Ref. [21] against quark and gluon jets, electrons, and muons are denoted $ D_\text{jet}^\text{v2.1} $, $ D_\mathrm{e}^\text{v2.1} $, and $ D_\mu^\text{v2.1} $. The transverse mass of the system formed by the muon and the missing transverse momentum is denoted as $ m_{\mathrm{T}}(p_{\mathrm{T}}^\mu,p_{\mathrm{T}}^\text{miss}) $. Working points Tight and VVLoose are defined in Table 2. |
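The caption names the transverse mass without spelling it out. Assuming the standard definition for two effectively massless objects, $ m_{\mathrm{T}} = \sqrt{2 p_{\mathrm{T}}^\mu p_{\mathrm{T}}^\text{miss} (1 - \cos\Delta\phi)} $, it can be computed as sketched below. The function and argument names are hypothetical, and the page does not confirm that exactly this expression is used.

```python
import math


def transverse_mass(pt_lepton, pt_miss, dphi):
    """Transverse mass in the massless approximation (assumed standard definition).

    pt_lepton, pt_miss: transverse momenta in GeV.
    dphi: azimuthal angle between the lepton and the missing transverse momentum.
    """
    return math.sqrt(2.0 * pt_lepton * pt_miss * (1.0 - math.cos(dphi)))


# Back-to-back configuration: m_T = 2 * sqrt(pt_lepton * pt_miss).
m_t_back_to_back = transverse_mass(40.0, 40.0, math.pi)
```

An upper requirement on this quantity, such as the $ m_{\mathrm{T}} < $ 65 GeV selection quoted in the figure captions, removes events in which the muon and the missing transverse momentum are both hard and well separated in azimuth.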
|
png pdf |
Table 2:
Target genuine $ \tau_\mathrm{h} $ identification efficiencies for the different working points defined for the three discriminators. The target efficiencies are evaluated with the $ \mathrm{H}\to\tau\tau $ event sample for $ \tau_\mathrm{h} $ candidates with $ p_{\mathrm{T}} \in $ [30, 70] GeV. |
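A working point defined by a target efficiency corresponds to a threshold on the discriminator score chosen so that the desired fraction of genuine $ \tau_\mathrm{h} $ candidates passes. The sketch below shows one simple way to derive such a threshold from a list of scores; the function name, the `score >= cut` convention, and the toy numbers are assumptions for illustration and do not reproduce the values in Table 2.

```python
import math


def threshold_for_efficiency(genuine_scores, target_efficiency):
    """Largest cut (applied as score >= cut) keeping at least the target fraction."""
    if not genuine_scores:
        raise ValueError("need at least one genuine candidate")
    if not 0.0 < target_efficiency <= 1.0:
        raise ValueError("target efficiency must be in (0, 1]")
    ranked = sorted(genuine_scores, reverse=True)
    # Number of candidates that must pass to reach the target efficiency.
    n_keep = math.ceil(target_efficiency * len(ranked))
    return ranked[n_keep - 1]


# Toy scores, invented purely for illustration.
toy_scores = [0.2, 0.9, 0.6, 0.4]
toy_cut = threshold_for_efficiency(toy_scores, 0.5)
```

In practice the scores would come from the candidates named in the caption, that is, genuine $ \tau_\mathrm{h} $ candidates from the $ \mathrm{H}\to\tau\tau $ sample with $ p_{\mathrm{T}} $ between 30 and 70 GeV.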
|
png pdf |
Table A1:
Default values of the parameters used in the classification loss function for DEEPTAU training. |
| Summary |
| In this paper, the newly deployed version of the DEEPTAU algorithm, v2.5, used to discriminate $ \tau_\mathrm{h} $ candidates from quark or gluon jets and electrons and muons, has been introduced. This deep convolutional neural network exhibits improved performance with respect to its predecessor, reducing the jet misidentification rate by 30-50% for a given $ \tau_\mathrm{h} $ reconstruction and identification efficiency. The implementation of domain adaptation by backpropagation has reduced performance discrepancies between collision data and simulation, decreasing the necessary residual corrections by 5%. The domain adaptation was introduced by including an adversarial subnetwork in the gradient calculation of the neural network. This adversarial subnetwork was designed to discriminate between collision data and simulations, running in parallel with the $ \tau_\mathrm{h} $ classification task. The DEEPTAU algorithm, trained using both collision data and simulated samples, is able to maximize the $ \tau_\mathrm{h} $ classification performance, while minimizing the data-simulation discrepancies. The DEEPTAU v2.5 algorithm was trained on simulated proton-proton collision data corresponding to the 2018 data-taking conditions, as well as on real collision data collected during the same year containing $ \mathrm{Z} \to \tau\tau $ decays, which was used for domain adaptation. The algorithm has been validated using 2018 and 2022 collision data. The observed $ \tau_\mathrm{h} $ efficiencies were found to agree with the expected efficiencies from simulated events within 10% for 2018 and 15% for 2022. This agreement is improved with respect to the previous iteration of the algorithm and confirms the effectiveness of domain adaptation. The algorithm has been introduced to be used in CMS physics analyses using data recorded from 2022 onwards. |
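Domain adaptation by backpropagation is commonly realised with a gradient-reversal step: the adversarial subnetwork is trained to separate data from simulation, while the gradient it sends back to the shared layers is sign-flipped, so that those layers learn features on which the two domains look alike. The sketch below illustrates only this gradient bookkeeping. The class, the function, and the weight `lam` are hypothetical, and this page does not state whether DEEPTAU v2.5 combines the classification and adversarial gradients in exactly this form.

```python
class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lam in the backward pass."""

    def __init__(self, lam):
        self.lam = lam  # hypothetical weight of the adversarial term

    def forward(self, features):
        return list(features)

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]


def feature_gradient(grad_classification, grad_adversary, lam):
    """Gradient reaching the shared feature layers.

    grad_classification: d(L_cls)/d(features) from the tau_h classification head.
    grad_adversary:      d(L_adv)/d(features) from the data-vs-simulation head.
    """
    reversed_grad = GradientReversal(lam).backward(grad_adversary)
    return [gc + gr for gc, gr in zip(grad_classification, reversed_grad)]


# Toy gradients, invented purely for illustration.
toy_gradient = feature_gradient([1.0, -2.0], [0.5, 0.5], lam=2.0)
```

The adversarial head itself still receives the unreversed gradient and therefore keeps improving as a data-versus-simulation discriminator; only the shared layers are pushed in the opposite direction.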
|