Compact Muon Solenoid
LHC, CERN

CMS-MLG-24-002; CERN-EP-2025-209
Wasserstein normalized autoencoder for anomaly detection
Submitted to Machine Learning: Science and Technology
Abstract: A novel anomaly detection algorithm is presented. The Wasserstein normalized autoencoder (WNAE) is a normalized probabilistic model that minimizes the Wasserstein distance between the learned probability distribution---a Boltzmann distribution where the energy is the reconstruction error of the autoencoder---and the distribution of the training data. This algorithm has been developed and applied to the identification of semivisible jets---conical sprays of visible standard model particles and invisible dark matter states---with the CMS experiment at the CERN LHC. Trained on jets of particles from simulated standard model processes, the WNAE is shown to learn the probability distribution of the input data in a fully unsupervised fashion, such that it effectively identifies new physics jets as anomalies. The model consistently demonstrates stable, convergent training and achieves strong classification performance across a wide range of signals, improving upon standard normalized autoencoders, while remaining agnostic to the signal. The WNAE directly tackles the problem of outlier reconstruction, a common failure mode of autoencoders in anomaly detection tasks.
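Schematically, in notation of our own choosing (the paper's exact conventions may differ), the construction described in the abstract reads: the autoencoder $ f_{\theta} $ defines an energy equal to its reconstruction error, $ E_{\theta}(x) = \lVert x - f_{\theta}(x) \rVert^{2} $; this energy induces a normalized Boltzmann model, $ p_{\theta}(x) = e^{-E_{\theta}(x)} / \int e^{-E_{\theta}(x')} \, \mathrm{d}x' $; and the WNAE is trained by minimizing the Wasserstein distance to the data distribution, $ \theta^{*} = \arg\min_{\theta} W(p_{\text{data}}, p_{\theta}) $.
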
Figures

Figure 1:
Schematic visualization of the outlier reconstruction failure mode. Signal samples drawn from the hatched area are reconstructed well by the AE, despite not being part of the training set, and thus are not separated from the background. The AE training is assumed to have converged such that the background is reconstructed well.

Figure 2:
An illustration of collider SVJ production. The dashed black arrows indicate stable, undetectable DM candidate particles. Figure adapted from Ref. [51].

Figure 3:
Left: the reconstruction error (upper panel) and the AUC scores (lower panel) for the AE trained on the $ \mathrm{t} \overline{\mathrm{t}} $ background, evaluated during each training epoch on $ \mathrm{t} \overline{\mathrm{t}} $ background jets and signal models with $ m_{\Phi} = $ 2000 GeV and $ r_{\text{inv}} = $ 0.3 (upper) or $ r_{\text{inv}} = 0.1, 0.3, 0.5, $ 0.7 (lower). Right: The AUC scores for the same AE, evaluated for the epoch with the minimal background reconstruction error, for the classification of several SVJ signal hypotheses against the $ \mathrm{t} \overline{\mathrm{t}} $ background. The AUC scores are close to 0.5, indicating that the AE is unable to discriminate between the SVJ signal and the $ \mathrm{t} \overline{\mathrm{t}} $ background.

Figure 4:
Left: NAE training showing the divergence of the loss function, in terms of positive and negative energy (upper panel), and the AUC for several signal hypotheses with fixed $ m_{\Phi} = $ 2000 GeV and varying $ r_{\text{inv}} $ values (lower panel). Right: the positive and negative energies from the upper panel of the left plot, shown for $ \text{epoch} < $ 250---before the divergence---and 0.18 $ < \text{energy} < $ 1.4 on a linear scale, to illustrate their differences.

Figure 5:
Distributions of the input feature $ \tau_{3} $ for positive, negative, and signal samples, before (epoch 274) and after (epochs 275--279) the start of the divergence of the NAE loss. The signal distributions are overlaid for illustration; signal samples are not used during the training. All distributions are normalized such that their integral is 100.

Figure 6:
As a function of epoch during the training of an NAE with the loss function from Eq. \eqref{eq:nae_loss_logcosh}: the positive and negative energies and the value of the loss function (upper panel); the Wasserstein distance between negative and positive samples and the AUC for several signal hypotheses with fixed $ m_{\Phi} = $ 2000 GeV and varying $ r_{\text{inv}} $ (lower panel).

Figure 7:
Schematic representation of the mode collapse when using the loss function described in Eq. \eqref{eq:nae_loss_logcosh}. The energy landscape is shown before (left) and after (right) the mode collapse. On the right, the reconstruction errors for the signal and background supports are completely overlapping on the vertical axis. The symbols $ E_+ $ and $ E_- $ denote the positive and negative energies, respectively.

Figure 8:
Left: the AUC scores for an NAE trained on the $ \mathrm{t} \overline{\mathrm{t}} $ background and tested against a grid of possible SVJ signals, before the increase of the Wasserstein distance (at epoch 3000). Right: the AUC scores for the same NAE after the increase in Wasserstein distance (at epoch 10000).

Figure 9:
Flowchart of the Wasserstein normalized autoencoder training. The negative examples are generated via MCMC using the energy function of the model, which is the Boltzmann-distributed reconstruction error. The energy function is computed from random input feature values $ X_n $ and the corresponding reconstructed feature values $ \widetilde{X}_n $, obtained by passing the inputs through the autoencoder. The positive examples are compared to the negative examples through the Wasserstein distance. The gradients are backpropagated through the entire MCMC chain.
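As a rough illustration of this flowchart, the following minimal PyTorch sketch performs one WNAE-style training step. It is not the CMS implementation: the network sizes, the Langevin MCMC settings, and the per-feature 1D Wasserstein distance (a crude stand-in for the full multivariate optimal-transport distance used in the paper) are assumptions made purely for illustration.

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_features=8, n_latent=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(), nn.Linear(32, n_features))
    def energy(self, x):
        # Reconstruction error of the autoencoder, used as the energy of the Boltzmann model
        return ((x - self.decoder(self.encoder(x))) ** 2).mean(dim=1)

def langevin_negatives(model, x_init, n_steps=10, step_size=1e-2, noise=1e-2):
    # Draw negative samples from the model distribution with a short Langevin MCMC chain;
    # create_graph=True keeps the chain differentiable so the loss can be backpropagated through it
    x = x_init.clone().requires_grad_(True)
    for _ in range(n_steps):
        grad = torch.autograd.grad(model.energy(x).sum(), x, create_graph=True)[0]
        x = x - step_size * grad + noise * torch.randn_like(x)
    return x

def wasserstein_1d(a, b):
    # Per-feature 1D Wasserstein-1 distance between equal-size batches, averaged over features
    return (torch.sort(a, dim=0).values - torch.sort(b, dim=0).values).abs().mean()

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x_pos = torch.randn(256, 8)                            # placeholder for background jet features
x_neg = langevin_negatives(model, torch.rand(256, 8))  # negative samples generated from the model
loss = wasserstein_1d(x_pos, x_neg)                    # WNAE-style objective: match the two distributions
optimizer.zero_grad()
loss.backward()                                        # gradients flow through the entire MCMC chain
optimizer.step()

In the analysis itself, the Wasserstein distance is presumably evaluated with an optimal-transport solver (the POT library is cited as Ref. [30]), and the MCMC hyperparameters are those listed in Table C1.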

Figure 10:
Left: the Wasserstein distance between pairs of the positive, negative, and signal samples during the WNAE training. Right: the Wasserstein distance between negative and positive samples and the AUC scores from the same WNAE for several signal hypotheses with fixed $ m_{\Phi} = $ 2000 GeV and varying $ r_{\text{inv}} $.

Figure 11:
The AUC scores for a WNAE trained on the $ \mathrm{t} \overline{\mathrm{t}} $ background and tested on a grid of possible SVJ signal models.

Figure 12:
The distributions of half of the input variables, $ \tau_{2} $, $ \tau_{3} $, EFP1, and $ C_2^{(0.5)} $, for the positive, negative, and signal samples, at the start (upper) and at the end (lower) of the WNAE training. The signal distributions are overlaid for illustration; signal samples are not used during the training. All distributions are normalized such that their integral is 100.

Figure 13:
The Wasserstein distance between the positive and negative samples and the AUC score during the training of a WNAE on an SVJ signal ($ m_{\Phi} = $ 2000 GeV, $ r_{\text{inv}} = $ 0.3), with the $ \mathrm{t} \overline{\mathrm{t}} $ background used for testing.

Figure B1:
The learning rate during the training of the WNAE from Section 4.3.
Tables

Table B1:
The hyperparameters of the learning rate scheduler.

Table C1:
The hyperparameters of the MCMC.
Summary
Anomaly detection using autoencoders (AEs) relies on learning a reconstruction function that gives high reconstruction error to phase space regions with low probability density, such that they can be identified as anomalous. However, standard AEs are prone to learn to reconstruct outliers because they are free to minimize the reconstruction error outside the training phase space. In addition, they may exhibit complexity bias, learning to identify examples as anomalous only if their feature distribution is more complex than the training data.

The normalized autoencoder (NAE) paradigm promotes the AE reconstruction error to an energy function in the framework of energy-based models, in order to define a normalized probabilistic model. This is achieved by minimizing the negative log-likelihood of the training data from the energy-based model probability. In practice, this method presents a number of failure modes, such as divergence of the loss function and phase space degeneracy, leading to low reconstruction error for phase space regions distinct from the training data.

The Wasserstein normalized autoencoder (WNAE), an improvement over the NAE, is introduced to solve these failure modes. This is achieved by directly minimizing the Wasserstein distance between the probability distribution of the training data and the Boltzmann distribution of the energy function of the model. This Wasserstein distance is found to be highly correlated with signal identification performance while still being fully signal-agnostic, preserving the unsupervised nature of the approach.

The performance is studied in the context of a search for new physics with the CMS experiment, using top-antitop quark production as the standard model background and nonresonant semivisible jet production from a strongly coupled dark sector as the proposed signal model. The classification of the signal events as outliers by the WNAE is shown to be on par with or better than that of the NAE. Further, the WNAE approach is found to mitigate complexity bias, as it can effectively identify top quark jets as anomalous when trained on semivisible jet signal events.

Though simulated samples were used to develop the WNAE, in practice it may be preferable to use observed data directly for training, in order to limit biases arising from differences between simulation and observation. In this case, the training data may contain anomalies, which would reduce the anomaly detection performance of the WNAE. The WNAE can be straightforwardly trained using observed data from a control region with no anomalous examples, if such a region can be defined and follows the same probability distribution as the observed data where the WNAE will be applied. When no assumption at all can be made about the nature of the anomalies, as in the case of triggering at a high-energy physics experiment, alternative solutions may exist. The WNAE associates low probability density regions with high reconstruction error; because anomalies necessarily have low probability density, they still tend to have relatively high reconstruction error even when included in the training data set. Therefore, the training data set can be iteratively refined by selecting a given fraction of examples with the lowest reconstruction error, in order to reduce the proportion of anomalous data. This would result in a self-supervised training for the WNAE. We leave the development of such a procedure for future work.
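To make the idea in the last paragraph concrete, the sketch below shows one hypothetical refinement loop; the actual procedure is explicitly left for future work, and both the energy method (from the earlier sketch) and the train_wnae routine passed in as an argument are assumed placeholders.

import torch

def refine_training_set(model, data, train_wnae, keep_fraction=0.9, n_iterations=5):
    # Iteratively drop the examples with the highest reconstruction error (the most
    # anomaly-like ones) and retrain, so that contamination from anomalies shrinks.
    for _ in range(n_iterations):
        with torch.no_grad():
            energies = model.energy(data)        # reconstruction error per example
        n_keep = int(keep_fraction * len(data))
        keep = torch.argsort(energies)[:n_keep]  # keep the best-reconstructed fraction
        data = data[keep]
        train_wnae(model, data)                  # user-supplied WNAE training routine
    return data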
References
1 T. Heimel, G. Kasieczka, T. Plehn, and J. M. Thompson QCD or what? SciPost Phys. 6 (2019) 030 1808.08979
2 M. Farina, Y. Nakai, and D. Shih Searching for new physics with deep autoencoders PRD 101 (2020) 075021 1808.08992
3 T. Finke et al. Autoencoders for unsupervised anomaly detection in high energy physics JHEP 06 (2021) 161 2104.09051
4 S. Yoon, Y.-K. Noh, and F. Park Autoencoding under normalization constraints in Proceedings of the 38th International Conference on Machine Learning, p. 12087. 2021 2105.05735
5 T. Cohen, M. Lisanti, and H. K. Lou Semivisible jets: Dark matter undercover at the LHC PRL 115 (2015) 171804 1503.00009
6 CMS Collaboration The CMS experiment at the CERN LHC JINST 3 (2008) S08004
7 CMS Collaboration Development of the CMS detector for the CERN LHC Run 3 JINST 19 (2024) P05064 CMS-PRF-21-001 2309.05466
8 CMS Collaboration Performance of the CMS Level-1 trigger in proton-proton collisions at $ \sqrt{s} = $ 13 TeV JINST 15 (2020) P10017 CMS-TRG-17-001 2006.10165
9 CMS Collaboration The CMS trigger system JINST 12 (2017) P01020 CMS-TRG-12-001 1609.02366
10 CMS Collaboration Performance of the CMS high-level trigger during LHC Run 2 JINST 19 (2024) P11021 CMS-TRG-19-001 2410.17038
11 CMS Collaboration Electron and photon reconstruction and identification with the CMS experiment at the CERN LHC JINST 16 (2021) P05014 CMS-EGM-17-001 2012.06888
12 CMS Collaboration Performance of the CMS muon detector and muon reconstruction with proton-proton collisions at $ \sqrt{s}= $ 13 TeV JINST 13 (2018) P06015 CMS-MUO-16-001 1804.04528
13 CMS Collaboration Description and performance of track and primary-vertex reconstruction with the CMS tracker JINST 9 (2014) P10009 CMS-TRK-11-001 1405.6569
14 CMS Collaboration Particle-flow reconstruction and global event description with the CMS detector JINST 12 (2017) P10003 CMS-PRF-14-001 1706.04965
15 CMS Collaboration Jet energy scale and resolution in the CMS experiment in pp collisions at 8 TeV JINST 12 (2017) P02014 CMS-JME-13-004 1607.03663
16 CMS Collaboration Performance of missing transverse momentum reconstruction in proton-proton collisions at $ \sqrt{s} = $ 13 TeV using the CMS detector JINST 14 (2019) P07004 CMS-JME-17-001 1903.06078
17 L. V. Kantorovich Mathematical methods of organizing and planning production Management Science 6 (1939) 366
18 L. N. Vaserstein Markov processes over denumerable products of spaces describing large systems of automata Problems of Information Transmission 5 (1969) 47
19 M. A. Kramer Autoassociative neural networks Comput. Chem. Eng. 16 (1992) 313
20 P. Smolensky Parallel Distributed Processing, Volume 1: Explorations in the Microstructure of Cognition: Foundations ch. Information processing in dynamical systems: Foundations of harmony theory. The MIT Press, 1986
link
21 G. Hinton Training products of experts by minimizing contrastive divergence Neural Comput. 14 (2002) 1771
22 Y. W. Teh, M. Welling, S. Osindero, and G. E. Hinton Energy-based models for sparse overcomplete representations J. Mach. Learn. Res. 4 (2003) 1235
23 E. T. Jaynes Information theory and statistical mechanics PR 106 (1957) 620
24 D. P. Kingma and M. Welling Auto-encoding variational Bayes in 2nd International Conference on Learning Representations. 2014 1312.6114
25 A. Makhzani, J. Shlens, N. Jaitly, and I. Goodfellow Adversarial autoencoders in International Conference on Learning Representations. 2016 1511.05644
26 I. J. Goodfellow et al. Generative adversarial nets in Advances in Neural Information Processing Systems, volume 27, Curran Associates, Inc. 2014 1406.2661
27 I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schoelkopf Wasserstein auto-encoders in International Conference on Learning Representations. 2018 1711.01558
28 L. V. Kantorovich and S. Rubinshtein On a space of totally additive functions Vestnik of the St. Petersburg University: Mathematics 13 (1958) 52
29 M. Arjovsky, S. Chintala, and L. Bottou Wasserstein GAN in Proceedings of the 34th International Conference on Machine Learning, volume 70, p. 214. 2017 1701.07875
30 R. Flamary et al. POT: Python optimal transport J. Mach. Learn. Res. 22 (2021) 1
31 K. Fatras et al. Learning with minibatch Wasserstein: asymptotic and gradient properties in Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108, 2019 1910.04091
32 E. Nalisnick et al. Do deep generative models know what they don't know? in International Conference on Learning Representations. 2019 1810.09136
33 Z. Xiao, Q. Yan, and Y. Amit Likelihood regret: An out-of-distribution detection score for variational auto-encoder in Advances in Neural Information Processing Systems, volume 33, p. 20685. 2020 2003.02977
34 S. Pidhorskyi, R. Almohsen, and G. Doretto Generative probabilistic novelty detection with adversarial autoencoders in Proceedings of the 32nd International Conference on Neural Information Processing Systems, p. 6823. 2018 1807.02588
35 S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon GANomaly: Semi-supervised anomaly detection via adversarial training in Asian Conference on Computer Vision, p. 622, Springer. 2018 1805.06725
36 T. Schlegl et al. f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks Medical Image Analysis 54 (2019) 30
37 Y. Du and I. Mordatch Implicit generation and modeling with energy-based models in Proceedings of the 33rd International Conference on Neural Information Processing Systems, p. 324. 2019 1903.08689
38 A. Gandrakota Realtime anomaly detection at the L1 trigger of CMS experiment PoS ICHEP, 2025 2411.19506
39 CMS Collaboration Model-agnostic search for dijet resonances with anomalous jet substructure in proton-proton collisions at $ \sqrt{s} = $ 13 TeV Rept. Prog. Phys. 88 (2025) 067802 CMS-EXO-22-026 2412.03747
40 ATLAS Collaboration Search for new phenomena in two-body invariant mass distributions using unsupervised machine learning for anomaly detection at $ \sqrt{s}= $ 13 TeV with the ATLAS detector PRL 132 (2024) 081801 2307.01612
41 B. M. Dillon et al. A normalized autoencoder for LHC triggers SciPost Phys. Core 6 (2023) 074 2206.14225
42 V. C. Rubin, N. Thonnard, and W. K. Ford, Jr. Rotational properties of 21 SC galaxies with a large range of luminosities and radii, from NGC 4605 (R = 4 kpc) to UGC 2885 (R = 122 kpc) Astrophys. J. 238 (1980) 471
43 M. Persic, P. Salucci, and F. Stel The universal rotation curve of spiral galaxies: I. The dark matter connection Mon. Not. Roy. Astron. Soc. 281 (1996) 27 astro-ph/9506004
44 D. Clowe et al. A direct empirical proof of the existence of dark matter Astrophys. J. 648 (2006) L109 astro-ph/0608407
45 DES Collaboration Dark Energy Survey year 1 results: curved-sky weak lensing mass map Mon. Not. Roy. Astron. Soc. 475 (2018) 3165 1708.01535
46 Planck Collaboration Planck 2018 results. VI. Cosmological parameters Astron. Astrophys. 641 (2020) A6 1807.06209
47 M. J. Strassler and K. M. Zurek Echoes of a hidden valley at hadron colliders PLB 651 (2007) 374 hep-ph/0604261
48 CMS Collaboration Search for resonant production of strongly coupled dark matter in proton-proton collisions at 13 TeV JHEP 06 (2022) 156 CMS-EXO-19-020 2112.11125
49 ATLAS Collaboration Search for new physics in final states with semivisible jets or anomalous signatures using the ATLAS detector PRD 112 (2025) 012021 2505.01634
50 T. Cohen, M. Lisanti, H. K. Lou, and S. Mishra-Sharma LHC searches for dark sector showers JHEP 11 (2017) 196 1707.05326
51 E. Bernreuther, F. Kahlhoefer, M. Krämer, and P. Tunney Strongly interacting dark sectors in the early universe and at the LHC through a simplified portal JHEP 01 (2020) 162 1907.04346
52 J. Alwall et al. The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations JHEP 07 (2014) 079 1405.0301
53 T. Sjostrand et al. An introduction to PYTHIA 8.2 Comput. Phys. Commun. 191 (2015) 159 1410.3012
54 GEANT4 Collaboration GEANT4---a simulation toolkit NIM A 506 (2003) 250
55 NNPDF Collaboration Parton distributions from high-precision collider data EPJC 77 (2017) 663 1706.00428
56 M. Cacciari, G. P. Salam, and G. Soyez The anti-$ k_{\mathrm{T}} $ jet clustering algorithm JHEP 04 (2008) 063 0802.1189
57 M. Cacciari, G. P. Salam, and G. Soyez FastJet user manual EPJC 72 (2012) 1896 1111.6097
58 CMS Collaboration Performance of quark/gluon discrimination in 8 TeV pp data CMS Physics Analysis Summary CMS-PAS-JME-13-002, 2013
59 P. T. Komiske, E. M. Metodiev, and J. Thaler Energy flow polynomials: A complete linear basis for jet substructure JHEP 04 (2018) 013 1712.07124
60 A. J. Larkoski, G. P. Salam, and J. Thaler Energy correlation functions for jet substructure JHEP 06 (2013) 108 1305.0007
61 A. J. Larkoski, S. Marzani, G. Soyez, and J. Thaler Soft drop JHEP 05 (2014) 146 1402.2657
62 J. Thaler and K. Van Tilburg Identifying boosted objects with N-subjettiness JHEP 03 (2011) 015 1011.2268
63 F. Pedregosa et al. Scikit-learn: Machine learning in Python J. Mach. Learn. Res. 12 (2011) 2825 1201.0490
64 F. Canelli et al. Autoencoders for semivisible jet detection JHEP 02 (2022) 074 2112.02864
65 CMS Collaboration Source code repository gitlab
66 A. Paszke et al. PyTorch: An imperative style, high-performance deep learning library in Proceedings of the 33rd International Conference on Neural Information Processing Systems, volume 32, p. 721. 2019 1912.01703
67 T. Tieleman Training restricted Boltzmann machines using approximations to the likelihood gradient in Proceedings of the 25th International Conference on Machine Learning, p. 1064. 2008