CMS-MLG-24-002 ; CERN-EP-2025-209
Wasserstein normalized autoencoder for anomaly detection
CMS Collaboration
3 October 2025
Submitted to Machine Learning: Science and Technology
Abstract: A novel anomaly detection algorithm is presented. The Wasserstein normalized autoencoder (WNAE) is a normalized probabilistic model that minimizes the Wasserstein distance between the learned probability distribution---a Boltzmann distribution where the energy is the reconstruction error of the autoencoder---and the distribution of the training data. This algorithm has been developed and applied to the identification of semivisible jets---conical sprays of visible standard model particles and invisible dark matter states---with the CMS experiment at the CERN LHC. Trained on jets of particles from simulated standard model processes, the WNAE is shown to learn the probability distribution of the input data in a fully unsupervised fashion, such that it effectively identifies new physics jets as anomalies. The model consistently demonstrates stable, convergent training and achieves strong classification performance across a wide range of signals, improving upon standard normalized autoencoders, while remaining agnostic to the signal. The WNAE directly tackles the problem of outlier reconstruction, a common failure mode of autoencoders in anomaly detection tasks.
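The probabilistic model described in the abstract can be summarized schematically as follows. This is a hedged paraphrase, not the paper's exact notation: $f_\theta$ is assumed to denote the autoencoder reconstruction, $E_\theta$ the resulting energy, and $W$ the Wasserstein distance; the precise definitions are given in the paper.

```latex
% Schematic summary (assumed notation): the reconstruction error acts as the
% energy of a Boltzmann distribution, and the training objective is the
% Wasserstein distance between that distribution and the data distribution.
E_\theta(x) = \lVert x - f_\theta(x) \rVert^2 , \qquad
p_\theta(x) = \frac{e^{-E_\theta(x)}}{\int e^{-E_\theta(x')}\,\mathrm{d}x'} , \qquad
\mathcal{L}_{\mathrm{WNAE}} = W\!\bigl(p_{\mathrm{data}},\, p_\theta\bigr) .
```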
Links: e-print arXiv:2510.02168 [hep-ex] (PDF) ; CDS record ; inSPIRE record ; CADI line (restricted)
Figures
Figure 1:
Schematic visualization of the outlier reconstruction failure mode. Signal samples drawn from the hatched area are reconstructed well by the AE, despite not being part of the training set, and thus are not separated from the background. The AE training is assumed to have converged such that the background is reconstructed well.
Figure 2:
An illustration of collider SVJ production. The dashed black arrows indicate stable, undetectable DM candidate particles. Figure adapted from Ref. [51].
Figure 3:
Left: the reconstruction error (upper panel) and the AUC scores (lower panel) for the AE trained on the $ \mathrm{t} \overline{\mathrm{t}} $ background, evaluated during each training epoch on $ \mathrm{t} \overline{\mathrm{t}} $ background jets and signal models with $ m_{\Phi} = $ 2000 GeV and $ r_{\text{inv}} = $ 0.3 (upper) or $ r_{\text{inv}} = $ 0.1, 0.3, 0.5, 0.7 (lower). Right: the AUC scores for the same AE, evaluated for the epoch with the minimal background reconstruction error, for the classification of several SVJ signal hypotheses against the $ \mathrm{t} \overline{\mathrm{t}} $ background. The AUC scores are close to 0.5, indicating that the AE is unable to discriminate between the SVJ signal and the $ \mathrm{t} \overline{\mathrm{t}} $ background.
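For orientation, AUC scores like those in Figure 3 can be computed by ranking jets by their reconstruction error. A minimal sketch, assuming scikit-learn [63] and hypothetical arrays `bkg_scores` and `sig_scores` of per-jet reconstruction errors; this is not the collaboration's code:

```python
# Hedged sketch: AUC of the reconstruction error used as an anomaly score for
# signal (label 1) versus background (label 0) jets.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_from_reconstruction_errors(bkg_scores, sig_scores):
    labels = np.concatenate([np.zeros(len(bkg_scores)), np.ones(len(sig_scores))])
    scores = np.concatenate([bkg_scores, sig_scores])
    return roc_auc_score(labels, scores)  # 0.5 corresponds to no discrimination
```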
Figure 4:
Left: NAE training showing the divergence of the loss function, in terms of positive and negative energy (upper panel), and the AUC for several signal hypotheses with fixed $ m_{\Phi} = $ 2000 GeV and varying $ r_{\text{inv}} $ values (lower panel). Right: the positive and negative energies from the upper panel of the left plot, shown for $ \text{epoch} < $ 250---before the divergence---and 0.18 $ < \text{energy} < $ 1.4 on a linear scale, to illustrate their differences.
Figure 5:
Distributions of the input feature $ \tau_{3} $ for positive, negative, and signal samples, before (epoch 274) and after (epochs 275--279) the start of the divergence of the NAE loss. The signal distributions are overlaid for illustration; signal samples are not used during the training. All distributions are normalized such that their integral is 100.
Figure 6:
As a function of epoch during the training of an NAE with the loss function from Eq. \eqref{eq:nae_loss_logcosh}: the positive and negative energies and the value of the loss function (upper panel); the Wasserstein distance between negative and positive samples and the AUC for several signal hypotheses with fixed $ m_{\Phi} = $ 2000 GeV and varying $ r_{\text{inv}} $ (lower panel).
Figure 7:
Schematic representation of the mode collapse when using the loss function described in Eq. \eqref{eq:nae_loss_logcosh}. The energy landscape is shown before (left) and after (right) the mode collapse. On the right, the reconstruction errors for the signal and background supports are completely overlapping on the vertical axis. The symbols $ E_+ $ and $ E_- $ denote the positive and negative energies, respectively.
Figure 8:
Left: the AUC scores for an NAE trained on the $ \mathrm{t} \overline{\mathrm{t}} $ background and tested against a grid of possible SVJ signals, before the increase of the Wasserstein distance (at epoch 3000). Right: the AUC scores for the same NAE after the increase in Wasserstein distance (at epoch 10000).
Figure 9:
Flowchart of the Wasserstein normalized autoencoder training. The negative examples are generated via MCMC using the energy function of the model, which is the Boltzmann-distributed reconstruction error. The energy function is computed from random input feature values $ X_n $ and the corresponding reconstructed feature values $ \widetilde{X}_n $, obtained by passing the inputs through the autoencoder. The positive examples are compared to the negative examples through the Wasserstein distance. The gradients are backpropagated through the entire MCMC chain.
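To make the flowchart concrete, a minimal sketch of one such training step is given below. It assumes a PyTorch [66] autoencoder `ae`, a batch of scaled positive examples `x_pos`, Langevin dynamics as the MCMC update, and the POT library [30] for the minibatch Wasserstein distance; the function names and hyperparameter values are illustrative placeholders, not the collaboration's implementation (the actual MCMC hyperparameters are listed in Table C1).

```python
# Hedged sketch of one WNAE training step, under the assumptions stated above.
import numpy as np
import ot      # POT: Python optimal transport [30]
import torch

def energy(ae, x):
    # Per-example reconstruction error, used as the energy of the Boltzmann distribution.
    return ((ae(x) - x) ** 2).mean(dim=1)

def sample_negatives(ae, x_init, n_steps=20, step_size=1e-2, noise_scale=1e-2):
    # Langevin MCMC on the energy landscape; create_graph=True keeps the graph so that
    # gradients can later be backpropagated through the entire chain, as in the caption.
    x = x_init.clone().requires_grad_(True)
    for _ in range(n_steps):
        grad, = torch.autograd.grad(energy(ae, x).sum(), x, create_graph=True)
        x = x - step_size * grad + noise_scale * torch.randn_like(x)
    return x

def wnae_loss(ae, x_pos):
    # Negative examples start from random feature values X_n and are driven toward
    # low-energy (well-reconstructed) regions by the MCMC.
    x_neg = sample_negatives(ae, torch.rand_like(x_pos))
    # Minibatch Wasserstein distance between positive and negative samples.
    cost = torch.cdist(x_pos, x_neg) ** 2
    a = np.full(len(x_pos), 1.0 / len(x_pos))
    b = np.full(len(x_neg), 1.0 / len(x_neg))
    plan = ot.emd(a, b, cost.detach().cpu().numpy())        # optimal transport plan
    return (torch.from_numpy(plan).to(cost) * cost).sum()   # differentiable through the cost
```

In this sketch the transport plan is treated as a constant, so gradients flow only through the cost matrix and the MCMC chain; whether this matches the exact treatment in the paper is not assumed here.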
Figure 10:
Left: the Wasserstein distance between pairs of the positive, negative, and signal samples during the WNAE training. Right: the Wasserstein distance between negative and positive samples and the AUC scores from the same WNAE for several signal hypotheses with fixed $ m_{\Phi} = $ 2000 GeV and varying $ r_{\text{inv}} $.
Figure 11:
The AUC scores for a WNAE trained on the $ \mathrm{t} \overline{\mathrm{t}} $ background and tested on a grid of possible SVJ signal models.
Figure 12:
The distributions of half of the input variables, $ \tau_{2} $, $ \tau_{3} $, EFP1, and $ C_2^{(0.5)} $, for the positive, negative, and signal samples, at the start (upper) and at the end (lower) of the WNAE training. The signal distributions are overlaid for illustration; signal samples are not used during the training. All distributions are normalized such that their integral is 100.
Figure 13:
The Wasserstein distance between the positive and negative samples and the AUC score during the training of a WNAE on an SVJ signal ($ m_{\Phi} = $ 2000 GeV, $ r_{\text{inv}} = $ 0.3), with the $ \mathrm{t} \overline{\mathrm{t}} $ background used for testing.
Figure B1:
The learning rate during the training of the WNAE from Section 4.3.
Tables
Table B1:
The hyperparameters of the learning rate scheduler.
Table C1:
The hyperparameters of the MCMC.
Summary
Anomaly detection using autoencoders (AEs) relies on learning a reconstruction function that gives high reconstruction error to phase space regions with low probability density, such that they can be identified as anomalous. However, standard AEs are prone to learning to reconstruct outliers, because they are free to minimize the reconstruction error outside the training phase space. In addition, they may exhibit complexity bias, learning to identify examples as anomalous only if their feature distribution is more complex than the training data.

The normalized autoencoder (NAE) paradigm promotes the AE reconstruction error to an energy function in the framework of energy-based models, in order to define a normalized probabilistic model. This is achieved by minimizing the negative log-likelihood of the training data under the energy-based model probability. In practice, this method presents a number of failure modes, such as divergence of the loss function and phase space degeneracy, leading to low reconstruction error for phase space regions distinct from the training data.

The Wasserstein normalized autoencoder (WNAE), an improvement over the NAE, is introduced to address these failure modes. This is achieved by directly minimizing the Wasserstein distance between the probability distribution of the training data and the Boltzmann distribution defined by the energy function of the model. This Wasserstein distance is found to be highly correlated with the signal identification performance while still being fully signal-agnostic, preserving the unsupervised nature of the approach.

The performance is studied in the context of a search for new physics with the CMS experiment, using top quark-antiquark production as the standard model background and nonresonant semivisible jet production from a strongly coupled dark sector as the proposed signal model. The classification of the signal events as outliers by the WNAE is shown to be on par with or better than that of the NAE. Further, the WNAE approach is found to mitigate complexity bias, as it can effectively identify top quark jets as anomalous when trained on semivisible jet signal events.

Though simulated samples were used to develop the WNAE, in practice it may be preferable to train directly on observed data, in order to limit biases arising from differences between simulation and observation. In this case, the training data may contain anomalies, which would reduce the anomaly detection performance of the WNAE. The WNAE can be straightforwardly trained using observed data from a control region with no anomalous examples, if such a region can be defined and follows the same probability distribution as the observed data to which the WNAE will be applied. When no assumption at all can be made about the nature of the anomalies, as in the case of triggering at a high-energy physics experiment, alternative solutions may exist. The WNAE associates low probability density regions with high reconstruction error; because anomalies necessarily have low probability density, they still tend to have relatively high reconstruction error even when included in the training data set. Therefore, the training data set can be iteratively refined by selecting a given fraction of examples with the lowest reconstruction error, in order to reduce the proportion of anomalous data. This would result in a self-supervised training of the WNAE. We leave the development of such a procedure for future work.
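The iterative refinement suggested above can be sketched as follows. The names `train_wnae` and `reconstruction_error` are hypothetical placeholders for a training routine and a per-example scoring function, and the keep fraction and iteration count are illustrative choices, not values from the paper.

```python
# Hedged sketch of a self-supervised refinement loop: at each iteration the WNAE is
# retrained on the examples with the lowest reconstruction error, reducing the
# fraction of anomalies that contaminate the training set.
import numpy as np

def iterative_refinement(data, train_wnae, reconstruction_error,
                         keep_fraction=0.9, n_iterations=3):
    model = None
    for _ in range(n_iterations):
        model = train_wnae(data)                    # retrain on the current data set
        scores = reconstruction_error(model, data)  # per-example anomaly score
        cutoff = np.quantile(scores, keep_fraction)
        data = data[scores <= cutoff]               # keep the most background-like examples
    return model
```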
References
| 1 | T. Heimel, G. Kasieczka, T. Plehn, and J. M. Thompson | QCD or what? | SciPost Phys. 6 (2019) 030 | 1808.08979 |
| 2 | M. Farina, Y. Nakai, and D. Shih | Searching for new physics with deep autoencoders | PRD 101 (2020) 075021 | 1808.08992 |
| 3 | T. Finke et al. | Autoencoders for unsupervised anomaly detection in high energy physics | JHEP 06 (2021) 161 | 2104.09051 |
| 4 | S. Yoon, Y.-K. Noh, and F. Park | Autoencoding under normalization constraints | in Proceedings of the 38th International Conference on Machine Learning, p. 12087. 2021 | 2105.05735 |
| 5 | T. Cohen, M. Lisanti, and H. K. Lou | Semivisible jets: Dark matter undercover at the LHC | PRL 115 (2015) 171804 | 1503.00009 |
| 6 | CMS Collaboration | The CMS experiment at the CERN LHC | JINST 3 (2008) S08004 | |
| 7 | CMS Collaboration | Development of the CMS detector for the CERN LHC Run 3 | JINST 19 (2024) P05064 | CMS-PRF-21-001 2309.05466 |
| 8 | CMS Collaboration | Performance of the CMS Level-1 trigger in proton-proton collisions at $ \sqrt{s} = $ 13 TeV | JINST 15 (2020) P10017 | CMS-TRG-17-001 2006.10165 |
| 9 | CMS Collaboration | The CMS trigger system | JINST 12 (2017) P01020 | CMS-TRG-12-001 1609.02366 |
| 10 | CMS Collaboration | Performance of the CMS high-level trigger during LHC run 2 | JINST 19 (2024) P11021 | CMS-TRG-19-001 2410.17038 |
| 11 | CMS Collaboration | Electron and photon reconstruction and identification with the CMS experiment at the CERN LHC | JINST 16 (2021) P05014 | CMS-EGM-17-001 2012.06888 |
| 12 | CMS Collaboration | Performance of the CMS muon detector and muon reconstruction with proton-proton collisions at $ \sqrt{s}= $ 13 TeV | JINST 13 (2018) P06015 | CMS-MUO-16-001 1804.04528 |
| 13 | CMS Collaboration | Description and performance of track and primary-vertex reconstruction with the CMS tracker | JINST 9 (2014) P10009 | CMS-TRK-11-001 1405.6569 |
| 14 | CMS Collaboration | Particle-flow reconstruction and global event description with the CMS detector | JINST 12 (2017) P10003 | CMS-PRF-14-001 1706.04965 |
| 15 | CMS Collaboration | Jet energy scale and resolution in the CMS experiment in pp collisions at 8 TeV | JINST 12 (2017) P02014 | CMS-JME-13-004 1607.03663 |
| 16 | CMS Collaboration | Performance of missing transverse momentum reconstruction in proton-proton collisions at $ \sqrt{s} = $ 13 TeV using the CMS detector | JINST 14 (2019) P07004 | CMS-JME-17-001 1903.06078 |
| 17 | L. V. Kantorovich | Mathematical methods of organizing and planning production | Management Science 6 (1939) 366 | |
| 18 | L. N. Vaserstein | Markov processes over denumerable products of spaces describing large systems of automata | Problems of Information Transmission 5 (1969) 47 | |
| 19 | M. A. Kramer | Autoassociative neural networks | Comput. Chem. Eng. 16 (1992) 313 | |
| 20 | P. Smolensky | Parallel Distributed Processing, Volume 1: Explorations in the Microstructure of Cognition: Foundations | ch. Information processing in dynamical systems: Foundations of harmony theory. The MIT Press, 1986 | |
| 21 | G. Hinton | Training products of experts by minimizing contrastive divergence | Neural Comput. 14 (2002) 1771 | |
| 22 | Y. W. Teh, M. Welling, S. Osindero, and G. E. Hinton | Energy-based models for sparse overcomplete representations | J. Mach. Learn. Res. 4 (2003) 1235 | |
| 23 | E. T. Jaynes | Information theory and statistical mechanics | PR 106 (1957) 620 | |
| 24 | D. P. Kingma and M. Welling | Auto-encoding variational Bayes | in 2nd International Conference on Learning Representations. 2014 | 1312.6114 |
| 25 | A. Makhzani, J. Shlens, N. Jaitly, and I. Goodfellow | Adversarial autoencoders | in International Conference on Learning Representations. 2016 | 1511.05644 |
| 26 | I. J. Goodfellow et al. | Generative adversarial nets | in Advances in Neural Information Processing Systems, volume 27, Curran Associates, Inc., 2014 | 1406.2661 |
| 27 | I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schoelkopf | Wasserstein auto-encoders | in International Conference on Learning Representations. 2018 | 1711.01558 |
| 28 | L. V. Kantorovich and S. Rubinshtein | On a space of totally additive functions | Vestnik of the St. Petersburg University: Mathematics 13 (1958) 52 | |
| 29 | M. Arjovsky, S. Chintala, and L. Bottou | Wasserstein GAN | in Proceedings of the 34th International Conference on Machine Learning, volume 70, p. 214. 2017 | 1701.07875 |
| 30 | R. Flamary et al. | POT: Python optimal transport | J. Mach. Learn. Res. 22 (2021) 1 | |
| 31 | K. Fatras et al. | Learning with minibatch Wasserstein: asymptotic and gradient properties | in Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108, 2019 | 1910.04091 |
| 32 | E. Nalisnick et al. | Do deep generative models know what they don't know? | in International Conference on Learning Representations. 2019 | 1810.09136 |
| 33 | Z. Xiao, Q. Yan, and Y. Amit | Likelihood regret: An out-of-distribution detection score for variational auto-encoder | in Advances in Neural Information Processing Systems, volume 33, p. 20685. 2020 | 2003.02977 |
| 34 | S. Pidhorskyi, R. Almohsen, and G. Doretto | Generative probabilistic novelty detection with adversarial autoencoders | in Proceedings of the 32nd International Conference on Neural Information Processing Systems, p. 6823. 2018 | 1807.02588 |
| 35 | S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon | GANomaly: Semi-supervised anomaly detection via adversarial training | in Asian Conference on Computer Vision, p. 622, Springer. 2018 | 1805.06725 |
| 36 | T. Schlegl et al. | f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks | Medical Image Analysis 54 (2019) 30 | |
| 37 | Y. Du and I. Mordatch | Implicit generation and modeling with energy-based models | in Proceedings of the 33rd International Conference on Neural Information Processing Systems, p. 324. 2019 | 1903.08689 |
| 38 | A. Gandrakota | Realtime anomaly detection at the L1 trigger of CMS experiment | PoS ICHEP, 2025 | 2411.19506 |
| 39 | CMS Collaboration | Model-agnostic search for dijet resonances with anomalous jet substructure in proton-proton collisions at $ \sqrt{s} = $ 13 TeV | Rept. Prog. Phys. 88 (2025) 067802 | CMS-EXO-22-026 2412.03747 |
| 40 | ATLAS Collaboration | Search for new phenomena in two-body invariant mass distributions using unsupervised machine learning for anomaly detection at $ \sqrt{s}= $ 13 TeV with the ATLAS detector | PRL 132 (2024) 081801 | 2307.01612 |
| 41 | B. M. Dillon et al. | A normalized autoencoder for LHC triggers | SciPost Phys. Core 6 (2023) 074 | 2206.14225 |
| 42 | V. C. Rubin, N. Thonnard, and W. K. Ford, Jr. | Rotational properties of 21 SC galaxies with a large range of luminosities and radii, from NGC 4605 (R = 4 kpc) to UGC 2885 (R = 122 kpc) | Astrophys. J. 238 (1980) 471 | |
| 43 | M. Persic, P. Salucci, and F. Stel | The universal rotation curve of spiral galaxies: I. The dark matter connection | Mon. Not. Roy. Astron. Soc. 281 (1996) 27 | astro-ph/9506004 |
| 44 | D. Clowe et al. | A direct empirical proof of the existence of dark matter | Astrophys. J. 648 (2006) L109 | astro-ph/0608407 |
| 45 | DES Collaboration | Dark Energy Survey year 1 results: curved-sky weak lensing mass map | Mon. Not. Roy. Astron. Soc. 475 (2018) 3165 | 1708.01535 |
| 46 | Planck Collaboration | Planck 2018 results. VI. Cosmological parameters | Astron. Astrophys. 641 (2020) A6 | 1807.06209 |
| 47 | M. J. Strassler and K. M. Zurek | Echoes of a hidden valley at hadron colliders | PLB 651 (2007) 374 | hep-ph/0604261 |
| 48 | CMS Collaboration | Search for resonant production of strongly coupled dark matter in proton-proton collisions at 13 TeV | JHEP 06 (2022) 156 | CMS-EXO-19-020 2112.11125 |
| 49 | ATLAS Collaboration | Search for new physics in final states with semivisible jets or anomalous signatures using the ATLAS detector | PRD 112 (2025) 012021 | 2505.01634 |
| 50 | T. Cohen, M. Lisanti, H. K. Lou, and S. Mishra-Sharma | LHC searches for dark sector showers | JHEP 11 (2017) 196 | 1707.05326 |
| 51 | E. Bernreuther, F. Kahlhoefer, M. Krämer, and P. Tunney | Strongly interacting dark sectors in the early universe and at the LHC through a simplified portal | JHEP 01 (2020) 162 | 1907.04346 |
| 52 | J. Alwall et al. | The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations | JHEP 07 (2014) 079 | 1405.0301 |
| 53 | T. Sjostrand et al. | An introduction to PYTHIA 8.2 | Comput. Phys. Commun. 191 (2015) 159 | 1410.3012 |
| 54 | GEANT4 Collaboration | GEANT4---a simulation toolkit | NIM A 506 (2003) 250 | |
| 55 | NNPDF Collaboration | Parton distributions from high-precision collider data | EPJC 77 (2017) 663 | 1706.00428 |
| 56 | M. Cacciari, G. P. Salam, and G. Soyez | The anti-$ k_{\mathrm{T}} $ jet clustering algorithm | JHEP 04 (2008) 063 | 0802.1189 |
| 57 | M. Cacciari, G. P. Salam, and G. Soyez | FastJet user manual | EPJC 72 (2012) 1896 | 1111.6097 |
| 58 | CMS Collaboration | Performance of quark/gluon discrimination in 8 TeV pp data | CMS Physics Analysis Summary, 2013 | CMS-PAS-JME-13-002 |
| 59 | P. T. Komiske, E. M. Metodiev, and J. Thaler | Energy flow polynomials: A complete linear basis for jet substructure | JHEP 04 (2018) 013 | 1712.07124 |
| 60 | A. J. Larkoski, G. P. Salam, and J. Thaler | Energy correlation functions for jet substructure | JHEP 06 (2013) 108 | 1305.0007 |
| 61 | A. J. Larkoski, S. Marzani, G. Soyez, and J. Thaler | Soft drop | JHEP 05 (2014) 146 | 1402.2657 |
| 62 | J. Thaler and K. Van Tilburg | Identifying boosted objects with N-subjettiness | JHEP 03 (2011) 015 | 1011.2268 |
| 63 | F. Pedregosa et al. | Scikit-learn: Machine learning in Python | J. Mach. Learn. Res. 12 (2011) 2825 | 1201.0490 |
| 64 | F. Canelli et al. | Autoencoders for semivisible jet detection | JHEP 02 (2022) 074 | 2112.02864 |
| 65 | CMS Collaboration | Source code repository | gitlab | |
| 66 | A. Paszke et al. | PyTorch: An imperative style, high-performance deep learning library | in Proceedings of the 33rd International Conference on Neural Information Processing Systems, volume 32, p. 721. 2019 | 1912.01703 |
| 67 | T. Tieleman | Training restricted Boltzmann machines using approximations to the likelihood gradient | in Proceedings of the 25th International Conference on Machine Learning, p. 1064. 2008 | |