CMS-MLG-23-001 ; CERN-EP-2023-303
Portable acceleration of CMS computing workflows with coprocessors as a service
CMS Collaboration
23 February 2024
Comput. Softw. Big Sci. 8 (2024) 17
Abstract: Computing demands for large scientific experiments, such as the CMS experiment at the CERN LHC, will increase dramatically in the next decades. To complement the future performance increases of software running on central processing units (CPUs), explorations of coprocessor usage in data processing hold great potential and interest. Coprocessors are a class of computer processors that supplement CPUs, often improving the execution of certain functions due to architectural design choices. We explore the approach of Services for Optimized Network Inference on Coprocessors (SONIC) and study the deployment of this as-a-service approach in large-scale data processing. In the studies, we take a data processing workflow of the CMS experiment and run the main workflow on CPUs, while offloading several machine learning (ML) inference tasks onto either remote or local coprocessors, specifically graphics processing units (GPUs). With experiments performed at Google Cloud, the Purdue Tier-2 computing center, and combinations of the two, we demonstrate the acceleration of these ML algorithms individually on coprocessors and the corresponding throughput improvement for the entire workflow. This approach can be easily generalized to different types of coprocessors and deployed on local CPUs without decreasing the throughput performance. We emphasize that the SONIC approach enables high coprocessor usage and the portability to run workflows on different types of coprocessors.
Links: e-print arXiv:2402.15366 [hep-ex] (PDF) ; CDS record ; inSPIRE record ; CADI line (restricted)
Figures
Figure 1:
An example inference as a service setup with multiple coprocessor servers. Clients usually run on CPUs, shown on the left side; servers hosting different models run on coprocessors, shown on the right side. |
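To make the client-server division concrete, the sketch below shows what a minimal client-side inference call could look like with NVIDIA's tritonclient Python package, which talks to a TRITON server over gRPC. The server address, model name, and tensor names are illustrative placeholders, not the actual CMS model interfaces.

```python
# Minimal inference-as-a-service client sketch (illustrative names throughout:
# the server URL, model name, and tensor names are placeholders, not the real
# CMS configuration).
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="triton.example.org:8001")

batch = np.random.rand(16, 100, 4).astype(np.float32)  # stand-in input batch

inp = grpcclient.InferInput("points", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)
out = grpcclient.InferRequestedOutput("softmax")

# The client only assembles inputs, sends them over the network, and reads back
# the outputs; the model itself runs on whatever coprocessor backs the server.
result = client.infer(model_name="jet_tagger", inputs=[inp], outputs=[out])
scores = result.as_numpy("softmax")
print(scores.shape)
```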
Figure 2:
Illustration of the SONIC implementation of IaaS in CMSSW. The figure also shows the possibility of an additional load-balancing layer in the SONIC scheme: for example, if multiple coprocessor-enabled machines are used to host servers, a Kubernetes engine can be set up to distribute inference calls across the machines [88]. Image adapted from Ref. [63]. |
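In the deployment described above, the load balancing is handled transparently by Kubernetes behind a single service address. Purely as an illustration of the idea, the client-side sketch below (with assumed host names and model name) spreads requests over several TRITON servers in round-robin fashion, skipping servers that are not live or not serving the model.

```python
# Client-side illustration of distributing inference calls across several
# TRITON servers. In the actual SONIC deployment a Kubernetes service would do
# this behind one address; the host names and model name below are assumptions.
import itertools
import tritonclient.grpc as grpcclient

SERVER_URLS = ["gpu-node-01:8001", "gpu-node-02:8001", "gpu-node-03:8001"]
MODEL = "jet_tagger"

# Keep only the servers that are up and already serving the requested model.
healthy = []
for url in SERVER_URLS:
    client = grpcclient.InferenceServerClient(url=url)
    try:
        if client.is_server_live() and client.is_model_ready(MODEL):
            healthy.append(client)
    except Exception:
        pass  # unreachable server: skip it

# Rotate over the healthy servers for successive inference requests.
rotation = itertools.cycle(healthy)

def next_client():
    """Return the client to use for the next inference call."""
    return next(rotation)
```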
Figure 3:
Illustration of the jet information in the Run 2 simulated $ \mathrm{t} \overline{\mathrm{t}} $ data set used in subsequent studies. Distributions of the number of jets per event (left) and the number of particles per jet (right) are shown for AK4 jets (upper) and AK8 jets (lower). For the distributions of the number of particles, the rightmost bin is an overflow bin. |
Figure 4:
Average processing time (left) and throughput (right) of the PN-AK4 algorithm served by a TRITON server running on one NVIDIA Tesla T4 GPU, presented as a function of the batch size. Values are shown for different inference backends: ONNX (orange), ONNX with TRT (green), and PYTORCH (red). Performance values for these backends when running on a CPU-based TRITON server are shown as dashed lines, with the same color-to-backend correspondence. |
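The curves above come from scanning the batch size and recording the latency and throughput for each backend. A simplified client-side version of such a scan is sketched below; the server URL, model, and tensor names are assumptions, and NVIDIA's perf_analyzer tool provides a more complete measurement of this kind.

```python
# Simplified batch-size scan: time repeated requests for each batch size and
# convert the mean latency into a throughput. All names are illustrative only.
import time
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

def measure(batch_size, n_repeats=50):
    batch = np.random.rand(batch_size, 100, 4).astype(np.float32)
    inp = grpcclient.InferInput("points", list(batch.shape), "FP32")
    inp.set_data_from_numpy(batch)
    start = time.perf_counter()
    for _ in range(n_repeats):
        client.infer(model_name="jet_tagger", inputs=[inp])
    latency = (time.perf_counter() - start) / n_repeats  # seconds per request
    return latency, batch_size / latency                 # inferences per second

for b in [1, 2, 4, 8, 16, 32, 64, 128]:
    latency, throughput = measure(b)
    print(f"batch={b:4d}  latency={1e3 * latency:8.2f} ms  throughput={throughput:10.1f}/s")
```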
Figure 5:
Average processing time (left) and throughput (right) of one of the AK8 ParticleNet algorithms served by a TRITON server running on one NVIDIA Tesla T4 GPU, presented as a function of the batch size. Values are shown for different inference backends: ONNX (orange), ONNX with TRT (green), and PYTORCH (red). Performance values for these backends when running on a CPU-based TRITON server are shown as dashed lines, with the same color-to-backend correspondence. |
Figure 6:
Average processing time (left) and throughput (right) of the DeepMET algorithm served by a TRITON server running on one NVIDIA Tesla T4 GPU, presented as a function of the batch size. The corresponding performance when running on a CPU-based TRITON server is shown as a dashed line. |
Figure 7:
Average processing time (left) and throughput (right) of the DeepTau algorithm served by a TRITON server running on one NVIDIA Tesla T4 GPU, presented as a function of the batch size. Values are shown for different inference backends: TENSORFLOW (TF) (orange) and TENSORFLOW with TRT (blue). Performance values for these backends when running on a CPU-based TRITON server are shown as dashed lines, with the same color-to-backend correspondence. |
Figure 8:
The GPU saturation scan performed in GCP, where the per-event throughput is shown as a function of the number of parallel CPU clients for the PYTORCH version of PN-AK4 (black), DeepMET (blue), DeepTau optimized with TRT (red), and all PYTORCH versions of PN-AK8 on a single GPU (green). Each parallel client job ran in a four-threaded configuration on GCP VMs, and the TRITON servers were hosted on separate single-GPU VMs, also in GCP. The line for direct-inference jobs represents the baseline configuration, measured by running all algorithms without the SONIC approach or any GPUs. Each solid line represents running one of the specified models on a GPU via the SONIC approach. |
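A saturation scan of this kind can be approximated on the client side by launching an increasing number of parallel clients against a single GPU server and recording the aggregate request rate until it plateaus. The sketch below uses Python threads as stand-ins for the four-threaded CMSSW jobs; the server address, model, and tensor names are assumptions.

```python
# Toy saturation scan: N parallel clients send requests to one TRITON server
# and the aggregate request rate is recorded. All names below are illustrative.
import time
import threading
import numpy as np
import tritonclient.grpc as grpcclient

URL, MODEL, DURATION = "gpu-node-01:8001", "jet_tagger", 30.0

def client_loop(counter, lock):
    client = grpcclient.InferenceServerClient(url=URL)
    batch = np.random.rand(16, 100, 4).astype(np.float32)
    inp = grpcclient.InferInput("points", list(batch.shape), "FP32")
    inp.set_data_from_numpy(batch)
    n, stop = 0, time.perf_counter() + DURATION
    while time.perf_counter() < stop:
        client.infer(model_name=MODEL, inputs=[inp])
        n += 1
    with lock:
        counter[0] += n

for n_clients in [1, 2, 4, 8, 16, 32, 64]:
    counter, lock = [0], threading.Lock()
    threads = [threading.Thread(target=client_loop, args=(counter, lock))
               for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{n_clients:3d} clients -> {counter[0] / DURATION:8.1f} requests/s")
```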
Figure 9:
Production tests across different sites. The CPU tasks always run at Purdue, while the servers with the GPU inference tasks for all the models run either at Purdue (blue) or at GCP in Iowa (red). The throughput values are higher than those shown in Fig. 8 because the CPUs at Purdue are more powerful than those in the GCP VMs. |
Figure 10:
Scale-out test results on Google Cloud. The average throughput of the workflow with the SONIC approach is 4.0 events/s (solid blue), while the average throughput of the direct-inference workflow is 3.5 events/s (dashed red). |
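The quoted throughputs translate directly into per-event processing times, as the short calculation below shows: the gain corresponds to a per-event processing-time reduction of about 12%, of the same order as the roughly 10% of the workflow spent in the offloaded algorithms (Table 2).

```python
# Convert the measured throughputs into per-event times and a fractional reduction.
direct_throughput = 3.5  # events/s, direct inference
sonic_throughput = 4.0   # events/s, SONIC

t_direct = 1.0 / direct_throughput  # ~0.286 s per event
t_sonic = 1.0 / sonic_throughput    # 0.250 s per event

reduction = (t_direct - t_sonic) / t_direct
print(f"per-event processing-time reduction: {reduction:.1%}")  # ~12.5%
```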
Figure 11:
Throughput (upper) and throughput ratio (lower) between the SONIC approach and direct inference in the local CPU tests at the Purdue Tier-2 cluster. To ensure the CPUs are always saturated, the number of threads per job multiplied by the number of jobs is set to 20. |
Tables
Table 1:
Average event size of different CMS data tiers with Run 2 data-taking conditions [81,69,78]. |
Table 2:
The average time of the Mini-AOD processing (without the SONIC approach) using a single thread on a single CPU core. The average processing times of the algorithms supported by the SONIC approach are listed in the column labeled "Time". The column labeled "Fraction" gives the fraction of the full workflow's processing time that the algorithm in question consumes. Together, the algorithms currently supported by the SONIC approach consume about 9% of the total processing time. The column labeled "Input" contains the expected server input for each model type, created per event in Run 2 $ \mathrm{t} \overline{\mathrm{t}} $ events. |
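Since the algorithms in the table together account for about 9% of the per-event processing time, an Amdahl's-law style estimate gives the maximum throughput gain one can expect from offloading them, assuming the remote inference is fully hidden behind the rest of the processing. The sketch below works through this bound; the 9% figure is taken from the table and everything else is illustrative.

```python
# Upper bound on the workflow speedup from offloading the SONIC-supported
# algorithms, treating their combined ~9% time share as fully removable
# (Amdahl's-law style estimate; latency of remote calls assumed fully hidden).
offloaded_fraction = 0.09

max_speedup = 1.0 / (1.0 - offloaded_fraction)
print(f"maximum throughput gain: {max_speedup:.3f}x "
      f"(~{max_speedup - 1.0:.1%} more events per second)")
# ~1.099x, i.e. close to the ~10% improvement seen in the scale-out tests
```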
Table 3:
Memory usage with direct inference and the SONIC approach in the local CPU tests at the Purdue Tier-2 cluster. The last column is calculated as the sum of client and server memory usage divided by the direct inference memory usage. To ensure the CPUs are always saturated, the number of threads $ n_\text{T} $ per job multiplied by the number of jobs is set to 20. |
Summary
Within the next decade, the data-taking rate at the LHC will increase dramatically, straining the expected computing resources of the LHC experiments. At the same time, more of the algorithms that run on these resources will be converted into either machine learning algorithms or domain algorithms that are easily accelerated with coprocessors, such as graphics processing units (GPUs). By pursuing heterogeneous architectures, it is possible to alleviate potential shortcomings of the available central processing unit (CPU) resources. Inference as a service (IaaS) is a promising scheme for integrating coprocessors into CMS computing workflows. In IaaS, client code simply assembles the input data for an algorithm, sends that input to an inference server running either locally or remotely, and retrieves the output from the server. The implementation of IaaS discussed throughout this paper, the Services for Optimized Network Inference on Coprocessors (SONIC) approach, employs NVIDIA TRITON Inference Servers to host models on coprocessors, as demonstrated here in studies on GPUs, CPUs, and Graphcore Intelligence Processing Units (IPUs).

In this paper, the SONIC approach in the CMS software framework (CMSSW) is demonstrated in a sample Mini-AOD workflow, where algorithms for jet tagging, tau lepton identification, and missing transverse momentum regression are ported to run on inference servers. These algorithms account for nearly 10% of the total processing time per event in a simulated data set of top quark-antiquark events. After model profiling, which is used to optimize server performance and determine the number of GPUs needed for a given number of client jobs, the expected 10% decrease in per-event processing time was achieved in a large-scale test of Mini-AOD production with the SONIC approach that used about 10 000 CPU cores and 100 GPUs. The network bandwidth is large enough to support high input-output model inference for the workflow tested, and it will be monitored as the fraction of algorithms using remote GPUs increases. In addition to meeting performance expectations, we demonstrated that the throughput results are not highly sensitive to the physical client-to-server distance, at least up to distances of hundreds of kilometers. Running inference through TRITON servers on local CPU resources does not affect the throughput compared with the standard approach of running inference directly on CPUs in the job thread. We also performed a test using Graphcore IPUs to demonstrate the flexibility of the SONIC approach.

The SONIC approach for IaaS represents a flexible method to accelerate algorithms, which is increasingly valuable for LHC experiments. Using a realistic workflow, we highlighted many of the benefits of the SONIC approach, including the use of remote resources, workflow acceleration, and portability to different processor technologies. To make it a viable and robust paradigm for CMS computing in the future, additional studies are ongoing or planned for monitoring and mitigating potential issues such as excessive network and memory usage or server failures.
References
1 | L. Evans and P. Bryant | LHC machine | JINST 3 (2008) S08001 | |
2 | ATLAS Collaboration | The ATLAS experiment at the CERN Large Hadron Collider | JINST 3 (2008) S08003 | |
3 | CMS Collaboration | The CMS experiment at the CERN LHC | JINST 3 (2008) S08004 | |
4 | ATLAS Collaboration | Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC | PLB 716 (2012) 1 | 1207.7214 |
5 | CMS Collaboration | Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC | PLB 716 (2012) 30 | CMS-HIG-12-028 1207.7235 |
6 | CMS Collaboration | Observation of a new boson with mass near 125 GeV in pp collisions at $ \sqrt{s} = $ 7 and 8 TeV | JHEP 06 (2013) 081 | CMS-HIG-12-036 1303.4571 |
7 | CMS Collaboration | Search for supersymmetry in proton-proton collisions at 13 TeV in final states with jets and missing transverse momentum | JHEP 10 (2019) 244 | CMS-SUS-19-006 1908.04722 |
8 | CMS Collaboration | Combined searches for the production of supersymmetric top quark partners in proton-proton collisions at $ \sqrt{s} = $ 13 TeV | EPJC 81 (2021) 970 | CMS-SUS-20-002 2107.10892 |
9 | CMS Collaboration | Search for higgsinos decaying to two Higgs bosons and missing transverse momentum in proton-proton collisions at $ \sqrt{s} = $ 13 TeV | JHEP 05 (2022) 014 | CMS-SUS-20-004 2201.04206 |
10 | CMS Collaboration | Search for supersymmetry in final states with two oppositely charged same-flavor leptons and missing transverse momentum in proton-proton collisions at $ \sqrt{s} = $ 13 TeV | JHEP 04 (2021) 123 | CMS-SUS-20-001 2012.08600 |
11 | ATLAS Collaboration | Search for squarks and gluinos in final states with hadronically decaying $ \tau $-leptons, jets, and missing transverse momentum using pp collisions at $ \sqrt{s} = $ 13 TeV with the ATLAS detector | PRD 99 (2019) 012009 | 1808.06358 |
12 | ATLAS Collaboration | Search for top squarks in events with a Higgs or Z boson using 139 fb$ ^{-1} $ of pp collision data at $ \sqrt{s}= $ 13 TeV with the ATLAS detector | EPJC 80 (2020) 1080 | 2006.05880 |
13 | ATLAS Collaboration | Search for charginos and neutralinos in final states with two boosted hadronically decaying bosons and missing transverse momentum in pp collisions at $ \sqrt{s} = $ 13 TeV with the ATLAS detector | PRD 104 (2021) 112010 | 2108.07586 |
14 | ATLAS Collaboration | Search for direct pair production of sleptons and charginos decaying to two leptons and neutralinos with mass splittings near the W-boson mass in $ \sqrt{s} = $ 13 TeV pp collisions with the ATLAS detector | JHEP 06 (2023) 031 | 2209.13935 |
15 | ATLAS Collaboration | Search for new phenomena in events with an energetic jet and missing transverse momentum in pp collisions at $ \sqrt{s}= $ 13 TeV with the ATLAS detector | PRD 103 (2021) 112006 | 2102.10874 |
16 | CMS Collaboration | Search for new particles in events with energetic jets and large missing transverse momentum in proton-proton collisions at $ \sqrt{s}= $ 13 TeV | JHEP 11 (2021) 153 | CMS-EXO-20-004 2107.13021 |
17 | ATLAS Collaboration | Search for new resonances in mass distributions of jet pairs using 139 fb$ ^{-1} $ of pp collisions at $ \sqrt{s}= $ 13 TeV with the ATLAS detector | JHEP 03 (2020) 145 | 1910.08447 |
18 | ATLAS Collaboration | Search for high-mass dilepton resonances using 139 fb$ ^{-1} $ of pp collision data collected at $ \sqrt{s}= $ 13 TeV with the ATLAS detector | PLB 796 (2019) 68 | 1903.06248 |
19 | CMS Collaboration | Search for narrow and broad dijet resonances in proton-proton collisions at $ \sqrt{s}= $ 13 TeV and constraints on dark matter mediators and other new particles | JHEP 08 (2018) 130 | CMS-EXO-16-056 1806.00843 |
20 | CMS Collaboration | Search for high mass dijet resonances with a new background prediction method in proton-proton collisions at $ \sqrt{s}= $ 13 TeV | JHEP 05 (2020) 033 | CMS-EXO-19-012 1911.03947 |
21 | CMS Collaboration | Search for resonant and nonresonant new phenomena in high-mass dilepton final states at $ \sqrt{s}= $ 13 TeV | JHEP 07 (2021) 208 | CMS-EXO-19-019 2103.02708 |
22 | O. Aberle et al. | High-Luminosity Large Hadron Collider (HL-LHC): Technical design report | CERN Yellow Rep. Monogr. 10 (2020) | |
23 | O. Brüning and L. Rossi, eds. | The High Luminosity Large Hadron Collider: the new machine for illuminating the mysteries of universe | World Scientific, ISBN 978-981-4675-46-8, 978-981-4678-14-8 link |
24 | CMS Collaboration | The Phase-2 upgrade of the CMS level-1 trigger | CMS Technical Design Report CERN-LHCC-2020-004, CMS-TDR-021, 2020 CDS |
25 | ATLAS Collaboration | Technical design report for the Phase-II upgrade of the ATLAS TDAQ system | ATLAS Technical Design Report CERN-LHCC-2017-020, ATLAS-TDR-029, 2017 link |
26 | A. Ryd and L. Skinnari | Tracking triggers for the HL-LHC | ARNPS 70 (2020) 171 | 2010.13557 |
27 | ATLAS Collaboration | Operation of the ATLAS trigger system in Run 2 | JINST 15 (2020) P10004 | 2007.12539 |
28 | CMS Collaboration | The CMS trigger system | JINST 12 (2017) P01020 | CMS-TRG-12-001 1609.02366 |
29 | CMS Collaboration | The Phase-2 upgrade of the CMS data acquisition and high level trigger | CMS Technical Design Report CERN-LHCC-2021-007, CMS-TDR-022, 2021 CDS |
30 | CMS Offline Software and Computing Group | CMS Phase-2 computing model: Update document | CMS Note CMS-NOTE-2022-008, 2022 | |
31 | ATLAS Collaboration | ATLAS software and computing HL-LHC roadmap | LHCC Public Document CERN-LHCC-2022-005, LHCC-G-182, 2022 | |
32 | R. H. Dennard et al. | Design of ion-implanted MOSFET's with very small physical dimensions | IEEE J. Solid-State Circuits 9 (1974) 256 |
33 | Graphcore | Intelligence processing unit | link | |
34 | Z. Jia, B. Tillman, M. Maggioni, and D. P. Scarpazza | Dissecting the Graphcore IPU architecture via microbenchmarking | 1912.03413 | |
35 | D. Guest, K. Cranmer, and D. Whiteson | Deep learning and its application to LHC physics | ARNPS 68 (2018) 161 | 1806.11484 |
36 | K. Albertsson et al. | Machine learning in high energy physics community white paper | JPCS 1085 (2018) 022008 | 1807.02876 |
37 | D. Bourilkov | Machine and deep learning applications in particle physics | Int. J. Mod. Phys. A 34 (2020) 1930019 | 1912.08245 |
38 | A. J. Larkoski, I. Moult, and B. Nachman | Jet substructure at the Large Hadron Collider: A review of recent advances in theory and machine learning | Phys. Rept. 841 (2020) 1 | 1709.04464 |
39 | M. Feickert and B. Nachman | A living review of machine learning for particle physics | 2102.02770 | |
40 | P. Harris et al. | Physics community needs, tools, and resources for machine learning | in Proc. 2021 US Community Study on the Future of Particle Physics, 2022 link | 2203.16255 |
41 | C. Savard et al. | Optimizing high throughput inference on graph neural networks at shared computing facilities with the NVIDIA Triton Inference Server | Accepted by Comput. Softw. Big Sci, 2023 | 2312.06838 |
42 | S. Farrell et al. | Novel deep learning methods for track reconstruction | in 4th International Workshop Connecting The Dots (2018) | 1810.06111 |
43 | S. Amrouche et al. | The Tracking Machine Learning challenge: Accuracy phase | Springer, Cham, 2019 link | 1904.06778 |
44 | X. Ju et al. | Performance of a geometric deep learning pipeline for HL-LHC particle tracking | EPJC 81 (2021) 876 | 2103.06995 |
45 | G. DeZoort et al. | Charged particle tracking via edge-classifying interaction networks | CSBS 5 (2021) 26 | 2103.16701 |
46 | S. R. Qasim, J. Kieseler, Y. Iiyama, and M. Pierini | Learning representations of irregular particle-detector geometry with distance-weighted graph networks | EPJC 79 (2019) 608 | 1902.07987 |
47 | J. Kieseler | Object condensation: one-stage grid-free multi-object reconstruction in physics detectors, graph and image data | EPJC 80 (2020) 886 | 2002.03605 |
48 | CMS Collaboration | GNN-based end-to-end reconstruction in the CMS Phase 2 high-granularity calorimeter | JPCS 2438 (2023) 012090 | 2203.01189 |
49 | J. Pata et al. | MLPF: Efficient machine-learned particle-flow reconstruction using graph neural networks | EPJC 81 (2021) 381 | 2101.08578 |
50 | CMS Collaboration | Machine learning for particle flow reconstruction at CMS | JPCS 2438 (2023) 012100 | 2203.00330 |
51 | F. Mokhtar et al. | Progress towards an improved particle flow algorithm at CMS with machine learning | in Proc. 21st Intern. Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI meets Reality, 2023 | 2303.17657 |
52 | F. A. Di Bello et al. | Reconstructing particles in jets using set transformer and hypergraph prediction networks | EPJC 83 (2023) 596 | 2212.01328 |
53 | J. Pata et al. | Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors | 2309.06782 | |
54 | E. A. Moreno et al. | JEDI-net: a jet identification algorithm based on interaction networks | EPJC 80 (2020) 58 | 1908.05318 |
55 | H. Qu and L. Gouskos | ParticleNet: Jet tagging via particle clouds | PRD 101 (2020) 056019 | 1902.08570 |
56 | E. A. Moreno et al. | Interaction networks for the identification of boosted $ \mathrm{H}\to\mathrm{b}\overline{\mathrm{b}} $ decays | PRD 102 (2020) 012010 | 1909.12285 |
57 | E. Bols et al. | Jet flavour classification using DeepJet | JINST 15 (2020) P12012 | 2008.10519 |
58 | CMS Collaboration | Identification of heavy, energetic, hadronically decaying particles using machine-learning techniques | JINST 15 (2020) P06005 | CMS-JME-18-002 2004.08262 |
59 | H. Qu, C. Li, and S. Qian | Particle transformer for jet tagging | in Proc. 39th Intern. Conf. on Machine Learning, K. Chaudhuri et al., eds., volume 162, 2022 | 2202.03772 |
60 | CMS Collaboration | Muon identification using multivariate techniques in the CMS experiment in proton-proton collisions at $ \sqrt{s} = $ 13 TeV | Submitted to JINST, 2023 | CMS-MUO-22-001 2310.03844 |
61 | J. Duarte et al. | FPGA-accelerated machine learning inference as a service for particle physics computing | CSBS 3 (2019) 13 | 1904.08986 |
62 | D. Rankin et al. | FPGAs-as-a-service toolkit (FaaST) | in Proc. 2020 IEEE/ACM Intern. Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC) link |
63 | J. Krupa et al. | GPU coprocessors as a service for deep learning inference in high energy physics | MLST 2 (2021) 035005 | 2007.10359 |
64 | M. Wang et al. | GPU-accelerated machine learning inference as a service for computing in neutrino experiments | Front. Big Data 3 (2021) 604083 | 2009.04509 |
65 | T. Cai et al. | Accelerating machine learning inference with GPUs in ProtoDUNE data processing | Comput. Softw. Big Sci. 7 (2023) 11 | 2301.04633 |
66 | A. Gunny et al. | Hardware-accelerated inference for real-time gravitational-wave astronomy | Nature Astron. 6 (2022) 529 | 2108.12430 |
67 | ALICE Collaboration | Real-time data processing in the ALICE high level trigger at the LHC | CPC 242 (2019) 25 | 1812.08036 |
68 | R. Aaij et al. | Allen: A high level trigger on GPUs for LHCb | CSBS 4 (2020) 7 | 1912.09161 |
69 | LHCb Collaboration | The LHCb upgrade I | 2305.10515 | |
70 | A. Bocci et al. | Heterogeneous reconstruction of tracks and primary vertices with the CMS pixel tracker | Front. Big Data 3 (2020) 601728 | 2008.13461 |
71 | CMS Collaboration | CMS high level trigger performance comparison on CPUs and GPUs | JPCS 2438 (2023) 012016 | |
72 | D. Vom Bruch | Real-time data processing with GPUs in high energy physics | JINST 15 (2020) C06010 | 2003.11491 |
73 | CMS Collaboration | Mini-AOD: A new analysis data format for CMS | JPCS 664 (2015) 7 | 1702.04685 |
74 | CMS Collaboration | CMS physics: Technical design report volume 1: Detector performance and software | CMS Technical Design Report CERN-LHCC-2006-001, CMS-TDR-8-1, 2006 CDS |
75 | CMS Collaboration | CMSSW on Github | link | |
76 | CMS Collaboration | Performance of the CMS Level-1 trigger in proton-proton collisions at $ \sqrt{s} = $ 13 TeV | JINST 15 (2020) P10017 | CMS-TRG-17-001 2006.10165 |
77 | CMS Collaboration | Electron and photon reconstruction and identification with the CMS experiment at the CERN LHC | JINST 16 (2021) P05014 | CMS-EGM-17-001 2012.06888 |
78 | CMS Collaboration | Description and performance of track and primary-vertex reconstruction with the CMS tracker | JINST 9 (2014) P10009 | CMS-TRK-11-001 1405.6569 |
79 | CMS Collaboration | Particle-flow reconstruction and global event description with the CMS detector | JINST 12 (2017) P10003 | CMS-PRF-14-001 1706.04965 |
80 | oneAPI | Threading Building Blocks | link | |
81 | A. Bocci et al. | Bringing heterogeneity to the CMS software framework | EPJWC 245 (2020) 05009 | 2004.04334 |
82 | CMS Collaboration | A further reduction in CMS event data for analysis: the NANOAOD format | EPJWC 214 (2019) 06021 | |
83 | CMS Collaboration | Extraction and validation of a new set of CMS PYTHIA8 tunes from underlying-event measurements | EPJC 80 (2020) 4 | CMS-GEN-17-001 1903.12179 |
84 | CMS Collaboration | Pileup mitigation at CMS in 13 TeV data | JINST 15 (2020) P09018 | CMS-JME-18-001 2003.00503 |
85 | CMS Collaboration | NANOAOD: A new compact event data format in CMS | EPJWC 245 (2020) 06002 | |
86 | NVIDIA | Triton Inference Server | link | |
87 | gRPC | A high performance, open source universal RPC framework | link | |
88 | Kubernetes | Kubernetes documentation | link | |
89 | K. Pedro et al. | SonicCore | link | |
90 | K. Pedro et al. | SonicTriton | link | |
91 | K. Pedro | SonicCMS | link | |
92 | A. M. Caulfield et al. | A cloud-scale acceleration architecture | in Proc. 49th Annual IEEE/ACM Intern. Symp. on Microarchitecture (MICRO), 2016 link |
93 | V. Kuznetsov | vkuznet/TFaaS: First public version | link | |
94 | V. Kuznetsov, L. Giommi, and D. Bonacorsi | MLaaS4HEP: Machine learning as a service for HEP | CSBS 5 (2021) 17 | 2007.14781 |
95 | KServe | Documentation website | link | |
96 | NVIDIA | Triton Inference Server README (release 22.08) | link | |
97 | A. Paszke et al. | PyTorch: An imperative style, high-performance deep learning library | Advances in Neural Information Processing Systems 32 (2019) 8024 link | 1912.01703 |
98 | NVIDIA | TensorRT | link | |
99 | ONNX | Open Neural Network Exchange (ONNX) | link | |
100 | M. Abadi et al. | TensorFlow: A system for large-scale machine learning | 1605.08695 | |
101 | T. Chen and C. Guestrin | XGBoost: A scalable tree boosting system | in Proc. 22nd ACM SIGKDD Intern. Conf. on Knowledge Discovery and Data Mining, 2016 link | 1603.02754 |
102 | NVIDIA | Triton Inference Server Model Analyzer | link | |
103 | P. Buncic et al. | CernVM -- a virtual software appliance for LHC applications | JPCS 219 (2010) 042003 | |
104 | S. D. Guida et al. | The CMS condition database system | JPCS 664 (2015) 042024 | |
105 | L. Bauerdick et al. | Using Xrootd to federate regional storage | JPCS 396 (2012) 042009 | |
106 | CMS Collaboration | Performance of missing transverse momentum reconstruction in proton-proton collisions at $ \sqrt{s}= $ 13 TeV using the CMS detector | JINST 14 (2019) P07004 | CMS-JME-17-001 1903.06078 |
107 | CMS Collaboration | ParticleNet producer in CMSSW | link | |
108 | CMS Collaboration | ParticleNet SONIC producer in CMSSW | link | |
109 | M. Cacciari, G. P. Salam, and G. Soyez | The anti-$ k_{\mathrm{T}} $ jet clustering algorithm | JHEP 04 (2008) 063 | 0802.1189 |
110 | M. Cacciari, G. P. Salam, and G. Soyez | FastJet user manual | EPJC 72 (2012) 1896 | 1111.6097 |
111 | CMS Collaboration | Performance of the ParticleNet tagger on small and large-radius jets at high level trigger in Run 3 | CMS Detector Performance Note CMS-DP-2023-021, 2023 CDS |
112 | CMS Collaboration | Identification of highly Lorentz-boosted heavy particles using graph neural networks and new mass decorrelation techniques | CMS Detector Performance Note CMS-DP-2020-002, 2020 CDS |
113 | CMS Collaboration | Mass regression of highly-boosted jets using graph neural networks | CMS Detector Performance Note CMS-DP-2021-017, 2021 CDS |
114 | Y. Feng | A new deep-neural-network-based missing transverse momentum estimator, and its application to W recoil | PhD thesis, University of Maryland, College Park, 2020 link |
115 | CMS Collaboration | Identification of hadronic tau lepton decays using a deep neural network | JINST 17 (2022) P07023 | CMS-TAU-20-001 2201.08458 |
116 | NVIDIA Corporation | NVIDIA T4 70W low profile PCIe GPU accelerator | ||
117 | B. Holzman et al. | HEPCloud, a new paradigm for HEP facilities: CMS Amazon Web Services investigation | CSBS 1 (2017) 1 | 1710.00100 |
118 | Intel Corporation | Intel 64 and IA-32 architectures software developer's manual | Intel Corporation, Santa Clara, 2023 | |
119 | SchedMD | Slurm workload manager | link | |
120 | Advanced Micro Devices, Inc. | AMD EPYC 7002 series processors power electronic health record solutions | Advanced Micro Devices, Inc., Santa Clara, 2020 |