CMS-PAS-BTV-20-001

Calibration of charm jet identification algorithms using proton-proton collision events at $\sqrt{s}=$ 13 TeV
CMS Collaboration
March 2021

Abstract: Many measurements at the LHC experiments require an efficient identification of heavy-flavour jets, i.e. jets originating from bottom (b) or charm (c) quarks. An overview of the algorithms used for c jet identification in the CMS experiment is given and a novel method to calibrate them is presented. The new method corrects the entire output distribution of the algorithms when they are applied to jets of different flavours. It is based on an iterative approach exploiting three distinct control regions that are enriched in b jets, c jets or light-flavour jets, respectively. Finally, the method is validated by checking closure of the measured correction factors on the same collision data and by testing the method on toy datasets that emulate different miscalibration conditions. The calibrated results improve over traditional efficiency measurements and are expected to increase the sensitivity of future physics analyses by facilitating the use of the full distributions of heavy-flavour identification algorithm outputs, for example as inputs to machine learning algorithms.
Note: These preliminary results have been superseded by a paper submitted to JINST.


Figures
Figure 1: Normalised distributions of the CvsL (left) and CvsB (right) discriminators for the DeepCSV (dashed) and DeepJet (full) algorithms using jets from simulated hadronic $\mathrm{t\bar{t}}$ events with $p_{\mathrm{T}} >$ 20 GeV and $|\eta| <$ 2.5. The distributions are shown separately for b jets (red), c jets (green) and light-flavour jets (blue).
Figure 2: ROC curves showing the individual performance of the CvsL (left) and CvsB (right) discriminators for the DeepCSV (blue) and DeepJet (red) algorithms using jets from simulated hadronic $\mathrm{t\bar{t}}$ events with $p_{\mathrm{T}} >$ 20 GeV and $|\eta| <$ 2.5.
Figure 3: Two-dimensional ROC contours showing the c-tagging efficiency as a simultaneous function of the b jet and light-flavour jet mistagging rates for the DeepCSV (blue) and DeepJet (red) algorithms using jets from simulated hadronic $\mathrm{t\bar{t}}$ events with $p_{\mathrm{T}} >$ 20 GeV and $|\eta| <$ 2.5.
Figure 4: Normalised two-dimensional distributions showing the CvsL and CvsB discriminators on the x-axis and y-axis, respectively. Distributions are shown using c jets (upper), b jets (middle) and light-flavour jets (lower) from simulated hadronic $\mathrm{t\bar{t}}$ events with $p_{\mathrm{T}} >$ 20 GeV and $|\eta| <$ 2.5. The left-hand column shows the discriminators of the DeepCSV algorithm, whereas the right-hand column shows those of the DeepJet algorithm.
Figure 5: Feynman diagrams showing the production of charm quarks in association with a W boson at the LHC (left and middle), along with the major background (right).
Figure 6: Pre-calibration distributions of CvsL (left) and CvsB (right) obtained from the DeepCSV (upper) and DeepJet (lower) taggers for jets selected in the W+c (OS-SS) selection. The bin corresponding to a tagger value of $-1$ is plotted at $-0.1$ for DeepCSV.
Figure 7: Pre-calibration distributions of CvsL (left) and CvsB (right) obtained from the DeepCSV (upper) and DeepJet (lower) taggers for jets selected in the $\mathrm{t\bar{t}}$ selection. The bin corresponding to a tagger value of $-1$ is plotted at $-0.1$.
Figure 8: Pre-calibration distributions of CvsL (left) and CvsB (right) obtained from the DeepCSV (upper) and DeepJet (lower) taggers for jets selected in the DY+jet selection. The bin corresponding to a tagger value of $-1$ is plotted at $-0.1$.
Figure 9: Shape calibration scale factor (SF) maps for c jets for the DeepCSV- (left) and DeepJet- (right) based c-taggers. $\mathrm{SF}_c^{(-1)}$ denotes the SF for c jets with defaulted discriminator values, along with statistical (first term) and systematic (second term) uncertainties. The total uncertainty is denoted by red envelopes around the central values, while statistical uncertainties alone are denoted by black lines. Grey datapoints with hatched uncertainties denote bins with statistics insufficient for SF evaluation.
Figure 10: Shape calibration scale factor (SF) maps for b jets for the DeepCSV- (left) and DeepJet- (right) based c-taggers. $\mathrm{SF}_b^{(-1)}$ denotes the SF for b jets with defaulted discriminator values, along with statistical (first term) and systematic (second term) uncertainties. The total uncertainty is denoted by red envelopes around the central values, while statistical uncertainties alone are denoted by black lines. Grey datapoints with hatched uncertainties denote bins with statistics insufficient for SF evaluation.
Figure 11: Shape calibration scale factor (SF) maps for light-flavour jets for the DeepCSV- (left) and DeepJet- (right) based c-taggers. $\mathrm{SF}_{\mathrm{udsg}}^{(-1)}$ denotes the SF for light-flavour jets with defaulted discriminator values, along with statistical (first term) and systematic (second term) uncertainties. The total uncertainty is denoted by red envelopes around the central values, while statistical uncertainties alone are denoted by black lines. Grey datapoints with hatched uncertainties denote bins with statistics insufficient for SF evaluation.
Figure 12: Contribution of each source of SF uncertainty, calculated as the square of the relative uncertainty in jet yield and expressed as the maximum of the up and down variations, at various values of the DeepCSV CvsL (left) and CvsB (right) discriminators for c (upper), b (middle) and light (lower) flavours. The effective total relative uncertainty values per bin are also shown in grey, for reference. The bin corresponding to a tagger value of $-1$ is plotted at $-0.1$. Statistical uncertainties are not shown.
Figure 13: Contribution of each source of SF uncertainty, calculated as the square of the relative uncertainty in jet yield and expressed as the maximum of the up and down variations, at various values of the DeepJet CvsL (left) and CvsB (right) discriminators for c (upper), b (middle) and light (lower) flavours. The effective total relative uncertainty values per bin are also shown in grey, for reference. The bin corresponding to a tagger value of $-1$ is plotted at $-0.1$. Statistical uncertainties are not shown.
Figure 14: ROC curves showing the individual performance of the CvsL (left) and CvsB (right) discriminators for the DeepCSV (blue) and DeepJet (red) algorithms for simulated jets (dashed lines) and the estimation of the same for jets in data (solid lines with uncertainty bands).
Figure 15: ROC contours showing c-tagging efficiencies as a simultaneous function of the b and light-flavour jet misidentification rates, for the DeepCSV (left) and DeepJet (right) algorithms for simulated jets (dashed lines) and the estimation of the same for jets in data (solid lines).
Figure 16: Relative contributions of each source of uncertainty to the total uncertainty (statistical + systematic) for both CvsL and CvsB discrimination and for both DeepCSV and DeepJet taggers, quantified by the square of the change in area under ROC curves.
Figure 17: Post-calibration DeepCSV CvsL (left) and CvsB (right) distributions of jet samples selected from W+c (upper), $\mathrm{t\bar{t}}$ semi- and dileptonic (middle), and DY+jet (lower) events after application of DeepCSV c-tagger shape calibration scale factors.
Figure 18: Post-calibration DeepJet CvsL (left) and CvsB (right) distributions of jet samples selected from W+c (upper), $\mathrm{t\bar{t}}$ semi- and dileptonic (middle), and DY+jet (lower) events after application of DeepJet c-tagger shape calibration scale factors.
Figure 19: DeepCSV CvsB (first row), DeepCSV CvsL (second row), DeepJet CvsB (third row) and DeepJet CvsL (fourth row) discriminators of soft-muon-bias-free semileptonic $\mathrm{t\bar{t}}$ jets, before (left) and after (right) application of scale factors.
Figure 20: DeepCSV CvsB (first row), DeepCSV CvsL (second row), DeepJet CvsB (third row) and DeepJet CvsL (fourth row) discriminators of soft-muon-bias-free dileptonic $\mathrm{t\bar{t}}$ jets, before (left) and after (right) application of scale factors.
Figure 21: Distribution of the SF pulls, quantified as the differences between the SFs retrieved by the fits and the injected SFs, in units of the statistical uncertainties in the former ( $\frac{\mathrm{SF}_{\mathrm{extracted}}-\mathrm{SF}_{\mathrm{injected}}}{\sigma_{\mathrm{extracted}}}$ ), across all bins in the CvsB-CvsL plane, for the SF map with "mild" (left) and "strong" (right) SFs.

Tables
png pdf	Table 1: Summary of the heavy-flavour tagging definitions for both b- and c-tagging using the DeepCSV and DeepJet taggers.
png pdf	Table 2: The combined jet yield and contribution of each flavour of jet to each selection is shown. The number of events is reported from data, while the per-flavour contribution is determined from simulation. The purity of each selection (row) is highlighted in bold text.

Summary

This note presents a novel method to calibrate the full differential shape of the discriminator distributions used for c jet identification at CMS. The method uses three sets of event selection criteria, targeting topologies enriched in W+c, top quark pair, and DY+jets events. These topologies are enriched in c, b, and light-flavour jets, respectively, with purities of the targeted jet flavour ranging between 81% and 93%. An iterative fit across the three regions yields scale factors that match the simulated discriminator distributions to those observed in data. Because the c-tagging algorithm comprises two discriminators, one separating c from b jets (CvsB) and another separating c from light-flavour and gluon jets (CvsL), the scale factors are derived as a function of the two-dimensional CvsL and CvsB discriminator values. An adaptive binning is adopted to optimise the granularity of the calibration with respect to the statistical uncertainty in each bin.
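
The iterative idea can be illustrated with a minimal single-bin sketch. This is a toy under stated assumptions, not the CMS implementation: each of the three regions is assumed to be dominated by one flavour, the simulated yields in `toy_mc` and the scale factors in `true_sf` are invented for illustration, and the function name `iterate_scale_factors` is hypothetical. In each pass, the scale factor of a flavour is updated in the region enriched in that flavour, after subtracting the other flavours scaled by their current scale factors.

```python
import numpy as np

def iterate_scale_factors(mc, data, n_iter=50, tol=1e-10):
    """Toy iterative scale-factor extraction for one discriminator bin.

    mc[r, f] : simulated yield of flavour f in region r, where region r
               is assumed to be the one enriched in flavour r.
    data[r]  : observed yield in region r.
    """
    mc = np.asarray(mc, dtype=float)
    data = np.asarray(data, dtype=float)
    sf = np.ones(mc.shape[1])  # start from "no correction"
    for _ in range(n_iter):
        previous = sf.copy()
        for f in range(len(sf)):
            # Contribution of the other flavours in region f,
            # scaled by their current scale factors.
            others = mc[f] @ sf - mc[f, f] * sf[f]
            sf[f] = (data[f] - others) / mc[f, f]
        if np.max(np.abs(sf - previous)) < tol:
            break
    return sf

# Hypothetical yields: rows are regions (c-, b-, light-enriched),
# columns are flavours (c, b, light). Values are illustrative only.
toy_mc = np.array([[850.0, 50.0, 100.0],
                   [40.0, 900.0, 60.0],
                   [60.0, 30.0, 910.0]])
true_sf = np.array([0.90, 1.10, 1.05])  # assumed "true" miscalibration
toy_data = toy_mc @ true_sf             # pseudo-data built from the toy

sf = iterate_scale_factors(toy_mc, toy_data)
print(sf)
```

In this toy the iteration converges because every region is dominated by its target flavour, which mirrors the high purities quoted above. The actual calibration works on the binned two-dimensional CvsL-CvsB plane and includes uncertainties, which this sketch omits.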

We present validation and closure tests that demonstrate the robustness of the method. Although not reported here, the method has also been demonstrated to work in the context of a search for associated production of a Higgs boson with a vector boson, where the Higgs boson decays into a pair of charm quarks [9]. Calibrating the full differential discriminator shape allows the c-tagging discriminators to be used as inputs to multivariate techniques (based on machine learning), or their shapes to be fitted to data to extract observables that are sensitive to the jet flavour. The shape calibration extends the use of the c-tagging algorithms beyond discrete working points and is expected to enable more advanced use cases for c jet identification in physics analyses.
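
The pull used in the toy-dataset validation (Figure 21) is defined as $(\mathrm{SF}_{\mathrm{extracted}}-\mathrm{SF}_{\mathrm{injected}})/\sigma_{\mathrm{extracted}}$. The following sketch shows how such pulls could be computed and summarised. The function name `sf_pulls` and all numerical values are assumptions made for illustration; the pseudo-fit results are drawn from a Gaussian rather than produced by an actual fit.

```python
import numpy as np

def sf_pulls(sf_extracted, sf_injected, sigma_extracted):
    """Per-bin pull: (extracted - injected) / uncertainty on extracted."""
    sf_extracted = np.asarray(sf_extracted, dtype=float)
    sf_injected = np.asarray(sf_injected, dtype=float)
    sigma_extracted = np.asarray(sigma_extracted, dtype=float)
    return (sf_extracted - sf_injected) / sigma_extracted

# Illustrative pseudo-experiment: 1000 bins with a common injected SF.
rng = np.random.default_rng(seed=1)
injected = np.full(1000, 1.1)   # assumed injected scale factor
sigma = np.full(1000, 0.05)     # assumed statistical uncertainty
# Stand-in for fit results: injected value smeared by its uncertainty.
extracted = injected + rng.normal(0.0, sigma)

pulls = sf_pulls(extracted, injected, sigma)
print(f"pull mean = {pulls.mean():.3f}, pull width = {pulls.std():.3f}")
```

For an unbiased extraction with correctly estimated uncertainties, the pull distribution is expected to be centred near 0 with a width near 1; a shifted mean would indicate a bias and a width far from 1 would indicate misestimated uncertainties.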

References
1	CMS Collaboration	Identification of b-quark jets with the CMS experiment	JINST 8 (2013) P04013	CMS-BTV-12-001 1211.4462
2	CMS Collaboration	Identification of heavy-flavour jets with the CMS detector in pp collisions at 13 TeV	JINST 13 (2018) P05011	CMS-BTV-16-002 1712.07158
3	CMS Collaboration	Identification of c-quark jets at the CMS experiment	CMS-PAS-BTV-16-001	CMS-PAS-BTV-16-001
4	E. Bols et al.	Jet flavour classification using DeepJet	JINST 15 (2020) P12012	2008.10519
5	CMS Collaboration	Performance of the DeepJet b tagging algorithm using 41.9/fb of data from proton-proton collisions at 13 TeV with Phase 1 CMS detector	CDS
6	D. Guest et al.	Jet flavor classification in high-energy physics with deep neural networks	PRD 94 (2016)	1607.08633
7	CMS Collaboration	Observation of Higgs boson decay to bottom quarks	PRL 121 (2018) 121801	CMS-HIG-18-016 1808.08242
8	S. Moortgat	When charm and beauty adjoin the top. First measurement of the cross section of top quark pair production with additional charm jets with the CMS experiment	PhD thesis, Vrije U., Brussels, May, 2019 CERN-THESIS-2019-051
9	CMS Collaboration	A search for the standard model Higgs boson decaying to charm quarks	JHEP 03 (2020) 131	CMS-HIG-18-031 1912.01662
10	CMS Collaboration	CMS technical design report for the pixel detector upgrade	CDS
11	CMS Collaboration	The CMS experiment at the CERN LHC	JINST 3 (2008) S08004	CMS-00-001
12	CMS Collaboration	The CMS trigger system	JINST 12 (2017) P01020	CMS-TRG-12-001 1609.02366
13	CMS Collaboration	CMS luminosity measurement for the 2017 data-taking period at $\sqrt{s} =$ 13 TeV
14	T. Sjostrand et al.	An introduction to PYTHIA 8.2	Computer Physics Communications 191 (2015) 159
15	P. Skands, S. Carrazza, and J. Rojo	Tuning PYTHIA 8.1: the Monash 2013 Tune	EPJC 74 (2014) 3024	1404.5630
16	NNPDF Collaboration	Parton distributions for the LHC Run II	JHEP 04 (2015) 040	1410.8849
17	P. Nason	A new method for combining NLO QCD with shower Monte Carlo algorithms	JHEP 11 (2004) 040	hep-ph/0409146
18	S. Frixione, P. Nason, and C. Oleari	Matching NLO QCD computations with parton shower simulations: the POWHEG method	JHEP 11 (2007) 070	0709.2092
19	S. Alioli, P. Nason, C. Oleari, and E. Re	A general framework for implementing NLO calculations in shower Monte Carlo programs: the POWHEG BOX	JHEP 06 (2010) 043	1002.2581
20	J. M. Campbell, R. K. Ellis, P. Nason, and E. Re	Top-pair production and decay at NLO matched with parton showers	JHEP 04 (2015) 114	1412.1828
21	J. Alwall et al.	The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations	JHEP 07 (2014) 079	1405.0301
22	J. Alwall et al.	Comparative study of various algorithms for the merging of parton showers and matrix elements in hadronic collisions	EPJC 53 (2008) 473	0706.2569
23	Y. Li and F. Petriello	Combining QCD and electroweak corrections to dilepton production in FEWZ	PRD 86 (2012) 094034	1208.5967
24	S. Frixione et al.	Single-top hadroproduction in association with a W boson	JHEP 07 (2008) 029	0805.3067
25	E. Re	Single-top Wt-channel production matched with parton showers using the POWHEG method	EPJC 71 (2011) 1547	1009.2450
26	N. Kidonakis	NNLL threshold resummation for top-pair and single-top production	Phys. Part. Nucl. 45 (2014) 714	1210.7813
27	J. M. Campbell and R. K. Ellis	MCFM for the Tevatron and the LHC	NPPS 205-206 (2010) 10	1007.3492
28	T. Gehrmann et al.	$\mathrm{W^+W^-}$ production at hadron colliders in next to next to leading order QCD	PRL 113 (2014) 212001	1408.5243
29	S. Agostinelli et al.	GEANT4---a simulation toolkit	NIM A 506 (2003) 250
30	CMS Collaboration	Particle-flow reconstruction and global event description with the CMS detector	JINST 12 (2017) P10003	CMS-PRF-14-001 1706.04965
31	CMS Collaboration	Performance of missing transverse momentum reconstruction in proton-proton collisions at $\sqrt{s} =$ 13 TeV using the CMS detector	JINST 14 (2019) P07004	CMS-JME-17-001 1903.06078
32	M. Cacciari, G. P. Salam, and G. Soyez	The anti-$k_t$ jet clustering algorithm	JHEP 04 (2008) 063	0802.1189
33	M. Cacciari, G. P. Salam, and G. Soyez	FastJet user manual	EPJC 72 (2012) 1896	1111.6097
34	CMS Collaboration	Jet energy scale and resolution in the CMS experiment in pp collisions at 8 TeV	JINST 12 (2017) P02014	CMS-JME-13-004 1607.03663
35	CMS Collaboration	Pileup mitigation at CMS in 13 TeV data	JINST 15 (2020) P09018	CMS-JME-18-001 2003.00503
36	CMS Collaboration	Measurement of $B\bar{B}$ angular correlations based on secondary vertex reconstruction at $\sqrt{s}=$ 7 TeV	JHEP 03 (2011) 136	CMS-BPH-10-010 1102.3194
37	CMS Collaboration	Measurement of associated production of a W boson and a charm quark in proton-proton collisions at $\sqrt{s} =$ 13 TeV	EPJC 79 (2019) 269	CMS-SMP-17-014 1811.10021
38	CMS Collaboration	Electron and photon reconstruction and identification with the CMS experiment at the CERN LHC	Accepted by JINST	CMS-EGM-17-001 2012.06888
39	CMS Collaboration	Performance of the CMS muon detector and muon reconstruction with proton-proton collisions at $\sqrt{s}=$ 13 TeV	JINST 13 (2018) P06015	CMS-MUO-16-001 1804.04528
40	CMS Collaboration	Performance of the reconstruction and identification of high-momentum muons in proton-proton collisions at $\sqrt{s} =$ 13 TeV	JINST 15 (2020) P02027	CMS-MUO-17-001 1912.03516
41	CMS Collaboration	Measurement of the inelastic proton-proton cross section at $\sqrt{s} =$ 13 TeV	JHEP 07 (2018) 161	CMS-FSQ-15-005 1802.02613
42	M. G. Bowler	e+ e- production of heavy quarks in the string model	Z. Phys. C 11 (1981) 169
43	B. Andersson, G. Gustafson, G. Ingelman, and T. Sjostrand	Parton fragmentation and string dynamics	PR 97 (1983) 31
44	T. Sjostrand	Jet fragmentation of nearby partons	NPB 248 (1984) 469
45	ALEPH Collaboration	Study of the fragmentation of b quarks into B mesons at the Z peak	PLB 512 (2001) 30	hep-ex/0106051
46	DELPHI Collaboration	A study of the b quark fragmentation function with the DELPHI detector at LEP I and an averaged distribution obtained at the Z pole	EPJC 71 (2011) 1557	1102.4748
47	OPAL Collaboration	Inclusive analysis of the b quark fragmentation function in Z decays at LEP	EPJC 29 (2003) 463	hep-ex/0210031
48	SLD Collaboration	Measurement of the b quark fragmentation function in ${\mathrm{Z^0}}$ decays	PRD 65 (2002) 092006	hep-ex/0202031
49	CMS Collaboration	Measurement of the associated production of a Z boson with charm or bottom quark jets in proton-proton collisions at $\sqrt{s} =$ 13 TeV	PRD 102 (2020)	CMS-SMP-19-004 2001.06899