Maximum likelihood estimation

The central method behind the ctools data analysis is the maximum likelihood estimation of the parameters of a given model. The method estimates the parameters by finding the values that maximise the likelihood function \(L(M)\), which quantifies the probability that the data are drawn from a particular model \(M\). The formulae used for the likelihood computation depend on whether the data are unbinned or binned and on the assumed underlying statistical law. An iterative Levenberg-Marquardt algorithm is employed for maximum likelihood estimation.

Unbinned data

For unbinned data the Poisson formula

\[-\ln L(M) = E(M) - \sum_i \ln P(p'_i, E'_i ,t'_i | M)\]

is used, where \(\ln L(M)\) is the so-called log-likelihood function and the sum is taken over all events \(i\), characterised by the instrument direction \(p'_i\), the measured energy \(E'_i\) and the trigger time \(t'_i\). \(P(p', E' ,t' | M)\) is the probability density that given the model \(M\), an event with instrument direction \(p'\), measured energy \(E'\) and trigger time \(t'\) occurs. \(E(M)\) is the total number of events that are predicted to occur during an observation given the model \(M\), computed by integrating the probability density over the trigger time, measured energy and instrument direction:

\[E(M) = \int_{GTI} \int_{Ebounds} \int_{ROI} P(p',E',t' | M) \, dp' \, dE' \, dt'\]

The temporal integration boundaries are defined by the Good Time Intervals (GTIs), which are contiguous periods in time during which data were taken. The energy integration boundaries are defined by the energy boundaries (Ebounds) of the data selection, and the spatial integration boundaries are defined by a so-called Region of Interest (ROI).
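As an illustration only (this is not the ctools implementation), the unbinned formula can be evaluated for a toy one-dimensional model in which the probability density depends only on measured energy; the power-law shape, normalisation and event list below are made up:

```python
import math

def unbinned_neg_log_like(events, model_density, expected_total):
    """-ln L = E(M) - sum_i ln P(E'_i | M) for an unbinned event list.

    events         : list of measured energies E'_i (toy, 1-D)
    model_density  : callable returning the probability density P(E' | M)
    expected_total : E(M), the model-predicted total number of events
    """
    return expected_total - sum(math.log(model_density(E)) for E in events)

# Toy power-law density dN/dE = N0 * E^-2 over E in [1, 10] (arbitrary units)
N0 = 100.0
density = lambda E: N0 * E**-2
expected = N0 * (1.0 - 1.0 / 10.0)   # integral of N0 * E^-2 from 1 to 10
events = [1.2, 1.5, 2.3, 4.0]
nll = unbinned_neg_log_like(events, density, expected)
```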

Binned data

For binned data following a Poisson distribution the formula

\[-\ln L(M) = \sum_i \left[ e_i(M) - n_i \ln e_i(M) \right]\]

is used, where the sum over \(i\) is now taken over all data cube bins. \(n_i\) is the number of events observed in bin \(i\), and

\[e_i(M) = P(p'_i, E'_i ,t'_i | M) \times \Omega_i \times \Delta E_i \times \Delta T_i\]

is the predicted number of events from model \(M\) in bin \(i\). The probability density is evaluated for the reference instrument direction \(p'_i\), measured energy \(E'_i\) and trigger time \(t'_i\) of bin \(i\), taken to be the values at the bin centre, and multiplied by the solid angle \(\Omega_i\), the energy width \(\Delta E_i\) and the exposure time (or ontime) \(\Delta T_i\) of bin \(i\) (note that a live time correction is included in the probability density \(P(p'_i, E'_i ,t'_i | M)\) so that the multiplication is done by the true exposure time).
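The binned Poisson formula can be sketched in a few lines of Python (a toy illustration, not the ctools implementation; the observed and predicted counts are arbitrary):

```python
import math

def binned_poisson_nll(n, e):
    """-ln L = sum_i [e_i - n_i ln e_i] for observed counts n_i and
    model-predicted counts e_i (constant ln(n_i!) terms dropped)."""
    return sum(ei - ni * math.log(ei) for ni, ei in zip(n, e))

observed  = [5, 3, 0, 1]          # n_i per bin
predicted = [4.2, 3.5, 0.8, 1.1]  # e_i = P * Omega_i * dE_i * dT_i per bin
nll = binned_poisson_nll(observed, predicted)
```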

Binned data following Gaussian statistics

Alternatively, if the data follow a Gaussian distribution the formula

\[-\ln L(M) = \frac{1}{2} \sum_i \left( \frac{n_i - e_i(M)}{\sigma_i} \right)^2\]

is used, where \(\sigma_i\) is the statistical uncertainty in the measured number of events for bin \(i\).
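The Gaussian formula is a weighted sum of squared residuals; a toy evaluation (made-up numbers) looks as follows:

```python
def gaussian_nll(n, e, sigma):
    """-ln L = 1/2 sum_i ((n_i - e_i) / sigma_i)^2 for observed counts n_i,
    predicted counts e_i and statistical uncertainties sigma_i."""
    return 0.5 * sum(((ni - ei) / si) ** 2 for ni, ei, si in zip(n, e, sigma))

nll = gaussian_nll([10.0, 20.0], [12.0, 18.0], [3.0, 4.0])
```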

On/Off analysis

For a so-called On/Off analysis, where spectra from a source (or On) region and a source-free (or Off) region are analysed jointly, two options exist: cstat and wstat.

cstat

cstat makes use of a background model \(M_b\) and employs the Poisson formula

\[-\ln L(M) = \sum_i \left\{ s_i(M_s) + \alpha_i(M_b) \, b_i(M_b) - n^\mathrm{on}_i \ln \left[ s_i(M_s) + \alpha_i(M_b) \, b_i(M_b) \right] + b_i(M_b) - n^\mathrm{off}_i \ln b_i(M_b) \right\}\]

where the model \(M\) is split into signal (i.e. gamma rays) and background, i.e. \(M = M_s + M_b\), \(n^\mathrm{on}_i\) is the number of events in bin \(i\) of the On region, \(n^\mathrm{off}_i\) is the number of events in bin \(i\) of the Off region, \(s_i(M_s)\) is the number of expected signal counts in bin \(i\) of the On region, \(b_i(M_b)\) is the number of expected background counts in bin \(i\) of the Off region, and

\[\alpha_i(M_b) = \frac{\int_\mathrm{on} M_b d\Omega}{\int_\mathrm{off} M_b d\Omega}\]

is the ratio between the spatial integral over the background model in the On region and the Off region for bin \(i\).
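A toy evaluation of the cstat sum (not the ctools implementation; all numbers are invented) might look as follows:

```python
import math

def cstat_nll(n_on, n_off, s, b, alpha):
    """cstat -ln L summed over bins, with per-bin expected signal s_i,
    expected background b_i (in the Off region) and On/Off ratio alpha_i."""
    nll = 0.0
    for non, noff, si, bi, ai in zip(n_on, n_off, s, b, alpha):
        on_pred = si + ai * bi            # expected counts in the On region
        nll += on_pred - non * math.log(on_pred) + bi - noff * math.log(bi)
    return nll

nll = cstat_nll(n_on=[12], n_off=[40], s=[5.0], b=[38.0], alpha=[0.2])
```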

wstat

wstat does not make use of an explicit background model but assumes that the background rate per solid angle is the same in the On and the Off region. The Poisson formula is then

\[\begin{split}-\ln L(M_s) = \sum_i \Bigl\{ s_i(M_s) + \alpha \, b_i(M_s) - n^\mathrm{on}_i \ln \left[ s_i(M_s) + \alpha \, b_i(M_s) \right] + b_i(M_s) - n^\mathrm{off}_i \ln b_i(M_s) \\ - n^\mathrm{on}_i (1-\ln n^\mathrm{on}_i) - n^\mathrm{off}_i (1-\ln n^\mathrm{off}_i) \Bigr\}\end{split}\]

where

\[\alpha = \frac{\int_\mathrm{on} d\Omega}{\int_\mathrm{off} d\Omega}\]

is the ratio between the solid angles of the On region and the Off region. The terms in the last line are added so that \(-2 \ln L(M_s)\) asymptotically follows a \(\chi^2\) distribution.

Some special cases need to be handled separately in wstat. If \(n^\mathrm{on}_i = 0\) but \(n^\mathrm{off}_i > 0\) the contribution to the log-likelihood from the energy bin \(i\) is

\[-\ln L_i(M_s) = s_i(M_s) + n^\mathrm{off}_i \ln(\alpha+1).\]

If \(n^\mathrm{off}_i = 0\) and \(n^\mathrm{on}_i > s_i(M_s) \frac{\alpha + 1}{\alpha}\) the contribution to the log-likelihood from the energy bin \(i\) is

\[-\ln L_i(M_s) = -\frac{s_i(M_s)}{\alpha} - n^\mathrm{on}_i \ln\left(\frac{\alpha}{\alpha+1}\right)\]

However, for smaller values of \(n^\mathrm{on}_i\) the value of \(b_i(M_s)\) is zero or negative. Since a negative number of background counts is unphysical, the number of background counts is forced to zero. This yields the following expression for the log-likelihood in energy bin \(i\):

\[-\ln L_i(M_s) = s_i(M_s) + n^\mathrm{on}_i \left( \ln n^\mathrm{on}_i - \ln s_i(M_s) - 1 \right)\]

or, if also \(n^\mathrm{on}_i = 0\),

\[-\ln L_i(M_s) = s_i(M_s)\]

Forcing the number of expected background counts to zero biases the likelihood estimator. Therefore, wstat is known to be inaccurate if there are energy bins with zero Off counts.
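The per-bin wstat contribution, including the special cases above, can be sketched as follows. Note that the general-case expression for the profiled background \(b_i(M_s)\), the root of a quadratic equation, is the standard wstat profiling result and is not spelled out in the text above, so treat it as an assumption:

```python
import math

def wstat_nll_bin(n_on, n_off, s, alpha):
    """Per-bin wstat -ln L, handling the special cases n_on = 0 or n_off = 0."""
    if n_on == 0 and n_off > 0:
        return s + n_off * math.log(alpha + 1.0)
    if n_off == 0:
        if n_on > s * (alpha + 1.0) / alpha:
            return -s / alpha - n_on * math.log(alpha / (alpha + 1.0))
        if n_on == 0:
            return s
        # profiled background would be unphysical: forced to zero
        return s + n_on * (math.log(n_on) - math.log(s) - 1.0)
    # General case (assumption: standard wstat profiling): the background b
    # that maximises L for fixed s is the positive root of a quadratic.
    c = alpha * (n_on + n_off) - (alpha + 1.0) * s
    b = (c + math.sqrt(c * c + 4.0 * alpha * (alpha + 1.0) * n_off * s)) \
        / (2.0 * alpha * (alpha + 1.0))
    return (s + alpha * b - n_on * math.log(s + alpha * b)
            + b - n_off * math.log(b)
            - n_on * (1.0 - math.log(n_on)) - n_off * (1.0 - math.log(n_off)))

nll_bin = wstat_nll_bin(n_on=12, n_off=40, s=5.0, alpha=0.2)  # general case
```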

Levenberg-Marquardt algorithm

ctools uses an iterative Levenberg-Marquardt algorithm for maximum likelihood estimation. Since the Levenberg-Marquardt algorithm minimises a function, \(-\ln L(M)\) is used as the function to minimise. The algorithm starts with an initial guess of the model parameters \(a_k\) and iteratively replaces this estimate by a new estimate \(a_k + \Delta a_k\). The \(\Delta a_k\) are determined by solving

\[\sum_l \alpha_{kl} (1 + \delta_{kl} \lambda) \Delta a_l = \beta_k\]

where

\[\alpha_{kl} = \frac{\partial^2 (-\ln L(M))}{\partial a_k \partial a_l}\]

is the curvature matrix,

\[\beta_k = \frac{\partial (-\ln L(M))}{\partial a_k}\]

is the gradient, and \(\delta_{kl}\) is the Kronecker delta that is \(1\) for \(k=l\) and \(0\) otherwise. \(\lambda\) is a damping parameter that is initially set to 0.001. If a Levenberg-Marquardt iteration leads to an increase of the log-likelihood function \(\ln L(M)\), \(\lambda\) is decreased by a factor of 10. If the log-likelihood function does not improve, \(\lambda\) is increased by a factor of 10 and the iteration is repeated. The iterations stop when the log-likelihood increase is less than a small value, typically 0.005; the optimiser status is then set to converged. The iterations are also stopped if the log-likelihood function does not increase for (typically) ten consecutive iterations; the optimiser status is then set to stalled. The matrix equation is solved using a sparse matrix Cholesky decomposition. Parameters are constrained to remain within their limits, where such limits exist.

Model fitting using the Levenberg-Marquardt algorithm is implemented by ctlike.
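A minimal one-parameter sketch of this scheme (toy code, not the ctlike implementation) fits a constant Poisson rate \(\mu\) to a handful of counts, adapting the damping parameter \(\lambda\) exactly as described above; \(\beta\) is taken as the negative gradient, following the Numerical Recipes sign convention:

```python
import math

def neg_log_like(mu, counts):
    # binned Poisson -ln L = sum_i [mu - n_i ln mu] for a constant model
    return sum(mu - n * math.log(mu) for n in counts)

def lm_fit(counts, mu0=1.0, lam=1e-3, tol=0.005, max_iter=100):
    mu, f = mu0, neg_log_like(mu0, counts)
    for _ in range(max_iter):
        beta  = sum(n / mu - 1.0 for n in counts)   # negative gradient of -ln L
        alpha = sum(n / mu ** 2 for n in counts)    # curvature
        step  = beta / (alpha * (1.0 + lam))        # solve alpha(1+lam) da = beta
        f_new = neg_log_like(mu + step, counts)
        if f_new < f:                  # log-likelihood increased: accept step
            if f - f_new < tol:        # improvement below threshold: converged
                return mu + step
            mu, f, lam = mu + step, f_new, lam / 10.0
        else:                          # no improvement: raise damping, retry
            lam *= 10.0
    return mu

counts = [4, 6, 5, 7, 3]
mu_hat = lm_fit(counts)   # converges to the mean of the counts
```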

Note

Computation of \(\alpha_{kl}\) and \(\beta_k\) implies computing derivatives of \(P(p',E',t' | M)\) with respect to the parameters of the model \(M\). Following Numerical Recipes (section 15.5.1), second derivatives of \(P(p',E',t' | M)\) are neglected in ctools, which implies that only the first derivatives

\[\frac{\partial P(p',E',t' | M)}{\partial a_k} = \frac{\partial P(p',E',t' | M)}{\partial M} \frac{\partial M}{\partial a_k}\]

are computed. Where possible, derivatives are computed analytically (for example for spectral model parameters); otherwise numerical derivatives are used (for example for spatial or temporal model parameters).
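The chain rule above can be illustrated with a toy model (both the density and the model mapping below are made up), comparing the analytical derivative against a central finite difference:

```python
import math

# Toy: P depends on a model value M(a) = exp(a); chain rule gives
# dP/da = dP/dM * dM/da with dM/da = exp(a) = M
P     = lambda M: M / (1.0 + M) ** 2            # arbitrary toy density
dP_dM = lambda M: (1.0 - M) / (1.0 + M) ** 3    # analytical dP/dM

def dP_da_analytic(a):
    M = math.exp(a)
    return dP_dM(M) * M

def dP_da_numeric(a, h=1e-6):
    # central finite difference in the parameter a
    return (P(math.exp(a + h)) - P(math.exp(a - h))) / (2.0 * h)
```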

Statistical parameter errors

Statistical errors on the model parameters \(\delta a_k\) are determined by computing the square root of the diagonal elements of the covariance matrix \(C\) which is the inverse of the curvature matrix:

\[\delta a_k = \sqrt{C_{kk}}\]

with

\[C = [\alpha]^{-1}\]

Inversion of \([\alpha]\) is again performed using a sparse matrix Cholesky decomposition.
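For a two-parameter problem the computation reduces to inverting a 2x2 matrix by hand (toy curvature values, no sparse Cholesky machinery):

```python
import math

def errors_from_curvature_2x2(alpha):
    """Statistical errors: invert a 2x2 curvature matrix and take the
    square roots of the diagonal of the covariance matrix C = alpha^-1."""
    (a, b), (c, d) = alpha
    det = a * d - b * c
    C = [[d / det, -b / det], [-c / det, a / det]]
    return [math.sqrt(C[0][0]), math.sqrt(C[1][1])]

curvature = [[4.0, 1.0], [1.0, 9.0]]   # toy curvature matrix
errs = errors_from_curvature_2x2(curvature)
```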

Detection significance

The detection significance of the source model is estimated using the so-called Test Statistic (TS), defined as

\[\mathrm{TS} = 2 \, ( \ln L(M_s+M_b) - \ln L(M_b) )\]

where \(\ln L(M_s+M_b)\) is the log-likelihood value obtained when fitting the source and the background together to the data, and \(\ln L(M_b)\) is the log-likelihood value obtained when fitting only the background model to the data. Under the hypothesis that the model \(M_b\) provides a satisfactory fit of the data, \(\mathrm{TS}\) follows a \(\chi^2_n\) distribution with \(n\) degrees of freedom, where \(n\) is the number of free parameters in the source model component. Therefore

\[p = \int_\mathrm{TS}^{+\infty} \chi^2_n(x) \:\: \mathrm{d}x\]

gives the chance probability (p-value) that the log-likelihood improves by \(\mathrm{TS}/2\) when adding the source model \(M_s\) due to statistical fluctuations only. For \(n=1\) the significance in Gaussian sigma is given by \(\sqrt{\mathrm{TS}}\).
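For \(n=1\) the p-value can be computed in closed form, since \(P(\chi^2_1 > \mathrm{TS}) = \mathrm{erfc}(\sqrt{\mathrm{TS}/2})\); a small sketch with a made-up TS value:

```python
import math

def pvalue_chi2_1dof(ts):
    """p-value for TS under a chi^2 distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(ts / 2.0))

ts = 25.0
p = pvalue_chi2_1dof(ts)   # chance probability
sigma = math.sqrt(ts)      # Gaussian significance for n = 1: 5 sigma
```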

Upper flux limits

If gamma-ray emission from a source is not detected, an upper flux limit can be derived by determining the flux \(F_\mathrm{up}\) that leads to a log-likelihood decrease of \(\Delta \ln L\) with respect to the maximum log-likelihood estimate \(F_\mathrm{0}\):

\[\ln L(F_\mathrm{up}) = \ln L(F_\mathrm{0}) - \Delta \ln L\]

The log-likelihood decrease \(\Delta \ln L\) is computed from the chance probability (p-value) using

\[\Delta \ln L = (\mathrm{erf}^{-1}(p))^2\]

Upper limit computation is implemented by ctulimit.
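A toy sketch of the procedure (not the ctulimit implementation): invert the error function numerically to obtain \(\Delta \ln L\) from the p-value via the relation above, then scan the flux upwards from the best fit until the log-likelihood has dropped by \(\Delta \ln L\). The parabolic log-likelihood and all numbers are made up:

```python
import math

def erf_inv(p, lo=0.0, hi=6.0):
    # invert erf by bisection (erf is monotonically increasing)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def upper_limit(lnl, F0, delta_lnl, step=1e-4):
    """Scan F upwards from the best fit F0 until ln L drops by delta_lnl."""
    F = F0
    target = lnl(F0) - delta_lnl
    while lnl(F) > target:
        F += step
    return F

# Toy parabolic log-likelihood around a best-fit flux F0 (arbitrary units)
F0, sigF = 2.0, 0.5
lnl = lambda F: -0.5 * ((F - F0) / sigF) ** 2

p = 0.95                        # e.g. 95% confidence
delta = erf_inv(p) ** 2         # Delta ln L = (erf^-1(p))^2
F_up = upper_limit(lnl, F0, delta)
```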