Using ctools from Python

Download Notebook

In this notebook you will learn how to use the ctools and cscripts from Python instead of typing the commands in the console.

ctools provides two Python modules that allow using all tools and scripts as Python classes. To use ctools from Python you have to import the ctools and cscripts modules into Python. You should also import the gammalib module, as ctools without GammaLib is generally not very useful.

Warning: Always import gammalib before you import ctools and cscripts.

In [1]:
import gammalib
import ctools
import cscripts

Simulating events

As an example we will simulate an observation of an hour of the Crab nebula.

In [2]:
sim = ctools.ctobssim()
sim['inmodel']   = '${CTOOLS}/share/models/crab.xml'
sim['outevents'] = 'events.fits'
sim['caldb']     = 'prod2'
sim['irf']       = 'South_0.5h'
sim['ra']        = 83.5
sim['dec']       = 22.8
sim['rad']       = 5.0
sim['tmin']      = '2020-01-01T00:00:00'
sim['tmax']      = '2020-01-01T01:00:00'
sim['emin']      = 0.03
sim['emax']      = 150.0
sim.execute()

The first line generates an instance of the ctobssim tool as a Python class. User parameters are then set using the [ ] operator. After setting all parameters the execute() method is called to execute the ctobssim tool. On output the events.fits FITS file is created. Until now everything is analogous to running the tool from the command line, but in Python you can easily combine the different blocks into more complex workflows.

Remember that you can consult the manual of each tool to find out how it works and to discover all the input parameters that you can set.

In [3]:
!ctobssim --help

ctobssim
========

Simulate event list(s).


Synopsis
--------

This tool simulates event list(s) using the instrument characteristics
specified by the instrument response function(s) and an input model. The
simulation includes photon events from astrophysical sources and background
events from an instrumental background model.

By default, ctobssim creates a single event list. ctobssim queries a pointing
direction, the radius of the simulation region, a time interval, an energy
interval, an instrumental response function, and an input model. ctobssim uses
a numerical random number generator for the simulations with a seed value
provided by the hidden seed parameter. Changing this parameter for
subsequent runs will lead to different event samples.

ctobssim performs a safety check on the maximum photon rate for all model
components to avoid that the tool locks up and requests huge memory
resources, which may happen if a mistake was made in setting up the input
model (for example if an error in the flux units is made). The maximum allowed
photon rate is controlled by the hidden maxrate parameter, which by default
is set to 1e6.

ctobssim can also generate multiple event lists if an observation definition
file is specified on input using the hidden inobs parameter. In that
case, simulation information will be gathered from the file, and for each
observation an event list will be created.

For each event file, the simulation parameters will be written as data
selection keywords to the FITS header. These keywords are mandatory for any
unbinned maximum likelihood analysis of the event data.


General parameters
------------------

inobs [file]
    Input event list or observation definition XML file. If provided (i.e. the
    parameter is not blank or NONE), the pointing definition and eventually the
    response information will be extracted from the input file for event
    simulation.

inmodel [file]
    Input model XML file.

caldb [string]
    Calibration database.

irf [string]
    Instrumental response function.

(edisp = no) [boolean]
    Apply energy dispersion?

outevents [file]
    Output event list or observation definition XML file.

(prefix = sim_events_) [string]
    Prefix for event list in observation definition XML file.

(startindex = 1) [integer]
    Start index of event list in observation definition XML file.

(seed = 1) [integer]
    Integer seed value to be used for Monte Carlo simulations. Keep this
    parameter at the same value for repeatable simulations, or increment
    this value for subsequent runs if non-repeatable simulations are
    required.

ra [real]
    Right Ascension of CTA pointing (J2000, in degrees).

dec [real]
    Declination of CTA pointing (J2000, in degrees).

rad [real]
    Radius of CTA field of view (simulation cone radius) (in degrees).

tmin [time]
    Start time (UTC string, JD, MJD or MET in seconds).

tmax [time]
    Stop time (UTC string, JD, MJD or MET in seconds).

(mjdref = 51544.5) [real]
    Reference Modified Julian Day (MJD) for simulated events. The times in
    seconds for each event are counted from this reference time on.

emin [real]
    Lower energy limit of simulated events (in TeV).

emax [real]
    Upper energy limit of simulated events (in TeV).

(deadc = 0.95) [real]
    Average deadtime correction factor.

(maxrate = 1.0e6) [real]
    Maximum photon rate for source models. Source models that exceed this
    maximum photon rate will lead to an exception as very likely the
    specified model normalisation is too large (probably due to the
    a misinterpretation of units). Note that ctools specifies intensity
    units per MeV.


Standard parameters
-------------------

(nthreads = 0) [integer]
    Number of parallel processes (0=use all available CPUs).

(publish = no) [boolean]
    Specifies whether the event list(s) should be published on VO Hub.

(chatter = 2) [integer]
    Verbosity of the executable:
     chatter = 0: no information will be logged

     chatter = 1: only errors will be logged

     chatter = 2: errors and actions will be logged

     chatter = 3: report about the task execution

     chatter = 4: detailed report about the task execution

(clobber = yes) [boolean]
    Specifies whether existing files should be overwritten.

(debug = no) [boolean]
    Enables debug mode. In debug mode the executable will dump any log file output to the console.

(mode = ql) [string]
    Mode of automatic parameters (default is ql, i.e. "query and learn").

(logfile = ctobssim.log) [string]
    Name of log file.


Related tools or scripts
------------------------

csobsdef

In a Jupyter notebook a code line starting with ! is executed in the shell, so you can do the operation above just from the command line.

One of the advantages of using ctools from Python is that you can run a tool using

In [4]:
sim.run()

The main difference to the execute() method is that the run() method will not write the output (i.e., the simulated event list) to disk. Why is this useful? Well, after having typed sim.run() the ctobssim class still exists as an object in memory, including all the simulated events. You can always save to disk later using the save() method.

The ctobssim class has an obs() method that returns an observation container that holds the simulated CTA observation with its associated events. To visualise this container, type:

In [5]:
print(sim.obs())
=== GObservations ===
 Number of observations ....: 1
 Number of models ..........: 2
 Number of observed events .: 204568
 Number of predicted events : 0

There is one CTA observation in the container and to visualise that observation type:

In [6]:
print(sim.obs()[0])
=== GCTAObservation ===
 Name ......................:
 Identifier ................: 000001
 Instrument ................: CTA
 Event file ................: events.fits
 Event type ................: EventList
 Statistic .................: cstat
 Ontime ....................: 3599.99999976158 s
 Livetime ..................: 3527.99999976635 s
 Deadtime correction .......: 0.98
 User energy range .........: undefined
=== GCTAPointing ===
 Pointing direction ........: (RA,Dec)=(83.5,22.8)
=== GCTAResponseIrf ===
 Caldb mission .............: cta
 Caldb instrument ..........: prod2
 Response name .............: South_0.5h
 Energy dispersion .........: Not used
 Safe energy range .........: undefined
=== GCTAEventList ===
 Number of events ..........: 204568 (disposed in "events.fits")
 Time interval .............: 58849.0008007407 - 58849.0424674074 days
 Energy interval ...........: 0.03 - 150 TeV
 Region of interest ........: RA=83.5, DEC=22.8 [0,0] Radius=5 deg
=== GSkyRegions ===
 Number of regions .........: 0

The observation contains a CTA event list that is implement by the GammaLib class GCTAEventList. You can access the event list using the events() method. To visualise the individual events you can iterate over the events using a for loop. This will show the simulated celestial coordinates (RA, DEC), the coordinate in the camera system [DETX, DETY], the energies and the terrestrial times (TT) of all events. Let’s peek at the first events of the list:

In [7]:
events = sim.obs()[0].events()
for event in events[:1]:
    print(event)
Dir=RA=83.7821807861328, DEC=22.0288944244385 [-0.0134540141482956,0.00456559645656115] Energy=36.9588024914265 GeV Time=315532804.286922 s (TT)

We can use this feature to inspect some of the event properties, for example look at their energy spectrum. For this we will use the Python packages matplotlib. If you do not have matplotlib you can use another plotting package of your choice or skip this step.

In [8]:
import matplotlib.pyplot as plt
%matplotlib inline
#this will visualize plots inline

ax = plt.subplot()
ax.set_yscale('log')
ax.set_xlabel('Log10(Energy [TeV])')

energies = []
for event in events:
    energies.append(event.energy().log10TeV())

n, bins, patches = plt.hist(energies)
../../../_images/users_tutorials_quickstart_ctools-from-python_17_0.png

Fitting a model to the observations

We can use the observation in memory to directly run a likelihood fit.

In [9]:
like = ctools.ctlike(sim.obs())
like.run()

This is very compact. Where do we define the model fit to the data? Where are the user parameters? ctlike doesn’t in fact need any parameters as all the relevant information is already contained in the observation container produced by the ctobssim class. Indeed, you constructed the ctlike instance by using the ctobssim observation container as constructor argument.

An observation container, implemented by the GObservations class of GammaLib, is the fundamental brick of any ctools analysis. An observation container can hold more than events, e.g., in this case it also holds the model that was used to generate the events.

Many tools and scripts handle observation containers, and accept them upon construction and return them after running the tool via the obs() method. Passing observation containers between ctools classes is a very convenient and powerful way of building in-memory analysis pipelines. However, this implies that you need some computing ressources when dealing with large observation containers (for example if you want to analyse a few 100 hours of data at once). Also, if the script crashes the information is lost.

To check how the fit went you can inspect the optimiser used by ctlike by typing:

In [10]:
print(like.opt())
=== GOptimizerLM ===
 Optimized function value ..: 766517.033
 Absolute precision ........: 0.005
 Acceptable value decrease .: 2
 Optimization status .......: converged
 Number of parameters ......: 10
 Number of free parameters .: 4
 Number of iterations ......: 2
 Lambda ....................: 1e-05

You see that the fit converged after 2 iterations. Out of 10 parameters in the model 4 have been fitted (the others were kept fixed). To inspect the fit results you can print the model container that can be access using the models() method of the observation container:

In [11]:
print(like.obs().models())
=== GModels ===
 Number of models ..........: 2
 Number of parameters ......: 10
=== GModelSky ===
 Name ......................: Crab
 Instruments ...............: all
 Observation identifiers ...: all
 Model type ................: PointSource
 Model components ..........: "PointSource" * "PowerLaw" * "Constant"
 Number of parameters ......: 6
 Number of spatial par's ...: 2
  RA .......................: 83.6331 [-360,360] deg (fixed,scale=1)
  DEC ......................: 22.0145 [-90,90] deg (fixed,scale=1)
 Number of spectral par's ..: 3
  Prefactor ................: 5.71641983396841e-16 +/- 6.16044728001869e-18 [1e-23,1e-13] ph/cm2/s/MeV (free,scale=1e-16,gradient)
  Index ....................: -2.47207578184696 +/- 0.00792255637026554 [-0,-5]  (free,scale=-1,gradient)
  PivotEnergy ..............: 300000 [10000,1000000000] MeV (fixed,scale=1000000,gradient)
 Number of temporal par's ..: 1
  Normalization ............: 1 (relative value) (fixed,scale=1,gradient)
 Number of scale par's .....: 0
=== GCTAModelIrfBackground ===
 Name ......................: CTABackgroundModel
 Instruments ...............: CTA
 Observation identifiers ...: all
 Model type ................: "PowerLaw" * "Constant"
 Number of parameters ......: 4
 Number of spectral par's ..: 3
  Prefactor ................: 0.999547413672001 +/- 0.00722086618569711 [0.001,1000] ph/cm2/s/MeV (free,scale=1,gradient)
  Index ....................: -0.000149454940530402 +/- 0.00255125105744512 [-5,5]  (free,scale=1,gradient)
  PivotEnergy ..............: 1000000 [10000,1000000000] MeV (fixed,scale=1000000,gradient)
 Number of temporal par's ..: 1
  Normalization ............: 1 (relative value) (fixed,scale=1,gradient)

For example, in this way we can fetch the minimum of the optimized function (the negative of the natural logarithm of the likelihood) to compare different model hypotheses.

In [12]:
like1 = like.opt().value()
print(like1)
766517.033411

Suppose you want to repeat the fit by optimising also the position of the point source. This is easy from Python, as you can modify the model and fit interactively. Type the following:

In [13]:
like.obs().models()['Crab']['RA'].free()
like.obs().models()['Crab']['DEC'].free()
like.run()
print(like.obs().models())
=== GModels ===
 Number of models ..........: 2
 Number of parameters ......: 10
=== GModelSky ===
 Name ......................: Crab
 Instruments ...............: all
 Observation identifiers ...: all
 Model type ................: PointSource
 Model components ..........: "PointSource" * "PowerLaw" * "Constant"
 Number of parameters ......: 6
 Number of spatial par's ...: 2
  RA .......................: 83.6321513539243 +/- 0.000642709906923956 [-360,360] deg (free,scale=1)
  DEC ......................: 22.015365038759 +/- 0.000605599773766818 [-90,90] deg (free,scale=1)
 Number of spectral par's ..: 3
  Prefactor ................: 5.71414829301837e-16 +/- 6.15917760942954e-18 [1e-23,1e-13] ph/cm2/s/MeV (free,scale=1e-16,gradient)
  Index ....................: -2.47211138483201 +/- 0.0079230679574118 [-0,-5]  (free,scale=-1,gradient)
  PivotEnergy ..............: 300000 [10000,1000000000] MeV (fixed,scale=1000000,gradient)
 Number of temporal par's ..: 1
  Normalization ............: 1 (relative value) (fixed,scale=1,gradient)
 Number of scale par's .....: 0
=== GCTAModelIrfBackground ===
 Name ......................: CTABackgroundModel
 Instruments ...............: CTA
 Observation identifiers ...: all
 Model type ................: "PowerLaw" * "Constant"
 Number of parameters ......: 4
 Number of spectral par's ..: 3
  Prefactor ................: 0.999543391635431 +/- 0.0072207997895289 [0.001,1000] ph/cm2/s/MeV (free,scale=1,gradient)
  Index ....................: -0.000148647508899144 +/- 0.0025512389825989 [-5,5]  (free,scale=1,gradient)
  PivotEnergy ..............: 1000000 [10000,1000000000] MeV (fixed,scale=1000000,gradient)
 Number of temporal par's ..: 1
  Normalization ............: 1 (relative value) (fixed,scale=1,gradient)

Now we can quantify the improvement of the model by comparing the new value of the optimized function with the previous one.

In [14]:
like2 = like.opt().value()
ts    = -2.0 * (like2 - like1)
print(ts)
4.21736054635

The test statistic (TS) is expected to be distributed as a \(\chi^2_n\) with a number of degrees of freedom \(n\) equal to the additional number of degrees of freedom of the (second) more complex model with respect to the (first) more parsimonious one, for our case two degrees of freedom. The chance probability that the likelihood improved that much because of pure statistical fluctuations is the integral from TS to infinity of the chi square with two degrees of freedom. In this case the improvement is negligible (i.e., the chance probability is very high), as expected since in the model the source is already at the true position.

Generating a log file

By default, tools and scripts run from Python will not generate a log file. The reason for this is that Python scripts are often used to build ctools analysis pipelines and workflows, and one generally does not want that such a script pollutes the workspace with log files. You can however instruct a ctool or cscript to generate a log file as follows:

In [15]:
like.logFileOpen()
like.run()

This creates a log file named ctlike.log in your local work space that you can visualise using any editor.