TOAST

Introduction

TOAST is a software framework for processing timestream data collected by telescopes. Telescopes that collect data as timestreams rather than images present a unique set of analysis challenges. Detector data usually contains noise that is correlated in time, as well as correlated signal from the instrument and the environment. Large pieces of data must often be analyzed simultaneously to extract an estimate of the sky signal. TOAST has evolved over several years, and the current codebase contains a C++ library of optimized math functions wrapped in a Python layer which handles everything else.

The TOAST framework contains:

  • Tools for distributing data among many processes
  • Tools for performing operations on the local pieces of the data
  • Generic operators for common processing tasks (filtering, pointing expansion, map-making)
  • Basic classes for performing I/O in a limited set of formats
  • Well defined interfaces for adding custom I/O classes and processing operators

The highest-level control of the workflow is done by the user, often by writing a small Python “pipeline” script (some examples are included). Such pipeline scripts make use of TOAST functions for distributing data and then call built-in or custom operators to process the timestream data.

Data Organization

The TOAST framework groups data into one or more “observations”. Each observation represents data from a group of detectors for some time span. Detectors in the same observation must have the same number of samples for the length of the observation. We currently also assume that the noise properties of the detectors are constant across this observation (i.e. the noise is stationary). A TOAST “dataset” is simply a collection of one or more observations.

Workflow

  • one example with both synchronous and chained operators:
  • custom data distribution
  • custom TOD class
  • atmosphere (sync, internal)
  • correlated noise (sync, external)
  • chained (internal)
    • poly filter (internal)
    • pointing expansion (external)
    • accumulate Npp^-1 (internal)

Example: Satellite

Example: Ground-Based

Example: Balloon

Support for Specific Experiments

If you are a member of one of these projects:

  • Planck
  • LiteBIRD
  • Simons Array
  • Simons Observatory
  • CMB “Stage 4”

Then you have access to additional software repositories that contain extra TOAST classes and scripts for processing data from your experiment.

Installation

TOAST is written in C++ and python3 and depends on several commonly available packages. It also has some optional functionality that is only enabled if additional external libraries are available.

Compiled Dependencies

TOAST compilation requires a C++11 compatible compiler as well as a compatible MPI C++ compiler wrapper. You must also have an FFT library; both FFTW and Intel’s MKL are supported by the configure checks. Additionally, a BLAS/LAPACK installation is required.

Several optional compiled dependencies will enable extra features in TOAST. If the Elemental library is found at configure time then internal atmosphere simulation code will be enabled in the build. If the MADAM destriping mapmaker is available at runtime, then the python code will support calling that library.

Python Dependencies

You should have a reasonably new (>= 3.4.0) version of python3. We also require several common scientific python packages:

  • numpy
  • scipy
  • matplotlib
  • pyephem
  • mpi4py (>= 2.0.0)
  • healpy

For mpi4py, ensure that this package is compatible with the MPI C++ compiler used during TOAST installation. When installing healpy, you might encounter difficulties if you are in a cross-compile situation. In that case, I recommend installing the repackaged healpix here.

There are obviously several ways to meet these python requirements.

Option #0

If you are using machines at NERSC, see Using at NERSC.

Option #1

If you are using a linux distribution which is fairly recent (e.g. the latest Ubuntu version), then you can install all the dependencies with the system package manager:

%> apt-get install fftw-dev python3-scipy \
   python3-matplotlib python3-ephem python3-healpy \
   python3-mpi4py

On OS X, you can also get the dependencies with macports. However, on some systems OpenMPI from macports is broken and MPICH should be installed as the dependency for the mpi4py package.

Option #2

If your OS is old, you could use a virtualenv to install updated versions of packages into an isolated location. This is also useful if you want to separate your packages from the system installed versions, or if you do not have root access to the machine. Make sure that you have python3 and the corresponding python3-virtualenv packages installed on your system. Also make sure that you have some kind of MPI (OpenMPI or MPICH) installed with your system package manager. Then:

  1. Create a virtualenv and activate it.
  2. Once inside the virtualenv, pip install the dependencies (see the example below).
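
A minimal sketch of these two steps, assuming a bash-like shell (the virtualenv location is arbitrary, and the package list simply mirrors the dependencies above):

%> virtualenv -p python3 ${HOME}/toast-env
%> source ${HOME}/toast-env/bin/activate
%> pip install numpy scipy matplotlib pyephem healpy mpi4py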

Option #3

Use Anaconda. Download and install Miniconda or the full Anaconda distribution. Make sure to install the Python3 version. If you are starting from Miniconda, install the dependencies that are available through conda:

%> conda install -c conda-forge numpy scipy matplotlib mpi4py healpy pyephem

Using Configure

TOAST uses autotools to configure, build, and install both the compiled code and the python tools. If you are running from a git checkout (instead of a distribution tarball), then first do:

%> ./autogen.sh

Now run configure:

%> ./configure --prefix=/path/to/install

See the top-level “platforms” directory for other examples of running the configure script. Now build and install the tools:

%> make install

In order to use the installed tools, you must make sure that the installed location has been added to the search paths for your shell. For example, the “<prefix>/bin” directory should be in your PATH and the python install location “<prefix>/lib/pythonX.X/site-packages” should be in your PYTHONPATH.
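
For example, assuming a bash-like shell and a Python 3.6 installation (adjust the version in the path to match yours):

%> export PATH=/path/to/install/bin:${PATH}
%> export PYTHONPATH=/path/to/install/lib/python3.6/site-packages:${PYTHONPATH}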

Testing the Installation

After installation, you can run both the compiled and python unit tests. These tests will create an output directory in your current working directory:

%> python -c "import toast.tests; toast.tests.run()"

Pipelines

Before delving into the structure of the toast package, it is sometimes useful to look at (and use!) an example. One such program is the simple script below which simulates a fake satellite scanning strategy with a focalplane of detectors and then makes a map.

Simple Satellite Simulation

The current version of this tool simulates parameterized boresight pointing and then uses the given focalplane (loaded from a pickle file) to compute the detector pointing. Noise properties of each detector are used to simulate noise timestreams.

In order to create a focalplane file, you can do for example:

import pickle
import numpy as np

# A single fake detector named "bore", placed at the boresight.
fake = {}
fake['quat'] = np.array([0.0, 0.0, 1.0, 0.0])  # detector quaternion offset from boresight
fake['fwhm'] = 30.0       # beam FWHM in arcmin
fake['fknee'] = 0.05      # noise knee frequency in Hz
fake['alpha'] = 1.0       # noise spectral slope
fake['NET'] = 0.000060    # noise equivalent temperature
fake['color'] = 'r'       # optional, used only for plotting
fp = {}
fp['bore'] = fake

with open('fp_lb.pkl', 'wb') as p:
    pickle.dump(fp, p)

Note that until the older TOAST mapmaking tools are ported, this script requires the use of libmadam (the --madam option).

usage: toast_satellite_sim.py [-h] [--samplerate SAMPLERATE]
                              [--spinperiod SPINPERIOD]
                              [--spinangle SPINANGLE]
                              [--precperiod PRECPERIOD]
                              [--precangle PRECANGLE] [--hwprpm HWPRPM]
                              [--hwpstep HWPSTEP] [--hwpsteptime HWPSTEPTIME]
                              [--obs OBS] [--gap GAP] [--numobs NUMOBS]
                              [--obschunks OBSCHUNKS] [--outdir OUTDIR]
                              [--debug] [--nside NSIDE] [--baseline BASELINE]
                              [--noisefilter] [--madam] [--fp FP]

Simulate satellite boresight pointing and make a noise map.

optional arguments:
  -h, --help            show this help message and exit
  --samplerate SAMPLERATE
                        Detector sample rate (Hz)
  --spinperiod SPINPERIOD
                        The period (in minutes) of the rotation about the spin
                        axis
  --spinangle SPINANGLE
                        The opening angle (in degrees) of the boresight from
                        the spin axis
  --precperiod PRECPERIOD
                        The period (in minutes) of the rotation about the
                        precession axis
  --precangle PRECANGLE
                        The opening angle (in degrees) of the spin axis from
                        the precession axis
  --hwprpm HWPRPM       The rate (in RPM) of the HWP rotation
  --hwpstep HWPSTEP     For stepped HWP, the angle in degrees of each step
  --hwpsteptime HWPSTEPTIME
                        For stepped HWP, the time in seconds between steps
  --obs OBS             Number of hours in one science observation
  --gap GAP             Cooler cycle time in hours between science obs
  --numobs NUMOBS       Number of complete observations
  --obschunks OBSCHUNKS
                        Number of chunks to subdivide each observation into
                        for data distribution
  --outdir OUTDIR       Output directory
  --debug               Write diagnostics
  --nside NSIDE         Healpix NSIDE
  --baseline BASELINE   Destriping baseline length (seconds)
  --noisefilter         Destripe with the noise filter enabled
  --madam               If specified, use libmadam for map-making
  --fp FP               Pickle file containing a dictionary of detector
                        properties. The keys of this dict are the detector
                        names, and each value is also a dictionary with keys
                        "quat" (4 element ndarray), "fwhm" (float, arcmin),
                        "fknee" (float, Hz), "alpha" (float), and "NET"
                        (float). For optional plotting, the key "color" can
                        specify a valid matplotlib color string.

Example: Proposed CoRE Satellite Boresight

Here is one example using this script to generate one day of scanning with a single boresight detector and one proposed scan strategy for a LiteCoRE satellite:

toast_satellite_sim.py --samplerate 175.86 --spinperiod 1.0 --spinangle 45.0
--precperiod 5760.0 --precangle 50.0 --hwprpm 0.0 --obs 23.0 --gap 1.0
--obschunks 24 --numobs 1 --nside 1024 --baseline 5.0 --madam --noisefilter
--fp fp_core.pkl --outdir out_core_nohwp_fast

Example: Proposed LiteBIRD Satellite Boresight

Here is how you could do a similar thing with a boresight detector and one proposed LiteBIRD scanning strategy for a day:

toast_satellite_sim.py --samplerate 23.0 --spinperiod 10.0 --spinangle 30.0
--precperiod 93.0 --precangle 65.0 --hwprpm 88.0 --obs 23.0 --gap 1.0
--obschunks 24 --numobs 1 --nside 1024 --baseline 60.0 --madam --fp fp_lb.pkl
--debug --outdir out_lb_hwp

Creating Your Own Pipeline

TOAST is designed to give you tools to piece together your own data processing workflow. Here is a slightly modified version of the pipeline script above. This takes a boresight detector with 1/f noise properties, simulates a sort-of Planck scanning strategy, generates a noise timestream, and then generates a fake signal timestream and adds it to the noise. Then it uses madam to make a map.
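
That script is not reproduced here, but a heavily abbreviated sketch of the overall flow is shown below. Only OpPointingHpix and OpMadam appear elsewhere in this document; the module paths, the OpSimNoise operator, and the out/name/params arguments are assumptions included for illustration, not a definitive API.

import toast
from toast.tod import OpPointingHpix, OpSimNoise   # module paths and OpSimNoise name assumed
from toast.map import OpMadam                       # module path assumed

comm = toast.Comm()
data = toast.Data(comm)

# ... create one or more observations here, each holding a TOD for the
# simulated scanning and a noise model (see "Data Distribution" below) ...

# Expand the boresight pointing into per-detector HEALPix pixels and weights.
OpPointingHpix(nside=512, mode="IQU").exec(data)

# Simulate a noise timestream for each detector from its noise model.
OpSimNoise(out="tot_signal").exec(data)

# ... simulate a fake sky signal and add it into the same cached timestream ...

# Destripe and bin the combined timestream with libmadam.
OpMadam(params={"path_output": "out"}, name="tot_signal").exec(data)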

Data Distribution

The toast package is designed for data that is distributed across many processes. When passing the data to toast processing routines, you can either use pre-defined base classes as a container and copy your data into them, or you can create your own derived classes that provide a standard interface.

In either case the full dataset is divided into one or more observations, and each observation has one TOD object (and optionally other objects that describe the noise, valid data intervals, etc). The toast “Comm” class has two levels of MPI communicators that can be used to divide many observations between whole groups of processes. In practice this is not always needed, and the default construction of the Comm object just results in one group with all processes.

The Data class below is essentially just a list of observations for each process group.

Example
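
A minimal sketch of creating these objects. This assumes the default Comm construction described above and that the Data object exposes its observations as a list; the observation dictionary keys shown are illustrative assumptions.

import toast

# Default construction: a single group of processes spanning MPI_COMM_WORLD.
comm = toast.Comm()

# The Data object holds the list of observations assigned to this group.
data = toast.Data(comm)

# Each observation bundles one TOD plus optional noise model, intervals, etc.
# data.obs.append({'tod': tod, 'noise': noise, 'intervals': intervals})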

Telescope TOD

The TOD base class represents the timestream information associated with a telescope. The base class enforces a minimal set of methods for reading and writing detector data and flags, detector pointing, and timestamps. The base class also provide methods for returning information about the data distribution, including which samples are local to a given process.

The base TOD class contains a member which is an instance of a Cache object. This is similar to a dictionary of arrays, but by default the memory used in these arrays is allocated in C, rather than using the python memory pool. This allows us to do aligned memory allocation and explicitly manage the lifetime of the memory.
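
As a hypothetical sketch, reading the locally assigned data from a TOD instance might look like the following. The read_times and read_pntg method names appear in the timing output later in this document; the local_samples and local_dets attributes, the read method, and the exact keyword arguments are assumptions that may differ between versions.

# "tod" is assumed to be a TOD instance from one observation.
offset, nsamp = tod.local_samples        # first local sample and local length
times = tod.read_times(local_start=0, n=nsamp)

for det in tod.local_dets:
    signal = tod.read(detector=det, local_start=0, n=nsamp)       # detector data
    quats = tod.read_pntg(detector=det, local_start=0, n=nsamp)   # pointing quaternions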

Data Intervals

Within each TOD object, a process contains some local set of detectors and range of samples. That range of samples may contain one or more contiguous “chunks” that were used when distributing the data. Separate from this data distribution, TOAST has the concept of valid data “intervals”. This list of intervals applies to the whole TOD sample range, and all processes have a copy of this list.
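
For illustration, a list of valid-data intervals might be built like this. This is a hypothetical sketch; the Interval class, its module path, and its start/stop (seconds) and first/last (sample index) attributes are assumptions based on the description above.

from toast.tod import Interval   # module path assumed

# Two valid intervals spanning the observation, with a 60 second gap between
# them (assuming a 100 Hz sample rate for the sample indices).
intervals = [
    Interval(start=0.0, stop=3600.0, first=0, last=359999),
    Interval(start=3660.0, stop=7260.0, first=366000, last=725999),
]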

Noise Model

The Noise base class represents the time domain noise covariance for all detectors for an entire TOD length.
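
A minimal sketch of building such a noise model for the single "bore" detector from the focalplane example above. The Noise constructor signature shown, taking per-detector frequency and PSD arrays, is an assumption.

import numpy as np
from toast.tod import Noise   # module path assumed

dets = ['bore']
freqs = {'bore': np.linspace(0.001, 50.0, 1000)}   # Hz

# Simple 1/f + white noise PSD, using the fknee, alpha and NET values from
# the focalplane example above.
net = 0.000060
psds = {'bore': net**2 * (1.0 + (0.05 / freqs['bore'])**1.0)}

noise = Noise(detectors=dets, freqs=freqs, psds=psds)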

Pointing Matrices

A “pointing matrix” in TOAST terms is the sparse matrix that describes how sky signal is projected to the timestream. In particular, the model we use is

\[d_t = \mathcal{A}_{tp} s_p + n_t\]

where we write \(s_p\) as a column vector having a number of rows given by the number of pixels in the sky. So the \(\mathcal{A}_{tp}\) matrix has a number of rows given by the number of time samples and a column for every sky pixel. In practice, the pointing matrix is sparse, and we only store the nonzero elements in each row. Also, our sky model often includes multiple terms (e.g. I, Q, and U). This is equivalent to having a set of values at each sky pixel. In TOAST we represent the pointing matrix as a vector of pixel indices (one for each sample) and a 2D array of “weights” whose values are the nonzero values of the matrix for each sample. PyTOAST includes a generic HEALPix operator to generate a pointing matrix.

Generic HEALPix Representation
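
As a sketch, the generic HEALPix pointing operator (OpPointingHpix, which also appears in the timing output later in this document) might be applied like this; the module path and the constructor arguments (nside, nest, mode) are assumptions.

from toast.tod import OpPointingHpix   # module path assumed

# "data" is a toast.Data instance populated with observations.
# Expand the boresight pointing into per-sample HEALPix pixel indices and
# I/Q/U weights for every detector in every observation.
pointing = OpPointingHpix(nside=1024, nest=True, mode='IQU')
pointing.exec(data)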

Each experiment might create other specialized pointing matrices used in solving for instrument-specific signals.

Simulations

There are several classes included in pytoast that can simulate different types of data.

Simulated Telescope

Simulated Noise Model

Simulated Intervals

Simulated Detector Data

This operator uses an externally installed libconviqt.

Map-making Tools

This broad class of operations includes anything that generates pixel-space data products.

Distributed Pixel-space Data

Diagonal Noise Covariance

Native Mapmaking

Using the distributed diagonal noise covariance tools, one can make a simple binned map. Porting the old TOAST map-maker to this version of TOAST is still on the to-do list.
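
A rough sketch of that flow is shown below, using the names that appear in the timing output later in this document (OpLocalPixels, DistPixels, OpAccumDiag, covariance_invert). The import paths, constructor arguments, method names, and the submap chunking are assumptions, not the exact API.

import numpy as np
from toast.map import DistPixels, OpAccumDiag, OpLocalPixels, covariance_invert

# "comm" is the toast.Comm() and "data" the toast.Data created earlier.
# Find the HEALPix pixels actually hit by the locally stored samples.
localpix = OpLocalPixels().exec(data)

# Group pixels into submaps for distribution (chunk size is illustrative).
submap = 4096
localsm = np.unique(np.floor_divide(localpix, submap))

# Distributed diagonal inverse covariance: 6 nonzeros per pixel for IQU.
npix = 12 * 1024 * 1024
invnpp = DistPixels(comm=comm.comm_world, size=npix, nnz=6,
                    dtype=np.float64, submap=submap, local=localsm)

# Accumulate from all observations, reduce across processes, and invert.
OpAccumDiag(invnpp=invnpp).exec(data)
invnpp.allreduce()
covariance_invert(invnpp, 1.0e-3)
invnpp.write_healpix_fits('npp.fits')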

External Madam Interface

If the MADAM library is installed and in your shared library search path, then you can use it to make maps.
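
For example, a minimal sketch of calling the Madam operator (whose exec@OpMadam entry appears in the timing output later in this document); the import path and the params dictionary keys are assumptions based on standard Madam parameter names.

from toast.map import OpMadam   # module path assumed

# "data" is a toast.Data instance with pointing already expanded.
params = {
    'base_first': 60.0,      # destriping baseline length in seconds
    'nside_map': 1024,       # output map resolution
    'path_output': 'out',    # output directory
}

madam = OpMadam(params=params)
madam.exec(data)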

Timing Results

Overview

There are essentially two components of the output:

  • a text file (the timing_report_XXX.out file)
    • general report
  • a JSON file with more detailed data
    • disabled by default due to the very long output time for very large simulations
    • used for plotting purposes
    • example provided in examples/timing_plot.py
    • the command-line option to enable it is “--enable-timer-serialization”, assuming you are using the existing pipelines or have added the argparse handler detailed later
  • Implementation uses “auto-timers”. Essentially, at the beginning of a function, you create a timer.
  • The timer starts automatically and when the timer is “destroyed”, i.e. goes out of scope at the end of the function, it stops the timer and records the time difference and also some memory measurements.
  • The way the auto-timers are setup is that they will automatically record the name of the function they were created in.
  • Additional info is sometimes added when you have similar function names, for example, a python “__init__” function will want to create an auto-timer that provides the class the function is being called from.
  • All this info will show up with an ensuing “@” tag on the end of the function name. Other options are the name of the file, etc.

Example

To interpret the text output, here is an example and an explanation of its structure:

> rank 0
> [pyc] main@'toast_ground_sim_simple.py'             : 17.650 wall, 21.920 user +  1.610 system = 23.530 CPU [seconds] (133.3%) [ total rss curr|peak = 655.8|944.1 MB ] [ self rss curr|peak = 582.7|871.0 MB ]
> [pyc]   create_observations                         :  1.283 wall,  1.250 user +  0.030 system =  1.280 CPU [seconds] ( 99.7%) [ total rss curr|peak =  91.8|102.6 MB ] [ self rss curr|peak =  18.4| 29.2 MB ]
> [pyc]     __init__@TODGround                        :  1.280 wall,  1.250 user +  0.030 system =  1.280 CPU [seconds] (100.0%) [ total rss curr|peak =  93.2|102.6 MB ] [ self rss curr|peak =   4.7|  5.4 MB ] (total # of laps: 2)
> [pyc]       simulate_scan@TODGround                 :  0.035 wall,  0.040 user +  0.010 system =  0.050 CPU [seconds] (143.8%) [ total rss curr|peak =  88.5| 97.3 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> [pyc]       translate_pointing@TODGround            :  1.239 wall,  1.210 user +  0.020 system =  1.230 CPU [seconds] ( 99.3%) [ total rss curr|peak = 102.6|102.6 MB ] [ self rss curr|peak =  14.0|  5.4 MB ] (total # of laps: 2)
> [pyc]         rotate                                :  0.004 wall,  0.010 user +  0.000 system =  0.010 CPU [seconds] (252.6%) [ total rss curr|peak =  89.6| 97.3 MB ] [ self rss curr|peak =   0.9|  0.0 MB ] (total # of laps: 2)
> [pyc]         radec2quat@TODGround                  :  0.014 wall,  0.010 user +  0.010 system =  0.020 CPU [seconds] (142.7%) [ total rss curr|peak =  97.1| 97.3 MB ] [ self rss curr|peak =   6.9|  0.0 MB ] (total # of laps: 2)
> [pyc]           rotation                            :  0.009 wall,  0.000 user +  0.010 system =  0.010 CPU [seconds] (110.4%) [ total rss curr|peak =  94.4| 97.3 MB ] [ self rss curr|peak =   0.9|  0.0 MB ] (total # of laps: 6)
> [cxx]             ctoast_qarray_from_axisangle      :  0.006 wall,  0.000 user +  0.010 system =  0.010 CPU [seconds] (160.5%) [ total rss curr|peak =  94.4| 97.3 MB ] [ self rss curr|peak =   0.9|  0.0 MB ] (total # of laps: 6)
> [pyc]   expand_pointing                             :  0.918 wall,  1.110 user +  0.150 system =  1.260 CPU [seconds] (137.3%) [ total rss curr|peak = 374.2|374.9 MB ] [ self rss curr|peak = 282.3|272.2 MB ]
> [pyc]     exec@OpPointingHpix                       :  0.917 wall,  1.110 user +  0.150 system =  1.260 CPU [seconds] (137.4%) [ total rss curr|peak = 374.9|374.9 MB ] [ self rss curr|peak = 283.0|272.2 MB ]
> [pyc]       read_times@TODGround                    :  0.006 wall,  0.000 user +  0.010 system =  0.010 CPU [seconds] (170.0%) [ total rss curr|peak = 234.5|234.5 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> [pyc]         _get_times@TODGround                  :  0.006 wall,  0.000 user +  0.010 system =  0.010 CPU [seconds] (173.7%) [ total rss curr|peak = 234.5|234.5 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> [pyc]       read_pntg@TODGround                     :  0.109 wall,  0.100 user +  0.010 system =  0.110 CPU [seconds] (101.2%) [ total rss curr|peak = 372.6|372.6 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [pyc]         _get_pntg@TODGround                   :  0.101 wall,  0.100 user +  0.000 system =  0.100 CPU [seconds] ( 99.0%) [ total rss curr|peak = 372.6|372.6 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [pyc]           mult                                :  0.094 wall,  0.090 user +  0.000 system =  0.090 CPU [seconds] ( 95.9%) [ total rss curr|peak = 372.6|372.6 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [cxx]             ctoast_qarray_mult                :  0.035 wall,  0.070 user +  0.000 system =  0.070 CPU [seconds] (200.3%) [ total rss curr|peak = 372.6|372.6 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [cxx]       ctoast_pointing_healpix_matrix          :  0.716 wall,  0.940 user +  0.120 system =  1.060 CPU [seconds] (148.1%) [ total rss curr|peak = 374.9|374.9 MB ] [ self rss curr|peak =   2.3|  2.3 MB ] (total # of laps: 122)
> [pyc]   get_submaps                                 :  0.092 wall,  0.090 user +  0.000 system =  0.090 CPU [seconds] ( 98.0%) [ total rss curr|peak = 374.4|374.9 MB ] [ self rss curr|peak =   0.2|  0.0 MB ]
> [pyc]     exec@OpLocalPixels                        :  0.091 wall,  0.090 user +  0.000 system =  0.090 CPU [seconds] ( 98.4%) [ total rss curr|peak = 374.4|374.9 MB ] [ self rss curr|peak =   0.2|  0.0 MB ]
> [pyc]   scan_signal                                 :  0.676 wall,  0.500 user +  0.110 system =  0.610 CPU [seconds] ( 90.3%) [ total rss curr|peak = 536.5|540.2 MB ] [ self rss curr|peak = 162.2|165.3 MB ]
> [pyc]     read_healpix_fits@DistPixels              :  0.422 wall,  0.270 user +  0.090 system =  0.360 CPU [seconds] ( 85.4%) [ total rss curr|peak = 486.5|521.2 MB ] [ self rss curr|peak = 112.2|146.4 MB ]
> [pyc]     exec@OpSimScan                            :  0.250 wall,  0.230 user +  0.020 system =  0.250 CPU [seconds] ( 99.8%) [ total rss curr|peak = 540.2|540.2 MB ] [ self rss curr|peak =  54.6| 18.9 MB ]
> [cxx]       ctoast_sim_map_scan_map32               :  0.029 wall,  0.020 user +  0.000 system =  0.020 CPU [seconds] ( 69.6%) [ total rss curr|peak = 540.2|540.2 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [pyc]   build_npp                                   :  3.136 wall,  2.910 user +  0.320 system =  3.230 CPU [seconds] (103.0%) [ total rss curr|peak = 777.8|944.1 MB ] [ self rss curr|peak = 243.2|403.9 MB ]
> [pyc]     exec@OpAccumDiag                          :  0.359 wall,  0.460 user +  0.020 system =  0.480 CPU [seconds] (133.7%) [ total rss curr|peak = 542.9|542.9 MB ] [ self rss curr|peak =   8.0|  2.7 MB ]
> [cxx]       ctoast_cov_accumulate_diagonal_invnpp   :  0.139 wall,  0.260 user +  0.020 system =  0.280 CPU [seconds] (201.6%) [ total rss curr|peak = 542.9|542.9 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [cxx]         accumulate_diagonal_invnpp            :  0.134 wall,  0.260 user +  0.020 system =  0.280 CPU [seconds] (208.3%) [ total rss curr|peak = 542.9|542.9 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [pyc]     write_healpix_fits@DistPixels             :  2.649 wall,  2.370 user +  0.250 system =  2.620 CPU [seconds] ( 98.9%) [ total rss curr|peak = 796.2|944.1 MB ] [ self rss curr|peak =   2.7|  0.0 MB ] (total # of laps: 3)
> [pyc]     covariance_invert@'map/noise.py'          :  0.011 wall,  0.010 user +  0.010 system =  0.020 CPU [seconds] (175.3%) [ total rss curr|peak = 780.1|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ]
> [cxx]       ctoast_cov_eigendecompose_diagonal      :  0.011 wall,  0.010 user +  0.010 system =  0.020 CPU [seconds] (182.7%) [ total rss curr|peak = 780.1|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ]
> [cxx]         eigendecompose_diagonal               :  0.011 wall,  0.010 user +  0.010 system =  0.020 CPU [seconds] (183.3%) [ total rss curr|peak = 780.1|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ]
> [pyc]   exec@OpCacheCopy                            :  0.042 wall,  0.030 user +  0.020 system =  0.050 CPU [seconds] (118.4%) [ total rss curr|peak = 831.6|944.1 MB ] [ self rss curr|peak =  56.2|  0.0 MB ]
> [pyc]   bin_maps                                    :  1.915 wall,  1.870 user +  0.180 system =  2.050 CPU [seconds] (107.1%) [ total rss curr|peak = 756.4|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> [pyc]     exec@OpAccumDiag                          :  0.606 wall,  0.720 user +  0.030 system =  0.750 CPU [seconds] (123.8%) [ total rss curr|peak = 675.6|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> [cxx]       ctoast_cov_accumulate_zmap              :  0.187 wall,  0.320 user +  0.010 system =  0.330 CPU [seconds] (176.2%) [ total rss curr|peak = 760.6|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 244)
> [cxx]         accumulate_zmap                       :  0.183 wall,  0.320 user +  0.010 system =  0.330 CPU [seconds] (180.6%) [ total rss curr|peak = 760.6|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 244)
> [pyc]     write_healpix_fits@DistPixels             :  1.240 wall,  1.100 user +  0.140 system =  1.240 CPU [seconds] (100.0%) [ total rss curr|peak = 758.7|944.1 MB ] [ self rss curr|peak = 174.2|  0.0 MB ] (total # of laps: 2)
> [pyc]   apply_polyfilter                            :  0.367 wall,  0.570 user +  0.060 system =  0.630 CPU [seconds] (171.5%) [ total rss curr|peak = 756.3|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ]
> [pyc]     exec@OpPolyFilter                         :  0.363 wall,  0.570 user +  0.050 system =  0.620 CPU [seconds] (170.9%) [ total rss curr|peak = 756.3|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ]
> [cxx]       ctoast_filter_polyfilter                :  0.277 wall,  0.470 user +  0.050 system =  0.520 CPU [seconds] (187.5%) [ total rss curr|peak = 756.3|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [pyc]   exec@OpCacheClear                           :  0.017 wall,  0.000 user +  0.020 system =  0.020 CPU [seconds] (119.6%) [ total rss curr|peak = 441.8|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ]
> [pyc]   apply_madam                                 :  9.183 wall, 13.570 user +  0.720 system = 14.290 CPU [seconds] (155.6%) [ total rss curr|peak = 655.8|944.1 MB ] [ self rss curr|peak = 214.0|  0.0 MB ]
> [pyc]     exec@OpMadam                              :  9.161 wall, 13.560 user +  0.710 system = 14.270 CPU [seconds] (155.8%) [ total rss curr|peak = 655.8|944.1 MB ] [ self rss curr|peak = 214.0|  0.0 MB ]
> [pyc]   __del__@TODGround                           :  1.709 wall,  1.650 user +  0.050 system =  1.700 CPU [seconds] ( 99.5%) [ total rss curr|peak = 234.4|944.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> rank 1
> [pyc] main@'toast_ground_sim_simple.py'             : 17.650 wall, 21.610 user +  1.910 system = 23.520 CPU [seconds] (133.3%) [ total rss curr|peak = 647.6|788.1 MB ] [ self rss curr|peak = 574.4|714.8 MB ]
> [pyc]   load_schedule                               :  0.001 wall,  0.010 user +  0.000 system =  0.010 CPU [seconds] (697.4%) [ total rss curr|peak =  73.3| 73.3 MB ] [ self rss curr|peak =   0.1|  0.1 MB ]
> [pyc]   create_observations                         :  1.283 wall,  1.250 user +  0.030 system =  1.280 CPU [seconds] ( 99.8%) [ total rss curr|peak =  91.5|102.4 MB ] [ self rss curr|peak =  18.1| 29.0 MB ]
> [pyc]     __init__@TODGround                        :  1.280 wall,  1.250 user +  0.030 system =  1.280 CPU [seconds] (100.0%) [ total rss curr|peak =  93.0|102.4 MB ] [ self rss curr|peak =   5.0|  5.7 MB ] (total # of laps: 2)
> [pyc]       simulate_scan@TODGround                 :  0.036 wall,  0.030 user +  0.000 system =  0.030 CPU [seconds] ( 84.4%) [ total rss curr|peak =  88.0| 96.8 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> [pyc]       translate_pointing@TODGround            :  1.239 wall,  1.220 user +  0.030 system =  1.250 CPU [seconds] (100.9%) [ total rss curr|peak = 102.4|102.4 MB ] [ self rss curr|peak =  14.3|  5.7 MB ] (total # of laps: 2)
> [pyc]         from_angles                           :  0.003 wall,  0.010 user +  0.000 system =  0.010 CPU [seconds] (289.8%) [ total rss curr|peak =  88.2| 96.8 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> [cxx]           ctoast_qarray_from_angles           :  0.003 wall,  0.010 user +  0.000 system =  0.010 CPU [seconds] (316.4%) [ total rss curr|peak =  88.2| 96.8 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> [pyc]         radec2quat@TODGround                  :  0.014 wall,  0.010 user +  0.010 system =  0.020 CPU [seconds] (146.2%) [ total rss curr|peak =  96.9| 96.9 MB ] [ self rss curr|peak =   6.9|  0.2 MB ] (total # of laps: 2)
> [pyc]           rotation                            :  0.009 wall,  0.010 user +  0.010 system =  0.020 CPU [seconds] (222.1%) [ total rss curr|peak =  94.2| 96.8 MB ] [ self rss curr|peak =   0.9|  0.0 MB ] (total # of laps: 6)
> [cxx]             ctoast_qarray_from_axisangle      :  0.006 wall,  0.010 user +  0.000 system =  0.010 CPU [seconds] (164.5%) [ total rss curr|peak =  94.2| 96.8 MB ] [ self rss curr|peak =   0.9|  0.0 MB ] (total # of laps: 6)
> [pyc]   expand_pointing                             :  0.918 wall,  1.110 user +  0.150 system =  1.260 CPU [seconds] (137.3%) [ total rss curr|peak = 373.8|374.5 MB ] [ self rss curr|peak = 282.3|272.1 MB ]
> [pyc]     exec@OpPointingHpix                       :  0.916 wall,  1.100 user +  0.140 system =  1.240 CPU [seconds] (135.4%) [ total rss curr|peak = 374.5|374.5 MB ] [ self rss curr|peak = 283.0|272.1 MB ]
> [pyc]       read_pntg@TODGround                     :  0.108 wall,  0.090 user +  0.020 system =  0.110 CPU [seconds] (101.5%) [ total rss curr|peak = 372.2|372.2 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [pyc]         _get_pntg@TODGround                   :  0.101 wall,  0.080 user +  0.020 system =  0.100 CPU [seconds] ( 99.3%) [ total rss curr|peak = 372.2|372.2 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [pyc]           mult                                :  0.094 wall,  0.080 user +  0.020 system =  0.100 CPU [seconds] (106.6%) [ total rss curr|peak = 372.2|372.2 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [cxx]             ctoast_qarray_mult                :  0.035 wall,  0.030 user +  0.020 system =  0.050 CPU [seconds] (142.6%) [ total rss curr|peak = 372.2|372.2 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [cxx]       ctoast_pointing_healpix_matrix          :  0.718 wall,  0.930 user +  0.120 system =  1.050 CPU [seconds] (146.3%) [ total rss curr|peak = 374.5|374.5 MB ] [ self rss curr|peak =   2.3|  2.3 MB ] (total # of laps: 122)
> [pyc]   get_submaps                                 :  0.092 wall,  0.080 user +  0.000 system =  0.080 CPU [seconds] ( 87.2%) [ total rss curr|peak = 374.0|374.5 MB ] [ self rss curr|peak =   0.2|  0.0 MB ]
> [pyc]     exec@OpLocalPixels                        :  0.090 wall,  0.080 user +  0.000 system =  0.080 CPU [seconds] ( 88.6%) [ total rss curr|peak = 374.0|374.5 MB ] [ self rss curr|peak =   0.2|  0.0 MB ]
> [pyc]   scan_signal                                 :  0.672 wall,  0.550 user +  0.100 system =  0.650 CPU [seconds] ( 96.8%) [ total rss curr|peak = 435.5|435.5 MB ] [ self rss curr|peak =  61.5| 61.0 MB ]
> [pyc]     read_healpix_fits@DistPixels              :  0.422 wall,  0.320 user +  0.080 system =  0.400 CPU [seconds] ( 94.9%) [ total rss curr|peak = 379.0|379.0 MB ] [ self rss curr|peak =   5.0|  4.5 MB ]
> [pyc]     exec@OpSimScan                            :  0.249 wall,  0.230 user +  0.020 system =  0.250 CPU [seconds] (100.2%) [ total rss curr|peak = 435.5|435.5 MB ] [ self rss curr|peak =  56.5| 56.5 MB ]
> [cxx]       ctoast_sim_map_scan_map32               :  0.028 wall,  0.020 user +  0.000 system =  0.020 CPU [seconds] ( 70.4%) [ total rss curr|peak = 435.5|435.5 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [pyc]   build_npp                                   :  3.141 wall,  2.590 user +  0.610 system =  3.200 CPU [seconds] (101.9%) [ total rss curr|peak = 466.6|467.8 MB ] [ self rss curr|peak =  31.3| 32.2 MB ]
> [pyc]     exec@OpAccumDiag                          :  0.359 wall,  0.460 user +  0.020 system =  0.480 CPU [seconds] (133.7%) [ total rss curr|peak = 442.6|442.6 MB ] [ self rss curr|peak =   6.8|  6.8 MB ]
> [cxx]       ctoast_cov_accumulate_diagonal_invnpp   :  0.139 wall,  0.260 user +  0.010 system =  0.270 CPU [seconds] (194.2%) [ total rss curr|peak = 442.6|442.6 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [cxx]         accumulate_diagonal_invnpp            :  0.134 wall,  0.260 user +  0.010 system =  0.270 CPU [seconds] (200.8%) [ total rss curr|peak = 442.6|442.6 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [pyc]     write_healpix_fits@DistPixels             :  0.105 wall,  0.080 user +  0.020 system =  0.100 CPU [seconds] ( 94.8%) [ total rss curr|peak = 466.6|467.8 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 3)
> [pyc]     covariance_invert@'map/noise.py'          :  0.013 wall,  0.010 user +  0.000 system =  0.010 CPU [seconds] ( 77.9%) [ total rss curr|peak = 466.6|467.8 MB ] [ self rss curr|peak =   0.0|  0.0 MB ]
> [cxx]       ctoast_cov_eigendecompose_diagonal      :  0.012 wall,  0.010 user +  0.000 system =  0.010 CPU [seconds] ( 80.9%) [ total rss curr|peak = 466.6|467.8 MB ] [ self rss curr|peak =   0.0|  0.0 MB ]
> [cxx]         eigendecompose_diagonal               :  0.012 wall,  0.010 user +  0.000 system =  0.010 CPU [seconds] ( 81.1%) [ total rss curr|peak = 466.6|467.8 MB ] [ self rss curr|peak =   0.0|  0.0 MB ]
> [pyc]   exec@OpCacheCopy                            :  0.041 wall,  0.020 user +  0.010 system =  0.030 CPU [seconds] ( 72.8%) [ total rss curr|peak = 517.7|517.7 MB ] [ self rss curr|peak =  52.9| 50.0 MB ]
> [pyc]   bin_maps                                    :  1.917 wall,  1.760 user +  0.300 system =  2.060 CPU [seconds] (107.5%) [ total rss curr|peak = 516.8|521.7 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> [pyc]     exec@OpAccumDiag                          :  0.591 wall,  0.740 user +  0.020 system =  0.760 CPU [seconds] (128.6%) [ total rss curr|peak = 521.7|521.7 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> [cxx]       ctoast_cov_accumulate_zmap              :  0.185 wall,  0.360 user +  0.000 system =  0.360 CPU [seconds] (194.7%) [ total rss curr|peak = 521.7|521.7 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 244)
> [cxx]         accumulate_zmap                       :  0.180 wall,  0.360 user +  0.000 system =  0.360 CPU [seconds] (199.6%) [ total rss curr|peak = 521.7|521.7 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 244)
> [pyc]     write_healpix_fits@DistPixels             :  0.054 wall,  0.050 user +  0.020 system =  0.070 CPU [seconds] (130.3%) [ total rss curr|peak = 516.8|521.7 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)
> [pyc]   apply_polyfilter                            :  0.367 wall,  0.560 user +  0.060 system =  0.620 CPU [seconds] (168.7%) [ total rss curr|peak = 512.4|521.7 MB ] [ self rss curr|peak =   0.0|  0.0 MB ]
> [pyc]     exec@OpPolyFilter                         :  0.367 wall,  0.560 user +  0.060 system =  0.620 CPU [seconds] (168.8%) [ total rss curr|peak = 512.5|521.7 MB ] [ self rss curr|peak =   0.0|  0.0 MB ]
> [cxx]       ctoast_filter_polyfilter                :  0.280 wall,  0.470 user +  0.060 system =  0.530 CPU [seconds] (189.4%) [ total rss curr|peak = 512.5|521.7 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 122)
> [pyc]   apply_madam                                 :  9.193 wall, 13.670 user +  0.650 system = 14.320 CPU [seconds] (155.8%) [ total rss curr|peak = 647.6|788.1 MB ] [ self rss curr|peak = 214.5|266.4 MB ]
> [pyc]     exec@OpMadam                              :  9.193 wall, 13.670 user +  0.650 system = 14.320 CPU [seconds] (155.8%) [ total rss curr|peak = 647.6|788.1 MB ] [ self rss curr|peak = 214.5|266.4 MB ]
> [pyc]   __del__@TODGround                           :  1.582 wall,  1.510 user +  0.060 system =  1.570 CPU [seconds] ( 99.2%) [ total rss curr|peak = 226.2|788.1 MB ] [ self rss curr|peak =   0.0|  0.0 MB ] (total # of laps: 2)

GENERAL LAYOUT

  • The “rank” line(s) give the MPI process/rank
  • The first (non “>”) column tells whether the “auto-timer” originated from C++ ([cxx]) or Python ([pyc]) code
  • The second column is the function name the auto-timer was created in
    • The indentation signifies the call tree
  • The last column referring to “laps” is the number of times the function was invoked
    • If the number of laps is not noted, the total number of laps is implicitly one

TIMING FIELDS

  • Then you have 5 time measurements
    1. Wall clock time (e.g. how long it took according to a clock “on the wall”)
    2. User time (the time spent executing the code)
    3. System time (thread-specific CPU time, e.g. an idle thread waiting for synchronization, etc.)
    4. CPU time (user + system time)
    5. Percent CPU utilization (cpu / wall * 100)
  • For perfect speedup on 4 threads, the CPU time would be 4x as long as the wall clock time and would have a % CPU utilization of 400%
    • This also includes vectorization. If each thread ran a calculation that calculated 4 values with a single CPU instruction (SIMD), we would have a speed up of 16x (4 threads x 4 values at one time == 16x)
  • Relative time (i.e. self-cost) for a function at a certain indent level (i.e. indented with 2*level spaces from [pyc]/[cxx]) can be calculated from the function(s) at level+1 until you reach another function at the same level
  • This is better understood by an example
    • function A is the main (it is “level 0”) and takes 35 seconds
    • function B is called from main (it is “level 1”)
    • function C is called from main (it is “level 1”)
    • function B does some calculations and calls function D (it is “level 2”) five times (e.g. a loop calling function D)
    • function B takes 20 seconds
    • function D, called from B, takes a total of 10 seconds (which is what is reported). The average time of function D is thus 2 seconds (10 sec / 5 laps)
    • function C does some calculations and also calls function D (again “level 2”) five times
    • The call to function D from function C will be reported as separate from the calls to D from B thanks to a hashing technique we use to identify function calls originating from different call trees/sequences
    • function C takes 9 seconds
    • function D, called from C, takes a total of 8 seconds (avg. of 1.6 seconds)
    • Thus we know that function B required 10 seconds of compute time by subtracting out the time spent in its calls to function D
    • We know that function C required 1 second of compute time by subtracting out the time spent in its calls to function D
    • We can subtract the time from function B and C to calculate the “self-cost” in function A (35 - 20 - 9 = 6 seconds)
      • When calculating the self-cost of A, one does not subtract the time spent in function D. These times are included in the timing of both B and C

MEMORY FIELDS

  • The memory measurements are a bit confusing, admittedly. The two types “curr” (“current”, which I will refer to as such from here on out) and “peak” have to do with different memory measurements
    • They are both “RSS” measurements, which stand for “resident set size”. This is the amount of physical memory in RAM that is currently private to the process
      • It does not include the “swap” memory, which is when the OS puts memory not currently being used onto the hard drive
      • Typical Linux configurations will start using swap when ~60% of your RAM is full (you can tune this in Linux by changing the “swappiness” setting; lowering it makes the kernel avoid the slower swap space for longer)
  • All memory measurements with “laps” > 0 are the max memory measurement of each “lap”
    • The “current” and “peak” max measurements are computed independently
    • E.g. the “current” max doesn’t directly correspond to the “peak” max — one “lap” may record the largest “current” RSS measurement but that does not (necessarily) mean that the same “lap” is responsible for the max “peak” RSS measurement
    • This is due to our belief that the max values are the ones of interest — the instances we must guard against to avoid running out of memory
  • With respect to “total” vs. “self”, this is fairly straightforward
    • For the “total”, I simply take a measurement of the memory usage at the creation of the timer
    • The “self” measurement is the difference in the memory measurements between the creation of the auto-timer and when it is destroyed
      • This measurement shows how much persistent memory was created in the function
      • It is valuable primarily as a metric to see how much memory is being created in the function and returned to the calling function
      • For example, if function X called function Y and function Y allocated 10 MB of memory and returned an object using this memory to function X, you would see function Y have a “self-cost” of 10 MB in memory
  • The difference between “current” and “peak” is how the memory is measured
    • The “peak” value is what the OS reports as the max amount of memory used
    • I find this to be slightly more informative than “current”, which is a measurement of the “pages” allocated in memory
    • The reason “current” is included is because of the following:
      • Essentially, a “page” of memory can be thought of as street addresses separated into “blocks”, i.e. 1242 MLK Blvd. is in the 1200 block of MLK Blvd.
      • A “page” is thus similar to a “block” — it is a starting memory address
      • The size of the pages is defined by the OS and just like the “swappiness”, it can be modified
      • For example, the default page size may be 1 KB and when a process has memory allocation need for 5.5 KB, the OS will provide 6 “pages”
        • This is why larger page sizes can improve performance for applications that require large contiguous memory blocks: larger “pages” mean fewer page requests, and fewer reallocations to different pages when more memory is requested for an existing object with contiguous memory
      • Within the page itself, the entire page might be used or it might not be fully used
      • When a page is not entirely used, you will get a “current” RSS usage greater than the “peak” memory usage — the memory is reserved for the process but is not actually used so it is thus not contained in the “peak” RSS usage number
      • However, when several pages are requested and allocated within a function but then released when returning to the calling function (i.e. temporary/transient page usage), you will have a “peak” RSS exceeding the “current” RSS memory usage since the “current” is measured after the pages are released back to the OS
      • Thus, with these two numbers, one can then deduce how much temporary/transient memory usage is being allocated in the function — if a function reports a self-cost of 243.2 MB of “current” RSS and a “peak” RSS of 403.9 MB, then you know that the “build_npp” function created 243.2 MB of persistent memory but creating the object requiring the persistent 243.2 MB required an additional 160.7 MB of temporary/transient memory (403.9 MB - 243.2 MB).

USING AUTO-TIMERS

If you have new Python code you would like to use the auto-timers with, here is a general guide:

  • Import the timing module (obvious, I guess)
  • Always add the auto-timer at the very beginning of the function.
    • You can use any variable name you wish, but make sure it is a named variable (e.g. “autotimer = timing.auto_timer()”, not “timing.auto_timer()”)
    • The auto-timer functionality requires the variable to exist for the scope of the function
  • For free-standing function without any name conflicts, just add: “autotimer = timing.auto_timer()”
  • For functions within a class, add: “autotimer = timing.auto_timer(type(self).__name__)”
  • For the primary auto-timer, use: “autotimer = timing.auto_timer(timing.FILE())” — this will tag “main” with the python file name
  • In some instances, you may want to include the directory of the filename, for this use: “autotimer = timing.auto_timer(timing.FILE(use_dirname = True))”
  • Add “tman = timing.timing_manager() ; tman.report()” at the end of your main file.
    • It is generally recommended, though not required, to do this in a different scope than the primary autotimer.
    • In other words, put all your work in a “main()” function looking like this:
#!/usr/bin/env python

import argparse

import toast
import toast.timing as timing

def main():

    # This is the 2-level toast communicator.  By default,
    # there is just one group which spans MPI_COMM_WORLD.
    comm = toast.Comm()

    # Create an argparse parser and add custom arguments
    parser = argparse.ArgumentParser(description="...")
    parser.add_argument('--groupsize',
                        required=False, type=int,
                        help='Size of a process group assigned to a CES')

    # pass the argparse object to the timing module, which will add the
    # timing arguments and return the "parser.parse_args()" result after
    # handling the timing-specific options
    args = timing.add_arguments_and_parse(parser, timing.FILE(noquotes=True))
    # create the primary auto timer for the entire script
    autotimer = timing.auto_timer(timing.FILE())

    # do the work...
    # etc...

if __name__ == '__main__':
    try:
        main()
        tman = timing.timing_manager()
        tman.report()
    except Exception as e:
        # etc...
        raise
  • You generally don’t want to add any auto-timers until “timing.add_arguments_and_parse(…)” has been called
  • This routine implements several argparse options for the timing suite
    • Disable auto-timers completely (“--disable-timers”)
    • Output filenames (“--toast-timing-fname <STR>”)
    • Output directory (“--toast-output-directory <STR>”)
    • The max depth of the auto-timers (“--max-timer-depth <N>”)
      • e.g. if you only want the call tree to report timings 3 levels deep
    • Whether to output timer serialization to a JSON file for plotting (“--enable-timer-serialization”)

Using at NERSC

To use TOAST at NERSC, you need to have a Python3 software stack with all dependencies installed. There is already such a software stack installed on edison and cori.

Module Files

To get access to the needed module files, add the machine-specific module file location to your search path:

%> module use /global/common/software/cmb/${NERSC_HOST}/modulefiles

You can safely put the above line in your ~/.bashrc.ext inside the sections for edison and cori.

Load Dependencies

In order to load a full python-3.6 stack, and also all dependencies needed by toast, do:

%> module load toast-deps

Install TOAST

The TOAST codebase is evolving daily, so we do not maintain a toast module. You have to install TOAST yourself from source.

When installing any software at NERSC, we need to keep several things in mind:

  • The home directories are small.
  • The performance of the home directories when accessed by many processes is very bad.
  • The scratch space has lots of room, and very good performance.
  • Any file on the scratch space that has not been accessed for some number of weeks will be deleted.

So unfortunately there is no location which has good performance and also persistent file storage. For this example, we will install software to scratch and assume that we will be using the software frequently enough that it will never be purged. If you have not used the tools for a month or so, you should probably reinstall just to be sure that everything is in place.

First, we pick a location to install our software. For this example, we will be installing to a “software” directory in our scratch space. First make sure that exists:

%> mkdir -p ${SCRATCH}/software

Now we will create a small shell function that loads this location into our search paths for executables and python packages. Add this function to ~/.bashrc.ext and you can rename it to whatever you like:

loadtoast () {
    export PREFIX=${SCRATCH}/software/toast
    export PATH=$PREFIX/bin:${PATH}
    export PYTHONPATH=$PREFIX/lib/python3.6/site-packages:${PYTHONPATH}
}

Log out and back in to make this function visible to your shell environment. Now check out the toast source somewhere in your home directory:

%> cd
%> git clone https://github.com/hpc4cmb/toast

Then configure and build the software. Unless you know what you are doing, you should probably use the platform config example for the machine you are building on; note that the toast-deps environment requires the Intel compilers:

%> cd toast
%> ./autogen.sh
%> ./platforms/edison-intel.sh --prefix=${SCRATCH}/software/toast
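
Then build and install the tools, using the same make step shown in the generic instructions above:

%> make install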

Now we can run our function to load this installation into our environment:

%> loadtoast

On NERSC systems, MPI programs are not allowed to run on the login nodes. In order to run the unit tests, we first get an interactive compute node:

%> salloc

and then run the tests:

%> srun python -c "import toast.tests; toast.tests.run()"

You should read through the many good NERSC webpages that describe how to use the different machines. There are pages for edison and pages for cori.

Install Experiment Packages

If you are a member of Planck, Core, or LiteBIRD, you can get access to separate git repos with experiment-specific scripts and tools. You can install these to the same location as toast. All of those packages currently use distutils, and you will need to do the installation from a compute node (since importing the toast python module will load MPI):

%> cd toast-<experiment>
%> salloc
%> srun python setup.py clean
%> srun python setup.py install --prefix=${SCRATCH}/software/toast

Developer’s Guide

TOAST aims to follow best practices whenever reasonably possible. If you submit a pull request to contribute C++ code, try to match the existing coding style (indents are 4 spaces, not tabs, curly brace placement, spacing, etc). If you are contributing python code, follow PEP-8. When documenting python classes and methods, we use google-style docstrings.

The C++ code in TOAST uses the google test framework for unit tests. Python code uses the standard built-in unittest classes. When contributing new code, please add unit tests as well. Even if we don’t have perfect test coverage, that should be our goal.

When actively developing the codebase, you can run the C++ unit tests without installation by doing:

%> make check

In order to run the python unit tests, you must first do a “make install”.
