Using Pippin
Using Pippin is very simple. In the top level directory, there is a pippin.sh
script. If you’re on midway and use SNANA, this script will be on your path already. Otherwise you can add it to your path by adding the following to your .bashrc
:
export PATH=$PATH:"path/to/pippin"
To use Pippin, all you need is a config file ready to go. I’ve got a bunch of mine and some general ones in the configs directory, but you can put yours wherever you want. I recommend adding your initials to the front of the file to make it obvious in the shared output directory which folders as yours.
If you have example.yml
as your config file and want pippin to run it, simply run pippin.sh example.yml
.
The file name that you pass in should contain a run configuration. Note that this is different to the global software configuration file cfg.yml
, and remember to ensure that your cfg.yml
file is set up properly and that you know where you want your output to be installed. By default, I assume that the $PIPPIN_OUTPUT
environment variable is set as the output location, so please either set said variable or change the associated line in the cfg.yml. For the morbidly curious, here is a very small demo video of using Pippin in the Midway environment.

Creating your own configuration file
Each configuration file is represented by a yaml dictionary linking each stage (see stage declaration section below) to a dictionary of tasks, the key being the unique name for the task and the value being its specific task configuration.
For example, to define a configuration with two simulations and one light curve fitting task (resulting in 2 output simulations and 2 output light curve tasks - one for each simulation), a user would define:
SIM:
SIM_NAME_1:
SIM_CONFIG: HERE
SIM_NAME_2:
SIM_CONFIG: HERE
LCFIT:
LCFIT_NAME_1:
LCFIT_CONFIG: HERE
The available tasks and their configuration details can be found in the Tasks section. Alternatively, you can see examples in the examples
directory for each task.
Command Line Arguments
Pippin has a number of useful command line arguments which you can quickly reference via pippin.sh -h
.
-h, --help show this help message and exit
--config CONFIG Location of global config (i.e. cfg.yml)
-v, --verbose increase output verbosity
-s START, --start START
Stage to start and force refresh. Accepts either the
stage number or name (i.e. 1 or SIM)
-f FINISH, --finish FINISH
Stage to finish at (it runs this stage too). Accepts
either the stage number or name (i.e. 1 or SIM)
-r, --refresh Refresh all tasks, do not use hash
-c, --check Check if config is valid
-p, --permission Fix permissions and groups on all output, don't rerun
-i IGNORE, --ignore IGNORE
Dont rerun tasks with this stage or less. Accepts
either the stage number of name (i.e. 1 or SIM)
-S [SYNTAX], --syntax [SYNTAX]
Get the syntax of the given stage. Accepts either the
stage number or name (i.e. 1 or SIM). If run without
argument, will tell you all stage numbers / names.
-C, --compress Compress pippin output during job. Combine with -c /
--check in order to compress completed pippin job.
-U, --uncompress Do not compress pippin output during job. Combine
with -c / --check in order to uncompress completed
pippin job. Mutually exclusive with -C / --compress
As an example, to have a verbose output configuration run and only do data preperation and simulation, you would run pippin.sh -vf 1 configfile.yml
.
Pippin on Midway
On midway, sourcing the SNANA setup will add environment variables and Pippin to your path.
Pippin itself can be found at $PIPPIN
, output at $PIPPIN_OUTPUT
(which goes to a scratch directory), and pippin.sh
will automatically work from any location.
Note that you only have 100 GB on scratch. If you fill that up and need to nuke some files, look both in $SCRATCH_SIMDIR
to remove SNANA photometry and $PIPPIN_OUTPUT
to remove Pippin’s output. Running the dirusage
command on midway will (after some time) give you a list of which directories are taking up the most space.
Examples
If you want detailed examples of what you can do with Pippin tasks, have a look in the examples directory, pick the task you want to know more about, and have a look over all the options.
Here is a very simple configuration file which runs a simulation, does light curve fitting, and then classifies it useing the debug FITPROB classifier.
SIM:
DESSIM:
IA_G10_DES3YR:
BASE: surveys/des/sim_ia/sn_ia_salt2_g10_des3yr.input
LCFIT:
BASEDES:
BASE: surveys/des/lcfit_nml/des_5yr.nml
CLASSIFICATION:
FITPROBTEST:
CLASSIFIER: FitProbClassifier
MODE: predict
You can see that unless you specify a MASK
on each subsequent task, Pippin will generally try and run everything on everything. So if you have two simulations defined, you don’t need two light curve fitting tasks, Pippin will make one light curve fit task for each simulation, and then two classification tasks, one for each light curve fit task.
Best Practice
Here are a few best practices for improving your chance of success with Pippin.
Use screen
Pippin jobs can take a long time, so to avoid having to keep a terminal open and an ssh session active for the length of the entire run, it is highly recommended you run Pippin in a screen
session.
For example, if you are doing machine-learning testing, you may create a new screen session called ml by running screen -S ml
. It will then launch a new instance of bash for you to play around in. conda will not work out of the box. To make it work again, run conda deactivate
and then conda activate
, and you can check this works by running which python
and verifying its pointing to the miniconda install. You can then run Pippin as per normal: pippin.sh -v your_job.yml
and get the coloured output. To leave the screen session, but still keep Pippin running even after you log out, press Ctrl-A
, Ctrl-D
. As in one, and then the other, not Ctrl-A-D
. This will detach from your screen session but keep it running. Just going Ctrl_D
will disconnect and shut it down. To get back into your screen session, simply run screen -r ml
to reattach. You can see your screen sessions using screen -ls
.
You may notice if you log in and out of midway that your screen sessions might not show up. This is because midway has multiple head nodes, and your screen session exists only on one of them. This is why when I ssh to midway I specify a specific login node instead of being assigned one. To make it simpler, I’d recommend setting your ssh host in your .ssh/config
to something along the lines of:
Host midway2
HostName midway2-login1.rcc.uchicago.edu
User username
Make the most of command line options
There are a number of command line options that are particularly useful. Foremost amongst them is -v, --verbose
which shows debug output when running Pippin. Including this flag in your run makes it significantly easier to diagnose if anything goes wrong.
The next time saving flag is -c, --check
, which will do an initial passthrough of your input yaml file, pointing out any obvious errors before anything runs. This is particularly useful if you have long jobs and want to catch bugs early.
The final set of useful flags are the -s, --start
, -f, --finish
, and -i, --ignore
. These allow you to customize exactly what parts of your full job Pippin runs. Pippin decides whether or not it should rerun a task based on a hash generated each time it’s run. This hash produced based on the input, these flags are particularly useful if you change your input but don’t want stages to rerun, such as if you are making small changes to a final stage, or debugging an early stage.
Advanced Usage
The following are a number of advanced features which aren’t required to use Pippin but can drastically improve your experience with Pippin.
Yaml Anchors
If you are finding that your config files contain lots of duplicated sections (for example, many simulations configured almost the same way, but with one difference), consider using yaml anchors. A thorough explanation of how to use them is available here, however the basics are as follows. First you should add a new taml section at the tope of your input file. The name of this section doesn’t matter as long as it doesn’t clash with other Pippin stages, however I usually use ALIAS. Within this section, you include all of the yaml anchors you need. An example is shown below:
ALIAS:
LOWZSIM_IA: &LOWZSIM_IA
BASE: surveys/lowz/sims_ia/sn_ia_salt2_g10_lowz.input
SIM:
SIM_1:
IA_G10_LOWZ:
<<: *LOWZSIM_IA
# Other options here
SIM_2:
IA_G10_LOWZ:
<<: *LOWZSIM_IA
# Different options here
Include external aliases
This is new and experimental, use with caution.
Note that this is not yaml compliant.
When dealing with especially large jobs, or suites of jobs you might find yourself having very large ALIAS
/ANCHOR
blocks which are repated amongst a number of Pippin jobs. A cleaner alternative is to have a number of .yml
files containing your anchors, and then including
these in your input files which will run Pippin jobs. This way you can share anchors amongst multiple Pippin input files and update them all at the same time. In order to achieve this, Pippin can preprocess the input file to directly copy the anchor file into the job file. An example is provided below:
base_job_file.yml
# Values surround by % indicate preprocessing steps.
# The preprocess below will copy the provided yml files into this one before this one is read in, allowing anchors to propegate into this file
# They will be copied in, in the order you specify, with duplicate tasks merging.
# Note that whitespace before or after the % is fine, as long as % is the first and last character.
# % include: path/to/anchors_sim.yml %
# %include: path/to/anchors_lcfit.yml%
SIM:
DESSIM:
IA_G10_DES3YR:
BASE: surveys/des/sims_ia/sn_ia_salt2_g10_des3yr.input
GLOBAL:
# Note that this anchor doesn't exist in this file
<<: *SIM_GLOBAL
LCSIM:
IA_G10_LOWZ:
BASE: surveys/lowz/sims_ia/sn_ia_salt2_g10_lowz.input
GLOBAL:
# Note that this anchor doesn't exist in this file
<<: *SIM_GLOBAL
LCFIT:
LS:
BASE: surveys/lowz/lcfit_nml/lowz.nml
MASK: DATALOWZ
FITOPTS: surveys/lowz/lcfit_fitopts/lowz.yml
# Note that this anchor doesn't exist in this file
<<: *LCFIT_OPTS
DS:
BASE: surveys/des/lcfit_nml/des_3yr.nml
MASK: DATADES
FITOPTS: surveys/des/lcfit_fitopts/des.yml
# Note that this anchor doesn't exist in this file
<<: *LCFIT_OPTS
anchors_sim.yml
ANCHORS_SIM:
SIM_GLOBAL: &SIM_GLOBAL
W0_LAMBDA: -1.0
OMEGA_MATTER: 0.3
NGEN_UNIT: 0.1
anchors_lcfit.yml
ANCHORS_LCFIT:
LCFIT_OPTS: &LCFIT_OPTS
SNLCINP:
USE_MINOS: F
This will be preprocessed to produce the following yaml file, which pippin will then run on.
final_pippin_input.yml
# Original input file: path/to/base_job_file.yml
# Values surround by % indicate preprocessing steps.
# The preprocess below will copy the provided yml files into this one before this one is read in, allowing anchors to propegate into this file
# They will be copied in, in the order you specify, with duplicate tasks merging.
# Note that whitespace before or after the % is fine, as long as % is the first and last character.
# Anchors included from path/to/anchors_sim.yml
ANCHORS_SIM:
SIM_GLOBAL: &SIM_GLOBAL
W0_LAMBDA: -1.0
OMEGA_MATTER: 0.3
NGEN_UNIT: 0.1
# Anchors included from path/to/anchors_lcfit.yml
ANCHORS_LCFIT:
LCFIT_OPTS: &LCFIT_OPTS
SNLCINP:
USE_MINOS: F
SIM:
DESSIM:
IA_G10_DES3YR:
BASE: surveys/des/sims_ia/sn_ia_salt2_g10_des3yr.input
GLOBAL:
<<: *SIM_GLOBAL
LCSIM:
IA_G10_LOWZ:
BASE: surveys/lowz/sims_ia/sn_ia_salt2_g10_lowz.input
GLOBAL:
<<: *SIM_GLOBAL
LCFIT:
LS:
BASE: surveys/lowz/lcfit_nml/lowz.nml
MASK: DATALOWZ
FITOPTS: surveys/lowz/lcfit_fitopts/lowz.yml
<<: *LCFIT_OPTS
DS:
BASE: surveys/des/lcfit_nml/des_3yr.nml
MASK: DATADES
FITOPTS: surveys/des/lcfit_fitopts/des.yml
<<: *LCFIT_OPTS
Now you can include the anchors_sim.yml
and anchors_lcfit.yml
anchors in any pippin job you want, and need only update those anchors once. There are a few caveats to this to be aware of. The preprocessing does not checking to ensure the given file is valid yaml, it simply copies the yaml directly in. As such you should always ensure that the name of your anchor block is unique, any duplicates will mean whichever block is lowest will overwrite all other blocks of the same name. Additionally, whilst you could technically use this to store Pippin task blocks in external yml files, this is discouraged as this feature was only intended for anchors and aliases.
Use external results
Often times you will want to reuse the results of one Pippin job in other Pippin jobs, for instance reusing a biascor sim so you don’t need to resimulate every time. This can be accomplished via the EXTERNAL
and EXTERNAL_DIR
keywords.
The EXTERNAL
keyword is used when you only need to specify a single external result, such as when you are loading in a simulation. If that’s the case you simply need to let Pippin know where the external results are located. An example loading in external biascor sims is below:
SIM:
DESSIMBIAS5YRIA_C11:
EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRIA_C11
DESSIMBIAS5YRIA_G10:
EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRIA_G10
DESSIMBIAS5YRCC:
EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRCC
The EXTERNAL_DIRS
keyword is used when there isn’t a one-to-one mapping between the task the external results. An example of this is a lightcurve fitting task where a single task will fit multiple lightcurves. If this is the case, you can specify a number of external results using the EXTERNAL_DIRS
keyword:
LCFIT:
D:
BASE: surveys/des/lcfit_nml/des_5yr.nml
MASK: DESSIM
EXTERNAL_DIRS:
- $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRIA_C11
- $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRIA_G10
- $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRCC
Note that in this case the name of the external results matches the name of the task. Any tasks which do not have an exact match in EXTERNAL_DIRS
are run as normal, allowing you to mix and match both precomputed and non-precomputed tasks together.
If you have external results which don’t have an exact match but should still be used, you can specify how the external results should be used via the EXTERNAL_MAP
keyword:
LCFIT:
D:
BASE: surveys/des/lcfit_nml/des_5yer.nml
MASK: DESSIM
EXTERNAL_DIRS:
- $PIPPIN_OUTPUT/EXAMPLE_C11/2_LCFIT/DESFIT_SIM
- $PIPPIN_OUTPUT/EXAMPLE_G10/2_LCFIT/DESFIT_SIM
- $PIPPIN_OUTPUT/EXAMPLE/2_LCFIT/DESFIT_CCSIM
EXTERNAL_MAP:
# LCFIT_SIM: EXTERNAL_MASK
D_DESSIMBIAS5YRIA_C11: EXAMPLE_C11 # In this case we are matching to the pippin job name, as the LCFIT task name is shared between two EXTERNAL_DIRS
D_DESSIMBIAS5YRIA_G10: EXAMPLE_G10 # Same as C11
D_DESSIMBIAS5YRCC: DESFIT_CCSIM # In this case we match to the LCFIT task name, as the pippin job name (EXAMPLE) would match with the other EXTERNAL_DIRS
Changing SBATCH options
Pippin has sensible defaults for the sbatch options of each task, however it is possible you may sometimes want to overwrite some keys, or even replace the sbatch template entirely. You can do this via the BATCH_REPLACE
, and BATCH_FILE
options respectively.
In order to overwrite the default batch keys, add the following to any task which runs a batch job:
BATCH_REPLACE:
REPLACE_KEY1: value
REPLACE_KEY2: value
Possible options for BATCH_REPLACE
are:
REPLACE_NAME
:--job-name
REPLACE_LOGFILE
:--output
REPLACE_WALLTIME
:--time
REPLACE_MEM
:--mem-per-cpu
Note that changing these could have unforseen consequences, so use at your own risk.
If replacing these keys isn’t enough, you are able to create you own sbatch templates and get Pippin to use them. This is useful if you want to change the partition, or add some additional code which runs before the Pippin job. Note that your template must contain the keys listed above in order to work properly. In addition you must have REPLACE_JOB
at the bottom of your template file, otherwise Pippin will not be able to load it’s jobs into your template. An example template is as follows:
#!/bin/bash
#SBATCH -p broadwl-lc
#SBATCH --account=pi-rkessler
#SBATCH --job-name=REPLACE_NAME
#SBATCH --output=REPLACE_LOGFILE
#SBATCH --time=REPLACE_WALLTIME
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=REPLACE_MEM
echo $SLURM_JOB_ID starting execution `date` on `hostname`
REPLACE_JOB
To have Pippin use your template, simply add the following to your task:
BATCH_FILE: path/to/your/batch.TEMPLATE