Pippin Development
Issues and Contributing to Pippin
Contributing to Pippin or raising issues is easy. Here are some ways you can do it, in order of preference:
1. Submit an issue on GitHub, and then submit a pull request to fix that issue.
2. Submit an issue on GitHub, and then wait until I have time to look at it. Hopefully that’s quick, but no guarantees.
3. Email me with a feature request.
If you do want to contribute code, fantastic. Please note that all code in Pippin is subject to the Black formatter. I would recommend installing this yourself because it’s a great tool.

Coding style
Please, for the love of god, don’t code this up in vim/emacs on a terminal connection[1]. Use a proper IDE (I recommend PyCharm or VSCode), and install the Black extension! I have Black set up in PyCharm as a file watcher, so all Python files are automatically formatted on save. Use a line width of 160 characters. Here is the Black file watcher config:

If everyone does this, then all files should remain consistent across different users.
Testing valid config in Pippin
To ensure we don’t break things when pushing out new code, the tests directory contains a set of tests progressively increasing in pipeline complexity, designed to ensure that existing config files act consistently regardless of code changes. Any failure in the tests means a break in backwards compatibility and should be discussed before being incorporated into a release.
To run the tests, in the top level directory, simply run:
pytest -v .
Adding a new task
Alright there, you want to add a new task to Pippin? Great. Here’s what you’ve got to do:
1. Create an implementation of the Task class. You can keep it empty for now.
2. Figure out where it goes. At the top of manager.py you can see the current stages in Pippin. Once you have figured out where your task belongs, import it and slot it in.
3. Back in your new class that extends Task, you’ll notice you have a few methods to implement:
   _run(): Kick the task off, and return True or False to report whether it kicked off successfully. To help with determining the hash and whether the task should run, there are a few handy functions: _check_regenerate, get_hash_from_string, save_hash, get_hash_from_files, get_old_hash. See the 9. ANALYSE task for an example of how I use these.

   _check_completion(squeue): Check to see if the task (whether it’s being rerun or not) is done. Normally I do this by checking for a done file, which contains either SUCCESS or FAILURE. For example, if submitting a script to a queuing system, I might have this after the primary command:

       if [ $? -eq 0 ]; then
           echo SUCCESS > {done_file}
       else
           echo FAILURE > {done_file}
       fi

   This allows me to easily see if a job failed or passed. On failure, I generally recommend looking through the task logs and trying to figure out what went wrong, so you can present a useful message to your user. To show that error, or ANY MESSAGE TO THE USER, use the provided logger:

       self.logger.error("The task failed because of this reason")

   This method should return either Task.FINISHED_FAILURE, Task.FINISHED_SUCCESS, or the number of jobs still in the queue, which you can work out because I pass in all jobs the user has active in the variable squeue (which can sometimes be None).
   get_tasks(task_config, prior_tasks, output_dir, stage_num, prefix, global_config): From the given inputs, determine what tasks should be created, create them, and return them in a list. For context, here is the code I use to determine what simulation tasks to create:

       @staticmethod
       def get_tasks(config, prior_tasks, base_output_dir, stage_number, prefix, global_config):
           tasks = []
           for sim_name in config.get("SIM", []):
               sim_output_dir = f"{base_output_dir}/{stage_number}_SIM/{sim_name}"
               s = SNANASimulation(sim_name, sim_output_dir, f"{prefix}_{sim_name}", config["SIM"][sim_name], global_config)
               Task.logger.debug(f"Creating simulation task {sim_name} with {s.num_jobs} jobs, output to {sim_output_dir}")
               tasks.append(s)
           return tasks
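The _run and _check_completion steps above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not Pippin’s actual code: the Task class here is a stand-in, and the values of FINISHED_SUCCESS and FINISHED_FAILURE, the done.txt and hash.txt file names, the ExampleTask class, and the shape of squeue (a list of job-name strings) are all hypothetical. The hashing uses hashlib directly rather than Pippin’s get_hash_from_string and save_hash helpers, whose signatures are not shown in this guide.

```python
import hashlib
import json
import logging
import os
import tempfile


class Task:
    """Stand-in for Pippin's Task base class; the real class has more to it."""

    FINISHED_SUCCESS = -1  # placeholder value, assumed for this sketch
    FINISHED_FAILURE = -9  # placeholder value, assumed for this sketch

    def __init__(self, name, output_dir):
        self.name = name
        self.output_dir = output_dir
        self.logger = logging.getLogger(name)
        self.done_file = os.path.join(output_dir, "done.txt")  # hypothetical file name
        self.hash_file = os.path.join(output_dir, "hash.txt")  # hypothetical file name


class ExampleTask(Task):
    """Hypothetical task whose job script would write the done file when it ends."""

    def __init__(self, name, output_dir, options):
        super().__init__(name, output_dir)
        self.options = options

    def _run(self):
        os.makedirs(self.output_dir, exist_ok=True)
        # Hash the options so an unchanged, finished task is not rerun.
        new_hash = hashlib.sha256(json.dumps(self.options, sort_keys=True).encode()).hexdigest()
        old_hash = None
        if os.path.exists(self.hash_file):
            with open(self.hash_file) as f:
                old_hash = f.read().strip()
        if new_hash == old_hash and os.path.exists(self.done_file):
            self.logger.info("Hash unchanged, not rerunning")
            return True
        # Something changed: clear any stale done file and record the new hash.
        # A real task would submit its job script at this point.
        if os.path.exists(self.done_file):
            os.remove(self.done_file)
        with open(self.hash_file, "w") as f:
            f.write(new_hash)
        return True

    def _check_completion(self, squeue):
        if os.path.exists(self.done_file):
            with open(self.done_file) as f:
                status = f.read().strip()
            if status == "SUCCESS":
                return Task.FINISHED_SUCCESS
            self.logger.error(f"Task {self.name} failed, check the logs in {self.output_dir}")
            return Task.FINISHED_FAILURE
        # Not done yet: count our jobs still queued. Assumes squeue is a list of
        # job-name strings; with no queue information, assume one job remains.
        if squeue is None:
            return 1
        return len([job for job in squeue if job.startswith(self.name)])


if __name__ == "__main__":
    task = ExampleTask("demo", tempfile.mkdtemp(), {"NUM": 1})
    task._run()
    print(task._check_completion(["demo_0", "other_job"]))
```

Keeping the success or failure decision in the done file means _check_completion never has to parse job logs just to decide whether the task has finished.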
Adding a new classifier
Alright, so what if we’re not after a brand new task, but just adding another classifier? Well, it’s easier to do, and I recommend looking at
nearest_neighbor_python.py for something to copy from. You’ll see we have the parent Classifier class, I write out the slurm script that
would be used, and then define the train and predict methods (which both invoke a general classify function in different ways; you can do this
however you want).
You’ll also notice a very simple _check_completion method, and a get_requirements method. The latter returns a two-tuple of booleans, indicating
whether the classifier needs photometry and light curve fitting results respectively. For the NearestNeighbour code, it classifies based
only on SALT2 features, so I return (False, True).
You can also define a get_optional_requirements method which, like get_requirements, returns a two-tuple of booleans, indicating whether the classifier needs photometry and light curve fitting results for this particular run. By default, this method returns:
True, True if OPTIONAL_MASK set in OPTS
True, False if OPTIONAL_MASK_SIM set in OPTS
False, True if OPTIONAL_MASK_FIT set in OPTS
False, False otherwise.
If you define your own method based on classifier-specific requirements, then these OPTIONAL_MASK* keys can still be set to choose which tasks are optionally included. If they are not set, then the normal MASK, MASK_SIM, and MASK_FIT are used instead. Note that if no masks are set then every sim or lcfit task will be included.
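The default rules for get_optional_requirements described above can be written as a small function. This is a sketch rather than Pippin’s implementation: the function name default_optional_requirements is hypothetical, OPTS is assumed to be a plain dict, "set" is assumed to mean the key is present with a non-None value, and the keys are assumed to be checked in the order they are listed.

```python
def default_optional_requirements(opts):
    """Return (needs_photometry, needs_lcfit) following the default rules above."""

    def is_set(key):
        # Assumption: a key counts as "set" if it is present and not None.
        return opts.get(key) is not None

    if is_set("OPTIONAL_MASK"):
        return True, True
    if is_set("OPTIONAL_MASK_SIM"):
        return True, False
    if is_set("OPTIONAL_MASK_FIT"):
        return False, True
    return False, False
```

A classifier with its own needs would override get_optional_requirements and return whichever pair suits it.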
Finally, you’ll need to add your classifier into the ClassifierFactory in classifiers/factory.py, so that I can link a class name
in the YAML configuration to your actual class. Yeah yeah, I could use reflection or dynamic module scanning or similar, but I’ve had issues getting
the behaviour consistent across systems and conda environments, so we’re doing it the hard way.
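To make the "hard way" concrete, here is a rough sketch of an explicit registry that maps a class name from the YAML configuration to a class. It does not reproduce the real classifiers/factory.py: the Classifier stand-in, the MyNearestNeighbour class, the ids dictionary, the get method, and the no-argument get_requirements signature are all assumptions made for illustration.

```python
class Classifier:
    """Stand-in for Pippin's parent Classifier class."""

    def __init__(self, name):
        self.name = name

    @staticmethod
    def get_requirements():
        # (needs photometry, needs light curve fitting results)
        return True, True


class MyNearestNeighbour(Classifier):
    """Hypothetical classifier that only uses light curve fit features."""

    @staticmethod
    def get_requirements():
        return False, True


class ClassifierFactory:
    # Explicit registry: every new classifier gets one line here.
    ids = {
        "MyNearestNeighbour": MyNearestNeighbour,
    }

    @classmethod
    def get(cls, name):
        """Look up the class registered under the name given in the config."""
        if name not in cls.ids:
            raise ValueError(f"Unknown classifier {name}, options are {sorted(cls.ids)}")
        return cls.ids[name]
```

An explicit dictionary costs one extra line per classifier, but a lookup behaves the same on every system and in every conda environment because nothing is discovered at import time.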