3. CLASSIFICATION
Within Pippin, there are many different classifiers implemented. Most classifiers need to be trained, and can then run in predict mode. All classifiers that require training can either be trained in the same yml file, or you can point to an external serialised instance of the trained class and use that. The general syntax for a classifier is:
CLASSIFICATION:
SOMELABEL:
CLASSIFIER: NameOfTheClass
MODE: train # or predict
MASK: mask # Masks both sim and lcfit together, logical and, optional
MASK_SIM: sim_only_mask
MASK_FIT: lcfit_only_mask
COMBINE_MASK: [SIM_IA, SIM_CC] # optional mask to combine multiple sim runs into one classification job (e.g. separate CC and Ia sims). NOTE: currently not compatible with SuperNNova/SNIRF
OPTS:
PROB_COLUMN_NAME: PROB_RENAME_THIS # give PROB_ name here; must start with PROB_
MODEL: file_or_label # only needed in predict mode, how to find the trained classifier
OPTIONAL_MASK: opt_mask # mask for optional dependencies. Not all classifiers make use of this
OPTIONAL_MASK_SIM: opt_sim_only_mask # mask for optional sim dependencies. Not all classifiers make use of this
OPTIONAL_MASK_FIT: opt_lcfit_only_mask # mask for optional lcfit dependencies. Not all classifiers make use of this
WHATREVER_THE: CLASSIFIER_NEEDS
SCONE Classifier
The SCONE classifier is a convolutional neural network-based classifier for supernova photometry. The model first creates “heatmaps” of flux values in wavelength-time space, then runs the neural network model on GPU (if available) to train or predict on these heatmaps. A successful run will produce predictions.csv, which shows the Ia probability of each SN. For debugging purposes, the model config (model_config.yml), Slurm job (job.slurm), log (output.log), and all the heatmaps (heatmaps/) can be found in the output directory. An example of how to define a SCONE classifier:
CLASSIFICATION:
SCONE_TRAIN: # Helen's CNN classifier
CLASSIFIER: SconeClassifier
MODE: train
OPTS:
GPU: True # OPTIONAL, default: False
# HEATMAP CREATION OPTS
CATEGORICAL: True # OPTIONAL, binary or categorical classification, default: False
NUM_WAVELENGTH_BINS: 32 # OPTIONAL, heatmap height, default: 32
NUM_MJD_BINS: 180 # OPTIONAL, heatmap width, default: 180
REMAKE_HEATMAPS: False # OPTIONAL, SCONE does not remake heatmaps unless the 3_CLAS/heatmaps subdir doesn't exist or if this param is true, default: False
# MODEL OPTS
NUM_EPOCHS: 400 # REQUIRED, number of training epochs
IA_FRACTION: 0.5 # OPTIONAL, desired Ia fraction in train/validation/test sets for binary classification, default: 0.5
SCONE_PREDICT: # Helen's CNN classifier
CLASSIFIER: SconeClassifier
MODE: predict
OPTS:
GPU: True # OPTIONAL, default: False
# HEATMAP CREATION OPTS
CATEGORICAL: True # OPTIONAL, binary or categorical classification, default: False
NUM_WAVELENGTH_BINS: 32 # OPTIONAL, heatmap height, default: 32
NUM_MJD_BINS: 180 # OPTIONAL, heatmap width, default: 180
REMAKE_HEATMAPS: False # OPTIONAL, SCONE does not remake heatmaps unless the 3_CLAS/heatmaps subdir doesn't exist or if this param is true, default: False
# MODEL OPTS
MODEL: "/path/to/trained/model" # REQUIRED, path to trained model that should be used for prediction
IA_FRACTION: 0.5 # OPTIONAL, desired Ia fraction in train/validation/test sets for binary classification, default: 0.5
SuperNNova Classifier
The SuperNNova classifier is a recurrent neural network that operates on simulation photometry. It has three in vuilt variants - its normal (vanilla) mode, a Bayesian mode and a Variational mode. After training, a model.pt can be found in the output directory, which you can point to from a different yaml file. You can define a classifier like so:
CLASSIFICATION:
SNN_TEST:
CLASSIFIER: SuperNNovaClassifier
MODE: predict
GPU: True # Or False - determines which queue it gets sent into
CLEAN: True # Or false - determine if Pippin removes the processed folder to sae space
OPTS:
MODEL: SNN_TRAIN # Havent shown this defined. Or /somepath/to/model.pt
VARIANT: vanilla # or "variational" or "bayesian". Defaults to "vanilla"
REDSHIFT: True # What redshift info to use when classifying. Defaults to 'zspe'. Options are [True, False, 'zpho', 'zspe', or 'none']. True and False are legacy options which map to 'zspe', and 'none' respectively.
NORM: cosmo_quantile # How to normalise LCs. Other options are "perfilter", "cosmo", "global" or "cosmo_quantile".
CYCLIC: True # Defaults to True for vanilla and variational model
SEED: 0 # Sets random seed. Defaults to 0.
LIST_FILTERS: ['G', 'R', 'I', 'Z'] # What filters are present in the data, defaults to ['g', 'r', 'i', 'z']
SNTYPES: "/path/to/sntypes.txt" # Path to a file which lists the sn type mapping to be used. Example syntax for this can be found at https://github.com/LSSTDESC/plasticc_alerts/blob/main/Examples/plasticc_schema/elasticc_origmap.txt. Alternatively, yaml dictionaries can be used to specify each sn type individually.
Pippin also allows for SuperNNova input yaml files to be passed, instead of having to define all of the options in the Pippin input yaml. This is done via:
OPTS:
DATA_YML: path/to/data_input.yml
CLASSIFICATION_YML: path/to/classification_input.yml
Example input yaml files can be found here, with the important variation that you must have:
raw_dir: RAW_DIR
dump_dir: DUMP_DIR
done_file: DONE_FILE
So that Pippin can automatically replace these with the appropriate directories.
SNIRF Classifier
The SNIRF classifier is a random forest running off SALT2 summary statistics. You can specify which features it gets to train on, which has a large impact on performance. After training, there should be a model.pkl in the output directory. You can specify one like so:
CLASSIFICATION:
SNIRF_TEST:
CLASSIFIER: SnirfClassifier
MODE: predict
OPTS:
MODEL: SNIRF_TRAIN
FITOPT: some_label # Optional FITOPT to use. Match the label. Defaults to no FITOPT
FEATURES: x1 c zHD x1ERR cERR PKMJDERR # Columns to use. Defaults are shown. Check FITRES for options.
N_ESTIMATORS: 100 # Number of trees in forest
MIN_SAMPLES_SPLIT: 5 # Min number of samples to split a node on
MIN_SAMPLES_LEAF: 1 # Minimum number samples in leaf node
MAX_DEPTH: 0 # Max depth of tree. 0 means auto, which means as deep as it wants.
Nearest Neighbour Classifier
Similar to SNIRF, NN trains on SALT2 summary statistics using a basic Nearest Neighbour algorithm from sklearn. It will produce a model.pkl file in its output directory when trained. You can configure it as per SNIRF:
CLASSIFICATION:
NN_TEST:
CLASSIFIER: NearestNeighborPyClassifier
MODE: predict
OPTS:
MODEL: NN_TRAIN
FITOPT: some_label # Optional FITOPT to use. Match the label. Defaults to no FITOPT
FEATURES: zHD x1 c cERR x1ERR COV_x1_c COV_x1_x0 COV_c_x0 PKMJDERR # Columns to use. Defaults are shown.
Perfect Classifier
Sometimes you want to cheat, and if you have simulations, this is easy. The perfect classifier looks into the sims to get the actual type, and will then assign probabilities as per your configuration. This classifier has no training mode, only predict.
CLASSIFICATION:
PERFECT:
CLASSIFIER: PerfectClassifier
MODE: predict
OPTS:
PROB_IA: 1.0 # Probs to use for Ia events, default 1.0
PROB_CC: 0.0 # Probs to use for CC events, default 0.0
Unity Classifier
To emulate a spectroscopically confirmed sample, or just to save time, we can assign every event a probability of 1.0 that it is a type Ia. As it just returns 1.0 for everything, it only has a predict mode
CLASSIFICATION:
UNITY:
CLASSIFIER: UnityClassifier
MODE: predict
FitProb Classifier
Another useful debug test is to just take the SALT2 fit probability calculated from the chi2 fitting and use that as our probability. You’d hope that classifiers all improve on this. Again, this classifier only has a predict mode.
CLASSIFICATION:
FITPROBTEST:
CLASSIFIER: FitProbClassifier
MODE: predict