UR: Using Machine Learning to Identify Transients in the DESI Survey

The Undergraduate Research series is where we feature the research that you’re doing. If you are an undergraduate that took part in an REU or similar astro research project and would like to share this on Astrobites, please check out our submission page for more details. We would also love to hear about your more general research experience!


Amanda Wasserman

University of Rochester

Amanda Wasserman is a senior undergraduate Physics and Astronomy major at the University of Rochester. She carried out this research with Professor Segev BenZvi, and it will form the basis of her senior thesis.

Over the next five years, the Dark Energy Spectroscopic Instrument (DESI) will observe the spectra of 35 million galaxies and quasars. By chance, a small percentage of these galaxies will contain supernovae and other transients that are visible in the galactic spectra. I have been developing machine learning tools to identify and classify transients in galaxy spectra measured with DESI. The goal of my research is to create a Transient Identification Pipeline that automatically separates transient-contaminated spectra from plain galactic spectra. Classifying transient spectra will help us ensure correct estimates of the host redshifts (a measure of the distance to the galaxy) and let us notify other collaborations of astrophysical phenomena as they occur.

The algorithm we created is a multilabel convolutional neural network (CNN), a method that classifies inputs based on features in the data. It has four layers and was trained on a variety of simulated supernovae and hosts. The CNN takes preprocessed spectra as input and outputs the most likely classification: either a plain host or a supernova of type Ia, type Ib, type Ic, type IIn, or type IIp. The classifier performs exceptionally well on our simulated data. When we look only at the spectra that the CNN classified with high certainty, we attain an accuracy of over 99%.
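To make the "high certainty" idea concrete, here is a minimal sketch of how one might keep only confident classifications and measure accuracy on that subset. It is not the pipeline's actual code: the class ordering, the 0.9 threshold, the function names, and the example scores are all assumptions made up for illustration.

```python
# Class names follow the post; their ordering here is an assumption.
CLASSES = ["Host", "Ia", "Ib", "Ic", "IIn", "IIp"]


def classify(scores, threshold=0.9):
    """Return (label, confidence); label is None below the threshold.

    `scores` is one list of per-class scores in [0, 1], such as a CNN
    might output. The 0.9 threshold is a hypothetical choice.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[best]
    label = CLASSES[best] if confidence >= threshold else None
    return label, confidence


def high_confidence_accuracy(score_rows, true_labels, threshold=0.9):
    """Accuracy computed only over spectra classified above the threshold."""
    kept = 0
    correct = 0
    for scores, truth in zip(score_rows, true_labels):
        label, _ = classify(scores, threshold)
        if label is None:
            continue  # too uncertain; leave this spectrum out
        kept += 1
        correct += (label == truth)
    accuracy = correct / kept if kept else float("nan")
    return accuracy, kept


# Invented scores for four spectra (not real DESI or simulation output).
example_scores = [
    [0.97, 0.01, 0.01, 0.00, 0.01, 0.00],  # confident host
    [0.02, 0.95, 0.01, 0.01, 0.01, 0.00],  # confident type Ia
    [0.05, 0.30, 0.35, 0.20, 0.05, 0.05],  # ambiguous, gets dropped
    [0.01, 0.01, 0.02, 0.94, 0.01, 0.01],  # confident type Ic
]
example_truth = ["Host", "Ia", "Ic", "Ic"]

accuracy, kept = high_confidence_accuracy(example_scores, example_truth)
print(f"kept {kept} of {len(example_scores)} spectra, accuracy {accuracy:.2f}")
```

In this toy example the ambiguous spectrum is set aside, so the accuracy is computed over the three confident classifications only. Raising the threshold trades completeness for purity: fewer spectra are labeled, but the labels that remain are more trustworthy.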

Our next goal for the project is to incorporate anomaly detection into the classifier to potentially identify new astrophysical phenomena. Additionally, DESI has just started to observe again after a hiatus due to COVID-19. As data comes in, we will adjust our pipeline to accommodate any differences between our modeled spectra and observed spectra. We look forward to applying our pipeline to real data and expect to find over 1,000 transients per year.

Figure 1: The confusion matrix (a tool for analyzing the accuracy of our algorithm) for our validation set of simulated spectra. The x-axis shows the label that the classifier predicts for each spectrum, and the y-axis shows what the spectrum truly is. The boxes along the diagonal show the fraction of spectra that have been correctly classified. Because our confusion matrix is strongly diagonal, we see that our algorithm is accurately identifying transients and labeling them correctly.
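For readers who have not built a confusion matrix before, the sketch below shows one way to compute it from true and predicted labels. It assumes each row is normalized by the number of spectra in that true class, which is one common convention and would match a diagonal that shows fractions; the function name and the tiny three-class example are invented for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Row-normalized confusion matrix as a list of lists.

    Rows are true classes (y-axis) and columns are predicted classes
    (x-axis). Each row is divided by its total, so the diagonal entry
    is the fraction of that class that was classified correctly.
    """
    index = {name: i for i, name in enumerate(classes)}
    n = len(classes)
    counts = [[0] * n for _ in range(n)]
    for truth, predicted in zip(true_labels, predicted_labels):
        counts[index[truth]][index[predicted]] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Guard against classes with no examples in the validation set.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Invented labels for eight spectra (not the validation set from the post).
classes = ["Host", "Ia", "Ib"]
truth = ["Host", "Host", "Ia", "Ia", "Ia", "Ia", "Ib", "Ib"]
predicted = ["Host", "Host", "Ia", "Ia", "Ia", "Ib", "Ib", "Ib"]

matrix = confusion_matrix(truth, predicted, classes)
for name, row in zip(classes, matrix):
    print(name, row)
```

Here one of the four type Ia spectra is mislabeled as type Ib, so the Ia row reads 0.75 on the diagonal and 0.25 in the Ib column, while the other two rows are perfectly diagonal.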

Astrobite edited by: Ellis Avallone

About Guest

This post was written by a guest author. If you're interested in writing a guest post for Astrobites, please contact us.
