UR: Use of Machine Learning Techniques to Analyze Radial Velocity Data to Find Exoplanets

The Undergraduate Research series is where we feature the research that you’re doing. If you are an undergraduate who took part in an REU or similar astro research project and would like to share it on Astrobites, please check out our submission page for more details. We would also love to hear about your more general research experience!


Rohan Nagavardhan

Ward Melville High School

This post was written by Rohan Nagavardhan, a student at Ward Melville High School with a passion for astronomy and computer science. He completed this research at SUNY Stony Brook University during the summer under the supervision of Dr. Praveen Tripathi. He plans to present these results at the Regeneron Science Talent Search (STS).

In 1995, the radial velocity method found the first extrasolar planet around a Sun-like star, in the constellation Pegasus. This method detects a planet through the motion of its host star around the center of mass the two share (see figure below), and it has been responsible for the detection of a large number of planets over the past decades. NASA and other groups such as the European Southern Observatory (ESO) and the Sloan Digital Sky Survey (SDSS) are amassing large catalogs of stellar radial velocity data with instruments and surveys such as the High Accuracy Radial Velocity Planet Searcher (HARPS) and the Multi-object APO Radial Velocity Exoplanet Large-area Survey (MARVELS).
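To give a sense of how small this stellar wobble is, the sketch below evaluates the standard two-body expression for the radial velocity semi-amplitude, K = (2πG/P)^(1/3) · m_p sin i / ((M_star + m_p)^(2/3) · sqrt(1 − e²)). The function name and the Jupiter-Sun example values are illustrative assumptions and are not taken from the study.

```python
import math

G = 6.674e-11         # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30      # solar mass, kg
M_JUPITER = 1.898e27  # Jupiter mass, kg
YEAR = 3.156e7        # one year, s


def rv_semi_amplitude(period_s, planet_mass_kg, star_mass_kg,
                      inclination_rad=math.pi / 2, eccentricity=0.0):
    """Semi-amplitude (m/s) of the star's line-of-sight velocity.

    Hypothetical helper: defaults assume an edge-on, circular orbit.
    """
    return ((2 * math.pi * G / period_s) ** (1 / 3)
            * planet_mass_kg * math.sin(inclination_rad)
            / (star_mass_kg + planet_mass_kg) ** (2 / 3)
            / math.sqrt(1 - eccentricity ** 2))


# Illustrative case: a Jupiter-mass planet on an 11.86-year orbit around a Sun-like star.
k_jupiter = rv_semi_amplitude(11.86 * YEAR, M_JUPITER, M_SUN)
print(f"K = {k_jupiter:.2f} m/s")
```

For this Jupiter-like case the result is on the order of ten metres per second, which is why high-precision spectrographs are needed to measure it.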

We built a machine learning classifier that analyzes the radial velocity time series of a star and predicts whether that star has zero, one, or more than one planet orbiting it. Our radial velocity data came from the NASA Exoplanet Archive. Most standard classifiers cannot take raw time series data (measurements of a quantity over a period of time, e.g., the value of a stock tracked over 5 years) directly as input and form a prediction; therefore, we needed to perform feature extraction. Feature extraction describes each star’s radial velocity data set with a specific set of orbital parameters that characterize that star’s data. Repeating this process for every radial velocity data set yielded a table of features that was used to train the model and to test its performance.

We measured the performance of the model in two ways: F-measure score and accuracy percentage. The F-measure combines precision (how many of the model’s predictions for a class are correct) and recall (how many of the true members of a class the model finds) into a single metric. Accuracy percentage, the number of correct predictions divided by the total number of predictions, is not always the best measure of a model’s performance, but we used it to supplement the F-measure score. After training, the model achieved an accuracy of 81.20% and an F-measure score of 0.81. These results support the use of machine learning methods on radial velocity time series data for the purpose of exoplanet identification.
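The two steps described above, feature extraction and scoring, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the feature set or model used in this study: the feature names, the sinusoid projection (a stand-in for a full periodogram or Keplerian orbit fit), the macro-averaging of the F-measure across the three classes, and the toy labels are all hypothetical.

```python
import math


def extract_features(times, velocities, trial_periods):
    """Reduce one star's radial velocity series to a fixed-length feature dict."""
    n = len(velocities)
    mean_v = sum(velocities) / n
    centered = [v - mean_v for v in velocities]
    best_period, best_amplitude = None, -1.0
    for period in trial_periods:
        omega = 2 * math.pi / period
        # Project the mean-subtracted velocities onto a sine and cosine at this period.
        a = 2 / n * sum(v * math.sin(omega * t) for t, v in zip(times, centered))
        b = 2 / n * sum(v * math.cos(omega * t) for t, v in zip(times, centered))
        amplitude = math.hypot(a, b)
        if amplitude > best_amplitude:
            best_period, best_amplitude = period, amplitude
    rms = math.sqrt(sum(v * v for v in centered) / n)
    return {"period": best_period, "semi_amplitude": best_amplitude, "rms": rms}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F-measure (assumed averaging scheme)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Synthetic, noise-free series: a 10-day signal with a 50 m/s semi-amplitude.
times = [0.5 * i for i in range(200)]
velocities = [50.0 * math.sin(2 * math.pi * t / 10.0 + 0.3) for t in times]
features = extract_features(times, velocities, [5.0, 8.0, 10.0, 12.5, 20.0])

# Toy labels: 0, 1, or 2 stands for zero, one, or more than one planet.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(features)
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

In a real pipeline, the feature dictionaries for all stars would form the rows of the training table, and the two metrics would be computed on a held-out test set. The toy labels also show why the two metrics can differ: accuracy counts every prediction equally, while the macro-averaged F-measure weights each class equally.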

Illustration of the radial velocity method for finding exoplanets. Light from a star moving towards us is blueshifted, and light from a star moving away from us is redshifted. If this shift occurs periodically and at the right velocity, it may indicate the gravitational effect of an exoplanet on its star.
The radial velocity method for finding exoplanets. Image credit: Johan Jarnestad/The Royal Swedish Academy of Sciences

Astrobite edited by: Haley Wahl

Featured image credit: Johan Jarnestad/The Royal Swedish Academy of Sciences

