Authors: David O’Ryan, Pablo Gómez
First Author’s Institution: European Space Agency
Status: Available on arXiv
Today’s astronomers live in a golden age of data — the James Webb Space Telescope (JWST) keeps dazzling us with images of the infrared universe, the Euclid mission is starting to reveal new info on dark matter and dark energy, and the Vera C. Rubin Observatory’s Legacy Survey of Space and Time (LSST) will soon begin taking deep images of the southern sky.
But while astronomers have no shortage of data, we do have a shortage of time. LSST alone will generate ~60 petabytes of data over its 10-year run, nearly 400,000 times the size of every English-language Wikipedia article combined. How can we possibly look through so much data?
Today’s authors help solve a piece of this puzzle: they’ve created a machine learning tool called AnomalyMatch that can quickly identify anomalies in telescope imaging. An anomaly is anything that looks different from a ‘normal’ galaxy, such as a gravitational lens, a black hole jet, a pair of merging galaxies, and more. In today’s study, AnomalyMatch analyzed 99.6 million Hubble images of galaxies in just 2-3 days and identified 1,339 anomalies!
How do you train an algorithm to look for anomalies?
Many machine learning algorithms need large ‘training sets’ of correctly-labeled data (in this case, labeled images of normal galaxies and anomalous galaxies) so they can learn how to correctly interpret unlabeled data. But in astronomy, we often want to find more of things that we don’t have many examples of. For instance, the authors’ original goal in creating AnomalyMatch was to find edge-on protoplanetary disks, but there are only a handful of examples of these in Hubble imaging.
To get around this, AnomalyMatch uses a combination of active learning and semi-supervised learning (SSL) methods to learn how to classify images from a limited training set. These terms are pretty opaque, so let’s break them down with an analogy.
Let’s say that you’re studying for a physics exam. You do all the problems on the practice test and you check your work against the answer key. If you got any of the questions incorrect, you can use the answer key to figure out where you went wrong. This is part of how SSL works in AnomalyMatch: the algorithm looks at the pre-labeled images you told it to study, and makes sure that it could arrive at the same labels on its own.
Now, let’s say that you decided to study more for your midterm. This time, you’re going to change tiny parts of the example problems and make sure that you can still get the right answer. AnomalyMatch does something similar. It modifies already-labeled images (by flipping them, cropping them, etc.) and then makes sure that it still gives them the correct label. This is the other half of how SSL works in AnomalyMatch.
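To make the "change the problem, keep the answer" idea concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: the tiny 4x4 "image", the helper names, and especially `toy_classifier`, which is a hypothetical stand-in for the neural network a real tool would use. It only shows the shape of the check, not AnomalyMatch's actual code.

```python
def flip_horizontal(image):
    """Mirror each row of a 2D image stored as a list of rows."""
    return [row[::-1] for row in image]


def center_crop(image, margin=1):
    """Drop `margin` pixels from every edge of the image."""
    return [row[margin:-margin] for row in image[margin:-margin]]


def toy_classifier(image):
    """Hypothetical stand-in model: flag an image if its brightest pixel is very bright."""
    brightest = max(max(row) for row in image)
    return "anomaly" if brightest > 0.8 else "normal"


def is_consistent(image, label):
    """Check that the original and its modified views all receive the known label."""
    views = [image, flip_horizontal(image), center_crop(image)]
    return all(toy_classifier(view) == label for view in views)


# A made-up labeled example: one bright source near the center.
image = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
consistent = is_consistent(image, "anomaly")
print(consistent)
```

In real training, a disagreement between views would not just be reported. It would typically feed into the quantity the model is trying to minimize, nudging it toward giving the same answer for every version of the image.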
Okay, now you decide to go to office hours to ask your TA some questions about the exam material. With the active learning technique, AnomalyMatch is like the student and the user is the TA. AnomalyMatch decides which unlabeled images it really, really wants to know the correct answer to — these are the answers that it thinks it could learn the most from. The program then displays these images to the user and asks them to label them as normal galaxies or anomalies. You can see an example of the AnomalyMatch user interface in Figure 1.
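How might a program decide which questions to bring to office hours? One common active learning strategy, called uncertainty sampling, is to ask about the images whose scores sit closest to the middle of the 0-to-1 range, since those are the ones the model is least sure about. The sketch below assumes that strategy purely for illustration; this post doesn't say which selection rule AnomalyMatch uses, and the image names and scores are made up.

```python
def most_uncertain(scores, k):
    """Return the k image names whose scores are closest to 0.5 (least certain)."""
    ranked = sorted(scores, key=lambda name: abs(scores[name] - 0.5))
    return ranked[:k]


# Hypothetical anomaly scores for five unlabeled images.
scores = {
    "img_a": 0.02,  # confidently normal
    "img_b": 0.48,  # very unsure
    "img_c": 0.97,  # confidently anomalous
    "img_d": 0.61,  # somewhat unsure
    "img_e": 0.35,
}

# These are the images that would be shown to the user for labeling.
to_label = most_uncertain(scores, 2)
print(to_label)
```

Once the user labels these images, they join the training set and the model is updated, so each round of questions is spent where it helps the most.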

More anomalies than they bargained for…
Originally, the authors wanted to create AnomalyMatch to search for edge-on protoplanetary disks. But during the active learning phase of training, they found that their algorithm was also flagging other types of unusual sources: galaxy mergers, overlapping galaxies, gravitational lenses and arcs, jellyfish galaxies, clumpy galaxies, ring galaxies, active galactic nuclei, and black hole jets. The authors decided to expand their definition of ‘anomaly’ to include these interesting objects.
After completing the training phase, the authors used AnomalyMatch on all Hubble images taken with the ACS camera and F814W (a red filter). This is data from over two decades of observations! AnomalyMatch gave each of the 99.6 million images an anomaly score from 0 to 1 and the authors picked the 5,000 highest-ranked images to analyze further. These images corresponded to 1,339 unique objects.
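The ranking step can be sketched in a few lines of Python. The records below are invented for illustration, and the idea that several top-ranked images can point to the same object (for example, because one source appears in more than one cutout) is an assumption about how many images can reduce to fewer unique objects, not a description of the authors' pipeline.

```python
import heapq


def top_ranked(records, n):
    """Return the n records with the highest anomaly score, best first."""
    return heapq.nlargest(n, records, key=lambda record: record[2])


# Hypothetical (image name, object name, anomaly score) records.
records = [
    ("cutout_1", "obj_A", 0.99),
    ("cutout_2", "obj_A", 0.97),  # same object seen in a second image
    ("cutout_3", "obj_B", 0.10),
    ("cutout_4", "obj_C", 0.95),
    ("cutout_5", "obj_D", 0.40),
]

top = top_ranked(records, 3)

# Collapse the top-ranked images down to the distinct objects they show.
unique_objects = sorted({obj for _, obj, _ in top})
print(unique_objects)
```

Here three top-ranked images collapse to two unique objects, a miniature version of 5,000 images corresponding to 1,339 objects.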

The authors inspected all 1,339 objects by eye and cross-referenced them with the literature to sub-classify the anomalies as galaxy mergers, ring galaxies, and so on. It’s important to note that these aren’t definitive labels: follow-up observations would be needed to confirm them.

Excitingly, 65% of the detected anomalies don’t have any references in the literature. This means that AnomalyMatch was able to find interesting objects that other researchers haven’t found before. AnomalyMatch was even able to find things that it wasn’t trained on — it marked an image of a lensed quasar as an anomaly even though there weren’t any images of lensed quasars in its training dataset.
Because AnomalyMatch uses active learning, future users can make their own decisions about what they want to classify as ‘anomalous’. For example, users could decide to only classify images of gravitational lenses as being anomalies.
Today’s paper used AnomalyMatch on Hubble data, but Hubble doesn’t survey the entire sky — the images in this study came from targeted observations that other researchers requested. Imagine just how many anomalies it could find in a large survey like LSST! AnomalyMatch could be a useful tool for expanding tiny samples of interesting objects — and doing it fast.
Astrobite edited by Veronika Dornan
Featured image credit: NASA, ESA, and The Hubble Heritage Team (STScI/AURA); NASA and The Hubble Heritage Team (AURA/STScI); European Space Agency; NASA, ESA, S. Bianchi (Università degli Studi Roma Tre University), A. Laor (Technion-Israel Institute of Technology), and M. Chiaberge (ESA, STScI, and JHU); European Space Agency & NASA; Alan Glauco; The Hubble Heritage Team (AURA/STScI/NASA) NASA Headquarters – Greatest Images of NASA (NASA-HQ-GRIN); European Space Agency; ESA/Hubble & NASA, C. Kilpatrick