Cosmic Dawn at the Galaxy Zoo

Title: Galaxy Zoo: Cosmic Dawn – morphological classifications for over 41 000 galaxies in the Euclid Deep Field North from the Hawaii Two-0 Cosmic Dawn survey

Authors: James Pearson, Hugh Dickinson, Stephen Serjeant, Mike Walmsley, Lucy Fortson, Sandor Kruk, Karen L. Masters, Brooke D. Simmons, R. J. Smethurst, Chris Lintott, Lukas Zalesky, Conor McPartland, John R. Weaver, Sune Toft, Dave Sanders, Nima Chartab, Henry Joy McCracken, Bahram Mobasher, Istvan Szapudi, Noah East, Wynne Turner, Matthew Malkan, William J. Pearson, Tomotsugu Goto, Nagisa Oi

First Author’s Institution: School of Physical Sciences, The Open University, Milton Keynes, MK7 6AA, UK

Status: Submitted to MNRAS [open access]

There are many different ways to study a galaxy, but one of the most important might also be the most straightforward: simply looking at it. (Or if you’d prefer a more scientific way of putting it: analyzing the galaxy’s visual morphology, cough cough.) A galaxy’s morphology can reveal how it has grown and changed over time, including the formation of disk and spiral structures, the assembly of galactic bulges and bars, the activity of its central supermassive black hole, and its merger history. Looking at a galaxy can also reveal rare features like gravitational lenses!

But with the latest sky surveys collecting images of millions of galaxies (like the Euclid survey or the Vera C. Rubin Observatory’s Legacy Survey of Space and Time), how can astronomers ever find the time to inspect each galaxy in detail? Thanks to Galaxy Zoo, astronomers don’t have to do this alone: volunteer citizen scientists are up to the task.

Today’s authors present the results of Galaxy Zoo: Cosmic Dawn, a citizen-science project that invited the public to classify galaxies in Hyper Suprime-Cam (HSC) images from the multiwavelength Cosmic Dawn survey. The project launched online in October 2022 and ran for six months. During that time, tens of thousands of volunteers submitted over four million galaxy classifications to the website. These classifications are scientifically important on their own for studying galaxy evolution, and they can also be used to train machine learning models to rapidly classify galaxy images from upcoming surveys.

How does Galaxy Zoo work?

When a volunteer visits the Galaxy Zoo: Cosmic Dawn website, they’re shown a color image of one galaxy at a time and asked a series of multiple-choice questions. The first question is, “Is the galaxy simply smooth and rounded, with no sign of a disk?” From there on, the questions get more and more specific, like the number of spiral arms, the size of the central bulge, and whether the galaxy is merging or disturbed. The full flowchart of questions is shown in Figure 1.

Figure 1: A flowchart of the questions volunteers answer in Galaxy Zoo: Cosmic Dawn. Galaxy Zoo also provides volunteers with a tutorial and a field guide to help them answer these questions. Figure 4 in today’s paper.

Galaxy Zoo: Cosmic Dawn used a total dataset of 47,347 galaxy images. These images were then split into two phases of classification. In the first phase, only 16,671 of the images were shown to volunteers. After an image was classified by 40 volunteers, it was considered fully classified and retired from the website. 
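To make the retirement rule concrete, here is a minimal Python sketch. Only the 40-classification limit comes from the text above; the function names, answer labels, and example votes are made up for illustration and are not the project's actual code.

```python
from collections import Counter

# From the text: an image is retired after 40 volunteer classifications.
RETIREMENT_LIMIT = 40


def is_retired(n_classifications, limit=RETIREMENT_LIMIT):
    """Return True once an image has collected enough classifications."""
    return n_classifications >= limit


def vote_fractions(answers):
    """Turn a list of volunteer answers into the fraction choosing each option."""
    counts = Counter(answers)
    total = len(answers)
    return {answer: n / total for answer, n in counts.items()}


# Hypothetical answers to the first question for one image.
answers = ["smooth"] * 10 + ["features_or_disk"] * 28 + ["artifact"] * 2

print(is_retired(len(answers)))
print(vote_fractions(answers))
```

The vote fractions, rather than a single "winning" answer, are what make the classifications useful for training a model later on.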

The classifications from this first phase were then used to fine-tune the training of Galaxy Zoo’s deep learning model called Zoobot. In the second phase of classification, the remaining 30,676 images were collaboratively classified by volunteers and Zoobot.

Figure 2: Example images from Galaxy Zoo: Cosmic Dawn. In response to the website’s first question, volunteers classified the left images as “Features or Disk” and classified the right images as “Star, Artifact, or Bad Zoom”. Adapted from Figures 9 and 10 in today’s paper.

Amazingly, the volunteers identified 51 previously undiscovered gravitational lenses in the data. These are pretty rare, and they’re exciting to study because they allow us to see objects that are much farther away or much smaller than we would normally be able to see.

But even the classifications of images that didn’t show a galaxy were super useful. When volunteers selected the “Non-star Artifact” option, they were directed to select whether the image showed a “Saturation Feature (Bleed Trail)”, “Diffraction Spike”, “Satellite Trail”, “Cosmic Ray”, “Scattered Light”, or “Other / Not Sure”. Today’s authors found that the vast majority of the faulty images showed scattered light or saturation features, with almost no cases of the other artifact types. This tells us which kinds of artifacts the automated image reduction pipeline is good at handling, and which types of artifacts we still need to work on.

Volunteer responses can be used to train machine learning models to classify astronomical images more quickly and more accurately

The Zoobot deep learning model had already been trained on galaxy classifications from previous Galaxy Zoo campaigns, but the volunteer classifications from the first phase of Galaxy Zoo: Cosmic Dawn were used to further fine-tune its training and improve its ability to handle deep imaging.

In phase 2 of the campaign, Zoobot was asked to classify galaxy images alongside volunteers. For each image, Zoobot followed the decision tree in Figure 1 and predicted the fraction of volunteers that would choose each option. If it predicted that fewer than 20% of volunteers would select the “Features or Disk” option in question 1, Zoobot retired the image and marked it as classified. This allowed Zoobot to handle the simplest galaxies so that volunteers could spend their time engaging with the most interesting ones.
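The retirement rule described above boils down to a single comparison. Here is a rough Python sketch of it. The 20% threshold is from the text, but the function name, dictionary keys, and example predictions are hypothetical stand-ins, and this is not Zoobot's real interface.

```python
# From the text: images are retired when fewer than 20% of volunteers
# are predicted to pick "Features or Disk" in question 1.
FEATURED_THRESHOLD = 0.20


def model_retires(predicted_fractions, threshold=FEATURED_THRESHOLD):
    """Return True if the predicted "Features or Disk" fraction is below the threshold."""
    return predicted_fractions["features_or_disk"] < threshold


# Hypothetical predicted vote fractions for two images.
plain_galaxy = {"smooth": 0.85, "features_or_disk": 0.10, "artifact": 0.05}
spiral_galaxy = {"smooth": 0.20, "features_or_disk": 0.75, "artifact": 0.05}

for name, prediction in [("plain", plain_galaxy), ("spiral", spiral_galaxy)]:
    # Retired images skip the volunteers; the rest stay on the website.
    outcome = "retired by model" if model_retires(prediction) else "sent to volunteers"
    print(name, "->", outcome)
```

In this toy example the featureless galaxy is retired automatically, while the spiral stays in the queue for volunteers to inspect.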

During each week of phase 2, Zoobot was updated by retraining on new volunteer classifications, and it continued to retire images of simple galaxies. Compared to the volunteer-only classifications in phase 1, this iterative use of Zoobot sped up the galaxy classification process by nearly a factor of three!

Zoobot was pretty good at predicting volunteer classifications, but it needs more training data to accurately handle rare cases

After Galaxy Zoo: Cosmic Dawn concluded, Zoobot was run one final time on all of the images. For nearly every question that received at least 5 volunteer responses, Zoobot was able to predict those responses within an error range of 0.12 sigma. However, it struggled to predict answers for the least-common responses, since those options didn’t appear often enough in the training data. 
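One simple way to picture this kind of comparison is to measure how far the predicted vote fractions sit from the volunteers’ actual vote fractions, skipping questions with too few responses. The Python sketch below does that with a mean absolute difference. That choice of metric is an assumption made for illustration and may not match the statistic used in the paper; only the 5-response cutoff comes from the text, and the example numbers are invented.

```python
# From the text: only questions with at least 5 volunteer responses are compared.
MIN_RESPONSES = 5


def mean_abs_deviation(records, min_responses=MIN_RESPONSES):
    """Mean absolute difference between predicted and observed vote fractions.

    Records with fewer than `min_responses` volunteer answers are skipped.
    Returns None if no record passes the cut.
    """
    kept = [r for r in records if r["n_responses"] >= min_responses]
    if not kept:
        return None
    total = sum(abs(r["predicted"] - r["observed"]) for r in kept)
    return total / len(kept)


# Hypothetical (made-up) vote fractions for one answer across three images.
records = [
    {"n_responses": 40, "observed": 0.70, "predicted": 0.65},
    {"n_responses": 12, "observed": 0.25, "predicted": 0.35},
    {"n_responses": 3, "observed": 1.00, "predicted": 0.20},  # too few responses, skipped
]

print(mean_abs_deviation(records))
```

The third record shows why a minimum-response cut matters: with only three votes, the observed fraction is too noisy to be a fair test of the model.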

In general, Zoobot also didn’t do as well with subjective questions, like the shapes of edge-on galaxy bulges and the tightness of galaxy spiral arms. The answers to these questions aren’t always clear-cut, so it makes sense that Zoobot would have a hard time predicting how volunteers would respond.

Zoobot didn’t predict answers for the “Do you see any of these rare features?” question at the end of the flowchart, since that question allows volunteers to select multiple options.

As astronomers begin dealing with larger and larger datasets, citizen science and machine learning models can work together

Citizen science projects like Galaxy Zoo: Cosmic Dawn are a great way to accurately classify astronomical images and discover rare objects. When combined with the use of machine learning models like Zoobot, these campaigns can analyze enormous amounts of data in a fraction of the time.

Astrobite edited by Niloofar Sharei

Featured image credit: Galaxy Zoo team

Author

  • Anavi Uppal

    I’m a second-year Astronomy & Astrophysics PhD student at the University of California, Santa Cruz. I’m interested in using machine learning and telescope surveys to explore a variety of topics in extragalactic astronomy. Beyond research, I love science outreach/journalism, photography, archery, and being outdoors.
