Adversarial Networks, Collaborative Cosmology

Title: Super-resolving Dark Matter Halos using Generative Deep Learning

Authors: David Schaurecker, Yin Li, Jeremy Tinker, Shirley Ho, and Alexandre Refregier

First Author’s Institution: Institute for Particle Physics and Astrophysics, ETH Zurich, Zurich

Status: preprint on arXiv

Large data, expensive simulations

It’s no secret that, cosmologically speaking, we are living in an age of big data. Thanks to amazing galaxy surveys such as DESI, EUCLID, DES, and Rubin, astronomers are mapping larger swaths of our Universe using fainter galaxies. In order for our theoretical understanding to keep up with the observations, we need to be able to compare those maps to corresponding simulated catalogs of faint galaxies—meaning we need larger simulation boxes with finer resolution. Furthermore, we need a lot of those simulations to explore different cosmological parameters and calculate statistics. That’s very computationally expensive!

In this paper, the authors present a less costly way of moving forward. The questions they seek to answer include:

  1. Can we run a lower-resolution simulation instead, and then fill in the blanks with machine learning at the very end?
  2. How does that compare to just running a full high-resolution simulation?

Read on to find out!

Filling in the blanks with machine learning

The authors use something called a generative adversarial network (GAN): a machine learning framework where two neural networks—called the generator and the discriminator—act as adversaries competing in a zero-sum game. A flowchart of how a GAN works is shown in Figure 1. The task of the generator is to learn to produce a sample that looks just like its training set; the task of the discriminator is to tell the generator’s data apart from the real sample. The game is rigged: it only ends once the discriminator loses, aka the generator gets so good that the discriminator consistently produces 50/50 odds of the sample being real or fake. While this is sad news for the generator, it’s great news for the user—it means that we can now produce realistic-looking fake data!

Figure 1: Illustration of how a GAN works. The generator G is trained to take in some random noise and produce a sample that is indistinguishable from a separate sample taken from the training set data, Xtrain. The two samples are input to the discriminator D, which produces a probability of the generated data being real. The GAN is trained once the discriminator starts producing 50/50 odds. Figure 1 in this paper.
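To make the "zero-sum game" concrete, here is a minimal sketch of the loss functions that drive it—just the binary cross-entropy objectives for the two networks, not the authors' actual architecture (the function names are illustrative):

```python
import numpy as np

def bce(p, label):
    # Binary cross-entropy for a predicted probability p and a true label (1 = real, 0 = fake)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

def discriminator_loss(p_real, p_fake):
    # D wants to call real samples real (label 1) and generated samples fake (label 0)
    return bce(p_real, 1.0) + bce(p_fake, 0.0)

def generator_loss(p_fake):
    # G wants D to mistake its samples for real ones (label 1)
    return bce(p_fake, 1.0)

# At the GAN's equilibrium, D outputs 0.5 for everything—the 50/50 odds from the text
print(discriminator_loss(0.5, 0.5))  # 2 ln 2 ≈ 1.386
print(generator_loss(0.5))           # ln 2 ≈ 0.693
```

A discriminator that could still tell the samples apart (say, p_real = 0.9, p_fake = 0.1) would have a much lower loss than at the 50/50 point—training ends precisely when it can do no better than a coin flip.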

Comparing generated and simulated galaxies

In the case of today’s paper, the authors compare two dark matter-only simulations from the Illustris suite: the high-resolution Illustris-2-Dark and low-resolution Illustris-3-Dark simulations. They consider only the present-day snapshot of the simulations and divide their volumes into 8 pieces: 6 to serve as a training set, and 2 for validation and testing. The GAN is then trained to take in one of the 6 pieces of the low-resolution simulation and recreate the corresponding high-resolution version. Once training is complete, the GAN is tested on one of the remaining two simulation slices that the neural net hasn’t seen before. The results are shown in Figure 2.
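Splitting a cubic simulation volume into 8 equal pieces amounts to cutting it in half along each axis. Here is a toy sketch of that bookkeeping, using a random 3D density grid as a hypothetical stand-in for a snapshot (the paper itself works with particle data, so this is illustrative only):

```python
import numpy as np

def split_into_octants(field):
    """Split a cubic 3D field into its 8 octant sub-boxes."""
    n = field.shape[0] // 2
    return [field[i*n:(i+1)*n, j*n:(j+1)*n, k*n:(k+1)*n]
            for i in range(2) for j in range(2) for k in range(2)]

# Toy stand-in for a simulation snapshot: a random 64^3 density grid
rng = np.random.default_rng(0)
field = rng.random((64, 64, 64))
octants = split_into_octants(field)

# 6 pieces for training, 2 held out for validation and testing
train, (val, test) = octants[:6], octants[6:]
print(len(train), val.shape)  # 6 (32, 32, 32)
```

Holding out pieces the network never sees during training is what lets the authors claim the GAN generalizes, rather than memorizing its training volume.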

Figure 2: Slices of the simulations, each 37.5 Mpc/h wide, with insets zooming in on a small region of 2.74 Mpc/h. The left panel shows the low-resolution simulation, the middle shows the GAN-generated data, and the right shows the high-resolution simulation. Note that at large scales, the three look indistinguishable. At the smaller scales shown in the insets, the middle and right panels reveal more faint particles (corresponding to lower-mass dark matter halos), though not necessarily in the exact same places. Figure 3 in the paper.

Overall, the GAN does an excellent job, at least visually—the middle and right panels of Figure 2 are practically indistinguishable by eye! The main difference between the low- and high-resolution data is the presence of small dark matter clusters, which the GAN is successful at recovering. However, note that the generated data doesn’t match the high-resolution simulation exactly—and we don’t expect it to! The neural net can’t recover information that was lost to the low resolution of the Illustris-3 simulation; but what it can do is make a guess at what the higher resolution might have looked like, statistically. Therefore, we wouldn’t expect a GAN-resolved simulation to reveal the exact and true location of a blob of dark matter, but we would expect statistical measurements over a chunk of the simulation to be realistic. And those statistical measurements are exactly what we compare to data! For example, we might use the power spectrum of the dark matter or the halo mass function, as illustrated in Figure 3 below.
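The power spectrum mentioned above measures how much clustering the dark matter has on each spatial scale. A common way to estimate it is to Fourier transform the density field and average the squared mode amplitudes in spherical shells of wavenumber |k|. Here is a rough numpy sketch of that idea—a simplified estimator, not the paper’s actual analysis pipeline (grid size, box size, and binning are all illustrative):

```python
import numpy as np

def power_spectrum(delta, box_size, n_bins=15):
    """Spherically averaged power spectrum of a 3D overdensity field."""
    n = delta.shape[0]
    delta_k = np.fft.rfftn(delta) / n**3                 # Fourier modes of the field
    power = (np.abs(delta_k)**2 * box_size**3).ravel()   # mode-by-mode power
    # |k| of every mode on the FFT grid
    k = 2 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kz = 2 * np.pi * np.fft.rfftfreq(n, d=box_size / n)
    kmag = np.sqrt(k[:, None, None]**2 + k[None, :, None]**2
                   + kz[None, None, :]**2).ravel()
    # Spherical average: mean power in shells of |k|
    edges = np.linspace(0, kmag.max(), n_bins + 1)
    idx = np.clip(np.digitize(kmag, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=power, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)

# Toy example: white noise on a 32^3 grid in a (75 Mpc/h)^3 box
rng = np.random.default_rng(2)
delta = rng.standard_normal((32, 32, 32))
k_centers, pk = power_spectrum(delta, box_size=75.0)
```

Comparing pk computed from the GAN output against pk from the true high-resolution box is exactly the kind of statistical test the authors have in mind.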

Figure 3: The authors calculate the halo mass function—aka, the number of halos of a certain mass as a function of that mass—for the low-resolution simulation (pink), GAN-generated data (blue), and the high-resolution simulation (green). All three curves grow toward smaller halo masses, but while the low-resolution data drops off around 10^10 solar masses, the GAN and high-resolution halo mass functions look almost identical and extend down to around 10^9 solar masses.
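In practice, a halo mass function like the one in Figure 3 is just a histogram of halo masses in logarithmic bins, divided by the survey (or box) volume. Here is a minimal sketch using a made-up halo catalog—the mass range, bin count, and box size are all hypothetical:

```python
import numpy as np

def halo_mass_function(masses, volume, n_bins=20):
    """Number density of halos per logarithmic mass bin, dn/dlog10(M)."""
    log_m = np.log10(masses)
    counts, edges = np.histogram(log_m, bins=n_bins)
    dlog_m = edges[1] - edges[0]                 # width of one log-mass bin
    centers = 0.5 * (edges[:-1] + edges[1:])     # bin centers in log10(M)
    return centers, counts / (volume * dlog_m)

# Toy catalog: 10,000 halos with masses between 10^9 and 10^14 solar masses
rng = np.random.default_rng(1)
masses = 10 ** rng.uniform(9, 14, size=10_000)
centers, dndlogm = halo_mass_function(masses, volume=75.0**3)  # (75 Mpc/h)^3 box
```

The resolution effect in Figure 3 shows up here as a simple cutoff: a low-resolution box is missing the low-mass halos entirely, so its histogram drops to zero at a higher mass than the high-resolution (or GAN-generated) one.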


While machine learning can’t recover lost data, a properly trained neural net can give us more resolving power in the statistics of the simulation. Furthermore, once it is trained, using a GAN to extrapolate to higher resolution is much faster and less computationally costly than running a full high-resolution simulation. The ultimate goal would be to train a GAN so well that it can work on almost any dark matter-only simulation. In fact, the authors used their neural net on the newest suite of Illustris simulations—called Illustris-TNG—and found that the network still managed to successfully predict new halos! This is a promising start to being able to create mock catalogs with very realistic—if very fake—simulated data at a fraction of the real deal’s computation cost.

Astrobite edited by Lili Alderson

Featured image credit: Illustration by Sandbox Studio, Chicago with Corinne Mucha for Symmetry Magazine

About Luna Zagorac

I am a PhD candidate in the Physics Department at Yale University. My research focus is ultra light (or fuzzy) dark matter in simulations and observations. I’m also a Franke Fellow in the Natural Sciences & Humanities at Yale working on a project on Egyptian archaeoastronomy, another passion of mine. When I’m not writing code or deciphering glyphs, I can usually be found reading, doodling, or drinking coffee.
