Authors: David Schaurecker, Yin Li, Jeremy Tinker, Shirley Ho, and Alexandre Refregier
First Author’s Institution: Institute for Particle Physics and Astrophysics, ETH Zurich, Zurich
Status: preprint on arXiv
Large data, expensive simulations
It’s no secret that, cosmologically speaking, we are living in an age of big data. Thanks to amazing galaxy surveys such as DESI, Euclid, DES, and Rubin, astronomers are mapping ever larger swaths of our Universe using ever fainter galaxies. For our theoretical understanding to keep up with the observations, we need to compare those maps with corresponding simulated catalogs of faint galaxies, which means we need larger simulation boxes with finer resolution. On top of that, we need many such simulations to explore different cosmological parameters and compute statistics. That’s very computationally expensive!
In this paper, the authors present a less costly way forward. The questions they seek to answer include:
- Can we run a lower-resolution simulation instead, and then fill in the blanks with machine learning at the very end?
- How do the results compare with those from a full high-resolution simulation?
Read on to find out!
Filling in the blanks with machine learning
The authors use something called a generative adversarial network (GAN): a machine learning framework in which two neural networks, called the generator and the discriminator, compete as adversaries in a zero-sum game. A flowchart of how a GAN works is shown in Figure 1. The generator’s task is to learn to produce samples that look just like its training set; the discriminator’s task is to tell the generator’s output apart from the real samples. The game is rigged: it only ends once the discriminator loses, that is, once the generator gets so good that the discriminator can do no better than a 50/50 guess about whether a sample is real or fake. While this is sad news for the discriminator, it’s great news for the user: it means that we can now produce realistic-looking fake data!
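To make that 50/50 endpoint concrete, here is a minimal sketch of the textbook GAN objective from Goodfellow et al. (2014), not the specific networks or loss used in this paper. For a fixed generator, the best possible discriminator is D*(x) = p_data(x) / (p_data(x) + p_gen(x)). When the generator’s distribution matches the data, D* = 1/2 everywhere and the value of the game drops to its minimum, -log 4. The 1D Gaussians and all function names below are hypothetical stand-ins for real simulation data.

```python
import math

def gaussian_pdf(mu, sigma):
    """Return a 1D Gaussian density p(x), a toy stand-in for a real data distribution."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def optimal_discriminator(x, p_data, p_gen):
    """Best discriminator for a fixed generator: the probability that x is real."""
    a, b = p_data(x), p_gen(x)
    return a / (a + b)

def gan_value(p_data, p_gen, lo=-10.0, hi=10.0, n=20001):
    """V(D*, G) = E_data[log D*(x)] + E_gen[log(1 - D*(x))], via a Riemann sum on a grid."""
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * dx
        a, b = p_data(x), p_gen(x)
        if a + b == 0.0:
            continue  # both densities underflowed; this point contributes nothing
        if a > 0.0:
            total += a * math.log(a / (a + b)) * dx  # real samples: log D*(x)
        if b > 0.0:
            total += b * math.log(b / (a + b)) * dx  # fake samples: log(1 - D*(x))
    return total

real = gaussian_pdf(0.0, 1.0)
perfect_fake = gaussian_pdf(0.0, 1.0)  # generator that has fully learned the data
poor_fake = gaussian_pdf(3.0, 1.0)     # generator that is still far off

print(optimal_discriminator(0.5, real, perfect_fake))  # 0.5
print(gan_value(real, perfect_fake), -math.log(4.0))   # both about -1.386
print(gan_value(real, poor_fake))                      # noticeably larger than -log 4
```

In the idealized analysis, V(D*, G) equals -log 4 plus twice the Jensen-Shannon divergence between the real and generated distributions, so the discriminator "loses" exactly when the two distributions coincide.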
Comparing generated and simulated galaxies
In today’s paper, the authors work with two dark matter-only simulations from the Illustris suite: the high-resolution Illustris-2-Dark and the low-resolution Illustris-3-Dark. They consider only the present-day snapshot of each simulation and divide the simulation volume into 8 pieces: 6 serve as a training set, and the remaining 2 are used for validation and testing. The GAN is then trained to take in one of the 6 low-resolution pieces and recreate the corresponding high-resolution version. Once training is complete, the GAN is tested on one of the remaining two slices, which the neural net has never seen before. The results are shown in Figure 2.
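The data split described above can be sketched as follows. This assumes the 8 pieces are the octants of the cubic box (the geometry is not spelled out here) and represents each snapshot as a plain list of particle positions. The function names, the box size, the particle counts, and which octant goes to validation versus testing are all hypothetical choices for illustration.

```python
import random

def octant_index(pos, box_size):
    """Map (x, y, z) in the cube [0, box_size)^3 to one of 8 sub-cubes, numbered 0..7."""
    half = box_size / 2.0
    x, y, z = pos
    return int(x >= half) + 2 * int(y >= half) + 4 * int(z >= half)

def split_into_octants(positions, box_size):
    """Group particle positions by the octant they fall in."""
    octants = [[] for _ in range(8)]
    for p in positions:
        octants[octant_index(p, box_size)].append(p)
    return octants

def make_splits(low_res, high_res, box_size):
    """Pair matching low-/high-res octants, then use 6 for training, 1 for validation, 1 for testing.

    The octant-to-split assignment is a hypothetical choice.
    """
    pairs = list(zip(split_into_octants(low_res, box_size),
                     split_into_octants(high_res, box_size)))
    return {"train": pairs[:6], "validation": pairs[6:7], "test": pairs[7:]}

# Toy usage: random "particles" standing in for the two simulation snapshots.
rng = random.Random(42)
box = 100.0  # hypothetical box side length
low = [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in range(1000)]
high = [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in range(8000)]
splits = make_splits(low, high, box)
print({name: len(pairs) for name, pairs in splits.items()})
# {'train': 6, 'validation': 1, 'test': 1}
```

The key design point is that each training example is a matched pair: the same region of space at low and high resolution, so the network learns a mapping from one to the other rather than just memorizing what high-resolution boxes look like.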
Overall, the GAN does an excellent job, at least visually: the middle and right panels of Figure 2 are practically indistinguishable by eye! The main difference between the low- and high-resolution data is the presence of small dark matter clumps (halos), which the GAN successfully recovers. However, note that the generated data doesn’t match the high-resolution simulation exactly, and we don’t expect it to! The neural net can’t recover information that was lost to the low resolution of the Illustris-3 simulation; what it can do is make a statistically informed guess at what the higher-resolution version might have looked like. So we wouldn’t expect a GAN-resolved simulation to reveal the exact, true location of a blob of dark matter, but we would expect statistical measurements over a chunk of the simulation to be realistic. And those statistical measurements are exactly what we compare to observations! Examples include the dark matter power spectrum and the halo mass function, as illustrated in Figure 3 below.
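For concreteness, here is a minimal sketch of the two statistics mentioned above, under simplifying assumptions that are not the paper’s pipeline. The halo mass function is estimated by counting halos in equal-width bins of log10(mass) and dividing by volume and bin width. The power spectrum is shown for a toy 1D field using a naive discrete Fourier transform; real analyses use 3D FFTs and their own normalization conventions. All function names and inputs are hypothetical.

```python
import math

def halo_mass_function(masses, volume, log_m_min, log_m_max, n_bins):
    """Halo number density per dex of mass, dn/dlog10(M), in equal-width log-mass bins."""
    width = (log_m_max - log_m_min) / n_bins
    counts = [0] * n_bins
    for m in masses:
        lm = math.log10(m)
        if log_m_min <= lm < log_m_max:
            i = min(int((lm - log_m_min) / width), n_bins - 1)  # guard against round-off at the top edge
            counts[i] += 1
    return [c / (volume * width) for c in counts]

def power_spectrum_1d(delta):
    """Toy 1D power spectrum |delta_k|^2 / N of an overdensity field, via a naive DFT."""
    n = len(delta)
    power = []
    for k in range(n // 2 + 1):
        re = sum(d * math.cos(2.0 * math.pi * k * j / n) for j, d in enumerate(delta))
        im = -sum(d * math.sin(2.0 * math.pi * k * j / n) for j, d in enumerate(delta))
        power.append((re * re + im * im) / n)
    return power

def ratio(generated, reference):
    """Bin-by-bin ratio generated/reference; 1.0 everywhere would be a perfect statistical match."""
    return [g / r if r > 0 else float("nan") for g, r in zip(generated, reference)]

# Toy usage with made-up masses (arbitrary mass units) in a volume of 8 (arbitrary units cubed).
hmf = halo_mass_function([2e10, 3e10, 5e11, 4e12], volume=8.0, log_m_min=10.0, log_m_max=13.0, n_bins=3)
print(hmf)  # [0.25, 0.125, 0.125]
wave = [math.cos(2.0 * math.pi * 3 * j / 16) for j in range(16)]
p = power_spectrum_1d(wave)
print(max(range(len(p)), key=p.__getitem__))  # 3: all the power sits in mode k = 3
```

The idea is that a GAN-enhanced box should bring ratios like these close to 1 when compared with the true high-resolution box, even though individual halos sit in somewhat different places.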
While machine learning can’t recover lost data, a properly trained neural net can give us more resolving power in the statistics of the simulation. Furthermore, once it is trained, using a GAN to extrapolate to higher resolution is much faster and less computationally costly than running a full high-resolution simulation. The ultimate goal would be to train a GAN so well that it can work on almost any dark matter-only simulation. In fact, the authors applied their neural net to the newest suite of Illustris simulations, called IllustrisTNG, and found that the network still successfully predicted new halos! This is a promising step toward creating mock catalogs with very realistic (if very fake) simulated data at a fraction of the computational cost of the real deal.
Astrobite edited by Lili Alderson
Featured image credit: Illustration by Sandbox Studio, Chicago with Corinne Mucha for Symmetry Magazine