Authors: D. Agarwal, K. Aggarwal, S. Burke-Spolaor, D. Lorimer, N. Garver-Daniels
First Author’s Institution: West Virginia University
Status: Submitted to MNRAS, open access on arXiv
Fast radio bursts (FRBs) are currently among the most mysterious phenomena in radio astronomy. They are extremely bright bursts of energy that last for only milliseconds, and we currently have no idea how or why they happen, only that they seem to be coming from very far away. Identifying these bursts in a data set has proven to be a challenge, but a team at West Virginia University has developed a technique using neural networks and machine learning that could search through telescope data and detect candidates in real time.
What are FRBs and why are they so exciting?
FRBs are extremely bright, millisecond-duration bursts of energy with very high dispersion measures. The dispersion measure essentially represents the number of free electrons along the line of sight between us and the source, so such high values point to an origin outside our galaxy. Since the first FRB was discovered in 2007 by Duncan Lorimer, interest in them has spread quickly through the scientific community. In 2017, an FRB was localized to a dwarf galaxy, but since then we have not been able to figure out exactly where any others are coming from. At least two FRBs have been seen to repeat, and astronomers are still looking for more repeaters.
Scientists have yet to pin down what causes these mysterious bursts. Some say that there are more theories about the origin of FRBs than there are FRBs. Current theories of what causes fast radio bursts include neutron star mergers, synchrotron maser emission, and the dark matter-induced collapse of neutron stars. With so many theories, one of the most useful things scientists can do at this point is to try to observe as many FRBs as possible in order to study them and try to figure out exactly what causes them.
How can machine learning help us detect more FRBs?
Picking out real FRBs from all of the data is one of the biggest hurdles. A typical FRB search pipeline corrects for the dispersive delay over many trial dispersion measure values and averages the frequency channels to generate a time series (first row of Figure 2). After this process is complete, candidates above a certain threshold are marked for visual inspection by a human. Real-time detectors have been installed on many radio telescopes but have a high false-positive rate due to radio frequency interference (RFI), which is essentially just noise. Thousands of candidates are therefore generated each day, most of which are RFI. Methods are in place to reduce the number of candidates, but none of them can classify the candidates as RFI, FRBs, or pulsars. The team hopes to use machine learning to distinguish potential FRBs from interference, leaving only the most plausible candidates. Machine learning has been applied to this problem before, and the authors aim to expand on that earlier work.
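The dedispersion step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the pipeline used by the authors: the function names, the toy data, the channel frequencies, and the sampling time are all assumptions made for the example. It relies on the standard cold-plasma dispersion relation, in which the arrival delay scales with the dispersion measure and with the inverse square of the observing frequency.

```python
K_DM = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3

def dispersion_delay(dm, freq_mhz, ref_freq_mhz):
    """Delay in seconds at freq_mhz relative to ref_freq_mhz for a given DM."""
    return K_DM * dm * (freq_mhz ** -2 - ref_freq_mhz ** -2)

def dedisperse(data, freqs_mhz, dm, tsamp):
    """Undo the delay in each channel, then average over channels.

    data is a list of channels, each a list of samples of equal length.
    """
    nsamp = len(data[0])
    ref = max(freqs_mhz)
    series = [0.0] * nsamp
    for channel, freq in zip(data, freqs_mhz):
        shift = round(dispersion_delay(dm, freq, ref) / tsamp)
        for t in range(nsamp):
            series[t] += channel[(t + shift) % nsamp]  # wrap around for simplicity
    return [value / len(data) for value in series]

def search(data, freqs_mhz, trial_dms, tsamp):
    """Return (dm, peak) for the trial DM giving the brightest time series."""
    best = None
    for dm in trial_dms:
        peak = max(dedisperse(data, freqs_mhz, dm, tsamp))
        if best is None or peak > best[1]:
            best = (dm, peak)
    return best

# Toy data (assumed values): one noiseless pulse dispersed at DM = 100 pc cm^-3.
tsamp = 1e-3                                # 1 ms sampling
freqs = [1500 - 40 * i for i in range(8)]   # channel centres in MHz
true_dm, t0, nsamp = 100, 50, 256
data = []
for f in freqs:
    channel = [0.0] * nsamp
    channel[t0 + round(dispersion_delay(true_dm, f, max(freqs)) / tsamp)] = 1.0
    data.append(channel)

best_dm, peak = search(data, freqs, [0, 50, 100, 150], tsamp)
```

At the correct trial value the pulse lines up in every channel and the averaged time series peaks sharply; at the wrong values the power is smeared over many samples. A real search would then apply a detection threshold to each time series to produce the candidates.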
Machine learning allows computer systems to “learn” from data without being explicitly programmed for the task. Artificial neural networks are an important part of machine learning; they are modeled after biological neural networks and are a type of algorithm made up of different types of layers. Each layer is composed of “neurons” connected to the output of the previous layer. This team has employed a type of neural network called a convolutional neural network (CNN), which is well suited to working with images. Figure 1 shows a schematic of a CNN. These CNNs are trained using labelled data in order to learn the features required for identification. As an image passes through the layers, features are picked out and used to assign it to a category (RFI or FRB). This method has been successful in many areas of astronomy, such as identifying Type Ia supernovae, detecting galaxy mergers, and classifying galaxies.
Figure 1: Simplified schematic representation of a CNN architecture. Neurons, which perform mathematical operations, are represented by circles and the connections between them by arrows. The green circles are the input of the network and the red circles are the output.
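To make the idea of layers concrete, here is a toy forward pass through a tiny CNN written in plain Python. It is a sketch for illustration only and does not reproduce any of the models in the paper: the image, the two kernels, and the weights are hand-picked assumptions rather than trained values, and a real network would learn them from labelled data.

```python
import math

def conv2d(image, kernel):
    """Slide one kernel over a single-channel image (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def global_max_pool(fmap):
    """Summarise a feature map by its strongest response."""
    return max(max(row) for row in fmap)

def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(image, kernels, weights, biases):
    """Convolution -> ReLU -> pooling -> fully connected layer -> softmax."""
    features = [global_max_pool(relu(conv2d(image, k))) for k in kernels]
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

# Toy 5x5 image: a vertical stripe, i.e. power at one time across all channels.
image = [[1.0 if col == 2 else 0.0 for col in range(5)] for _ in range(5)]
vertical = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]    # responds to vertical stripes
horizontal = [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]]  # responds to horizontal stripes
weights = [[-1.0, 1.0],   # score for class 0 ("RFI" in this toy setup)
           [1.0, -1.0]]   # score for class 1 ("FRB" in this toy setup)
biases = [0.0, 0.0]

probabilities = forward(image, [vertical, horizontal], weights, biases)
```

With these hand-picked values the vertical-stripe kernel fires strongly and the network assigns almost all of the probability to class 1. Training replaces the hand-picking: the kernels and weights are adjusted until the outputs match the labels of the training images.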
How did the team use these models and did they work?
The goal of the team was to employ CNN algorithms to better identify FRBs and minimize the amount of RFI mistaken for FRB candidates. They used data taken with the Green Bank Telescope (GBT). Figure 2 shows the different types of images the algorithms need to sort through; they look very similar, and the challenge is to distinguish the real FRBs from the RFI and the pulsars. To help the algorithms learn, the team injected simulated FRBs into some of the data and used large datasets to train different models to identify candidates.
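The injection idea can be illustrated with a short Python sketch. This is an assumption-laden toy, not the authors' injection code: the team injected bursts into real telescope data, whereas here the background is simulated Gaussian noise, the burst is a simple Gaussian pulse, and the function name and all parameter values are made up for the example.

```python
import math
import random

K_DM = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3

def inject_burst(background, freqs_mhz, dm, tsamp, t0, width, amplitude):
    """Return a copy of the background with a dispersed Gaussian pulse added.

    t0 and width are in samples; t0 is the arrival sample at the highest
    frequency. The input background is left unchanged.
    """
    ref = max(freqs_mhz)
    out = [row[:] for row in background]
    for row, freq in zip(out, freqs_mhz):
        delay = K_DM * dm * (freq ** -2 - ref ** -2) / tsamp  # in samples
        centre = t0 + delay
        for t in range(len(row)):
            row[t] += amplitude * math.exp(-0.5 * ((t - centre) / width) ** 2)
    return out

rng = random.Random(0)                           # fixed seed for repeatability
freqs = [1500.0 - 40.0 * i for i in range(8)]    # channel centres in MHz (assumed)
noise = [[rng.gauss(0.0, 1.0) for _ in range(256)] for _ in freqs]

injected = inject_burst(noise, freqs, dm=100.0, tsamp=1e-3,
                        t0=50, width=2.0, amplitude=20.0)

# Labelled examples of the kind a classifier could be trained on:
# 0 = background only, 1 = background plus simulated burst.
training_set = [(noise, 0), (injected, 1)]
```

Because the burst parameters are chosen by the person doing the injection, every injected example comes with a known label, which is what makes large labelled training sets practical to build.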
They picked the top 11 models based on their success rate in classifying candidates and evaluated their performance on independent sets of FRB data (all of these models are open source and available here). By doing this, they were able to test how well the models performed on data from different telescopes. Each telescope has different instrumental effects and a different RFI environment, so making sure the algorithms work on each telescope is important.
The accuracy and recall of the models trained on GBT data were over 99.5%. The majority of them detected all of the real FRBs from ASKAP (the Australian Square Kilometre Array Pathfinder) and the Parkes telescope, as well as the bursts from the repeating FRB 121102. The authors believe these results are very encouraging for the field of fast radio bursts because the models can sift through a data set and detect FRBs with much higher accuracy. If these algorithms were implemented on telescopes, they would help detect FRBs in real time and promptly alert other telescopes to follow them up at different wavelengths. The team hopes to implement these models on datasets across the world in order to identify all of the possible bursts and make strides toward finally understanding fast radio bursts.
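For readers unfamiliar with the two metrics quoted above, accuracy is the fraction of all candidates that are classified correctly, while recall is the fraction of real bursts that the model actually catches. The short Python sketch below computes both; the labels in it are made up purely for illustration and are not results from the paper.

```python
def accuracy_and_recall(true_labels, predicted_labels):
    """Compute accuracy and recall for binary labels (1 = FRB, 0 = RFI)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # bursts found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # RFI rejected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # bursts missed
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn)
    return accuracy, recall

# Made-up labels for eight candidates.
true_labels = [1, 1, 1, 0, 0, 0, 0, 1]
predicted_labels = [1, 1, 0, 0, 0, 1, 0, 1]

accuracy, recall = accuracy_and_recall(true_labels, predicted_labels)
# accuracy is 0.75 (6 of 8 correct) and recall is 0.75 (3 of 4 bursts found)
```

Recall matters most in an FRB search because a missed burst is lost for good, whereas a false positive only costs a little follow-up inspection.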
Figure 2: The different types of images on which the models are trained. The top row is a time series showing the received power over the course of the burst (it is not part of the algorithm but is provided here for reference). Column A is a simulated FRB with background data from the Green Bank Telescope, Column B is an RFI candidate, and Column C is a pulsar.