The Moving Bits

Title: Radio Astronomy Data Transfer and eVLBI using KAREN
Authors: Stuart Weston, Timothy Natusch, Sergei Gulyaev
Affiliation: Auckland University of Technology

With increasingly large datasets and surveys, the problem of moving astronomical data around the world is non-trivial. Moving data can be done in two ways. First, one might put the data on a storage device (e.g. magnetic tapes, CDs, or hard drives) and then carry it down the mountain. A second option is to store the data remotely and download it over the internet. But whether you move the data physically or electronically, how fast and how accurately can this be done?

The relative transfer speeds (green area) for 50 TCP flows (top) compared to 4 UDT flows (bottom). The relative green area for the UDT protocol is much larger than that of the TCP, showing the increased speed for this choice.

Unsurprisingly, the answer to this question depends on the method you use. I often find myself needing to move files from one computer to another, and I use FTP (File Transfer Protocol) to do so. FTP is a standard that defines how files are transferred between computers, and it runs on top of a TCP/IP network. You can read all about TCP/IP networks online, but the salient feature is that TCP was developed to provide a reliable stream of data packets from one computer to another. Whenever you access the internet or your email, you are most probably using TCP/IP. However, TCP is not efficient for high speed, wide area networks. These are networks that cover a broad geographic area where you don’t want performance to be dependent on location in the network. The simpler User Datagram Protocol (UDP) is an alternative to TCP that trades reliability for lower latency (or delay in the file transfer). The UDP-based Data Transfer protocol (UDT) is built on top of UDP and adds its own reliability checks at a higher level, aiming to keep the speed without giving up accuracy. Consequently, your choice in protocol depends on your network and on your speed and efficiency requirements.
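
One commonly cited reason TCP struggles on wide area networks is that a single flow can have at most one window of unacknowledged data in flight per round trip, so its throughput is capped at roughly the window size divided by the round-trip time. The short Python sketch below illustrates that bound. The window size and round-trip times are illustrative assumptions, not values from the paper, and the function name is made up for this example.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one TCP flow: at most one window of data per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds / 1e6


# Illustrative assumption: a 64 KiB window (the classic limit without window scaling).
WINDOW_BYTES = 64 * 1024

# Illustrative round-trip times, not measurements from the paper.
for label, rtt in [("same city", 0.005), ("intercontinental", 0.300)]:
    rate = max_tcp_throughput_mbps(WINDOW_BYTES, rtt)
    print(f"{label:>16}: RTT {rtt * 1000:5.0f} ms -> at most {rate:7.2f} Mbit/s")
```

With the same window, the long path is limited to a small fraction of the short path's rate, no matter how fast the underlying link is. Running many flows in parallel, or switching to a protocol designed for long paths, are two ways around this.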

One path the data took starting in New Zealand and ending in Perth with a stop in Metsähovi, Finland.

This paper tested the performance of FTP and several UDT protocols for the transfer of radio data across a network of telescopes. The authors sent the data through various paths (see figure above) using the different protocols. The application was Very Long Baseline Interferometry (VLBI), which combines radio dishes spread across the world into a single array to obtain exquisite resolution. One of the difficulties involved with this measurement is that the data from each single dish must be combined with all the others. While this can be done after the observation using stored data, high speed network connections enable this calculation to be done in real time, a method called e-VLBI. As you might imagine, the choice of file transfer protocol is a limiting factor in the ability to effectively redistribute single-dish data over the network. For this particular problem, UDT was much faster than FTP (see table below). This result might not be too surprising considering that UDT is optimized for networks that span countries or even the globe. However, the stark difference in protocol performance illustrates the importance of understanding not only the what and why of a measurement, but also the how.

Data transfer results (Table 2 from the paper). From left to right, the columns are: a) the route the data took, b) the protocol used, c) the size of the test data, d) the amount of time the transfer took, and e) the calculated throughput (or speed) of the data transfer.
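
The throughput in the last column is simply the amount of data divided by the time the transfer took. Here is a minimal Python sketch of that calculation. The function name and the numbers are hypothetical and are not taken from the paper's table, and the paper may report throughput in different units.

```python
def throughput_mbps(size_bytes: float, elapsed_seconds: float) -> float:
    """Average throughput in megabits per second for a completed transfer."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return size_bytes * 8 / elapsed_seconds / 1e6


# Hypothetical example: a 1 GB test file that took 100 seconds to arrive.
example = throughput_mbps(1e9, 100.0)
print(f"{example:.1f} Mbit/s")
```

Because this is an average over the whole transfer, it folds in connection setup and any retransmissions, which is why it is a fair way to compare protocols on the same route.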

Read more:
An Overview of File Transfer Protocol
Scientific Applications of e-VLBI
UDT SourceForge page

Featured image from this site.

About Katherine Rosenfeld

Among other things, I'm a second year Astronomy graduate student studying the birth places of stars and planets.

4 Comments

  1. For “The UDP-based data transfer protocol (UDT) is an alternative to TCP/IP which reduces the latency (or delay in the file transfer) albeit with a reduction in reliability”: I think you misspoke. UDP is the alternative to TCP/IP, making the trade-off claimed.

    UDT is built on UDP, but has its own procedure for ensuring reliability at a higher level (their poster looks like it has a lot of the elements of TCP without the three-way handshake. In a sense, they are implementing a reliable connection on top of simple messages (UDP)). Otherwise, that table would probably show error rates too.

  2. Hi Joseph,

    Yes – thank you for making that distinction. I was trying to cut corners to reduce the number of acronyms and ended up losing accuracy. I’m glad we have an error checking procedure in place!

    Best,
    Katherine

  3. At the second figure caption you probably mean Metsähovi, Finland.

    • Yes, absolutely. I’ve updated the text.

      Thanks,
      Katherine
