Tools for reproducible science

In some of our daily summary bites, you may see “open access” written in parentheses next to the journal name. In other bites, it says “closed access” instead. That’s because some journals, like Nature or Science, do not offer free access to their articles: readers have to pay to read the latest science. Luckily, we’re on our way to overcoming the issues with closed access thanks to arXiv.org, a website where you can find articles for free, though only if the authors have made them available there. Fortunately, most astronomers do exactly that and maximize the accessibility of their work.

Science can only improve if it is open and if the results of papers are transparent. In addition to easily accessible written articles, the ideal open-science world would also allow you to easily reproduce the results of any paper. This would be much easier if the authors shared the code they used for their analysis and plots on an accessible platform. Some scientists refuse to do that, since it often requires a lot of extra work, something most scientists already have enough of. In a 2016 article from Nature Scientific Reports, 1,500 scientists were asked about the reproducibility of research results. A third of the respondents had never even thought about developing procedures to check their data for reproducibility, and only 40% indicated that they regularly use such procedures. Also, up to 70% of researchers have encountered experiments and results that could not be reproduced, not only by other groups of scientists, but also by the authors and co-authors of the published papers themselves! The contemporary focus on quantity rather than quality encourages the production of articles with high-profile headlines and equally inflated research results, which might lead to further distortions.

Let’s break down some key reasons why reproducible and open science matters:

  • Leads to collaborations and therefore improvement: science is done in collaborations! That’s why we have conferences – we want to hear what others are working on, exchange thoughts, and ultimately collaborate with relevant people to publish more exciting results. If people can reproduce, build on, and maintain your work, collaborations become more effective!
  • Builds trust in the community: in astronomy, we build different models and tools to analyze data and develop theories. By now, it is hard to keep track of all the tools out there, which is exciting! But it can be difficult to know which tool to choose – transparency in the field creates trust between colleagues!
  • Creates interesting debates/discussions: other astronomers might get different results with different models or tools – this is where interpreting the physics becomes very important. Controversial results are actually exciting, too! We can definitely have far more confidence in a result if two groups arrive at it using different methods. If the results differ, though, there is more work to be done (e.g., one group might have made a mistake, the methods might be biased, or there may be more than one answer).
  • Helps you work more efficiently: recently, I came across a blog post on the website of Professor Lorena Barba’s group. They share an anecdote about how much less time it took them to reproduce their own results for a new paper because they had made those results reproducible in the first place.

In this bite, I want to talk about some awesome tools that can help you start making your own science reproducible!

GitHub – a very popular website for collaborative coding. There, you can create both public and private repositories (people usually make repositories public once the code is finished). Another powerful feature of the website is version control: it lets you see the changes made to your code and collaborate seamlessly without touching the original code. One example of a GitHub profile is that of Professor Michael Zingale, who advocates a lot for open science (permission was given).

Zenodo – another popular tool, this one for storing your papers, data files, research software, and other research-related artifacts. It is free to upload to and free to access, and as a general-purpose repository it makes your work citable and shareable.

showyourwork! – a workflow created by Dr. Rodrigo Luger. It builds on Snakemake, another awesome workflow tool for reproducible data analysis. The philosophy behind the workflow is that “anyone should be able to re-generate the article PDF from scratch at the click of a button.” showyourwork! is integrated with GitHub, Zenodo, and Overleaf, and it can save you a ton of time answering questions about how you got your results, because you can just share the GitHub repository for your paper that the workflow creates for you!

Reproducible workflow on a public cloud for computational fluid dynamics – a workflow created by Professor Lorena Barba’s group. It lets you carry out your computational studies on a public cloud, Microsoft Azure. The main benefit of the workflow is its speed: “public cloud resources today are able to deliver similar performances to a university-managed cluster, and thus can be regarded as a suitable solution for research computing.” (quoted from the paper)

Professor Lorena Barba’s research group cares a lot about reproducibility and writes about it on their blog. I recommend checking it out! 
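To make the idea behind these tools a bit more concrete, here is a minimal Python sketch of what "reproducible" means at the smallest scale: the same inputs should give exactly the same result, and a fingerprint of that result lets someone else check that they got what you got. This is only an illustration and is not how showyourwork!, Snakemake, or any of the workflows above work internally; the function names `run_analysis` and `fingerprint` and the toy "analysis" are made up for this example.

```python
import hashlib
import json
import random


def run_analysis(seed: int, n_samples: int = 1000) -> dict:
    """Toy 'analysis' (hypothetical): the mean of seeded pseudo-random draws."""
    # A local generator means the result depends only on the seed,
    # not on any random state left over from elsewhere in the program.
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return {"seed": seed, "n_samples": n_samples, "mean": sum(samples) / n_samples}


def fingerprint(result: dict) -> str:
    """SHA-256 of the result, serialized with sorted keys so equal results hash equally."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Running the same analysis twice with the same seed gives the same fingerprint.
first = run_analysis(seed=42)
second = run_analysis(seed=42)
print(fingerprint(first) == fingerprint(second))  # prints True
```

If you publish the seed, the code, and the fingerprint together (for example in a public repository), a reader can rerun the analysis and compare fingerprints instead of comparing numbers by eye. Real analyses also depend on library versions and input data, which is exactly the bookkeeping the tools above help with.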

Making science reproducible might seem like a lot of work – and this is true, but only in the beginning! In the long term, it actually saves a lot of time, as mentioned above. Thankfully, the community of people who care about open software is happy to help. For example, the Flatiron Institute held an Astronomical Software Development Workshop, where people shared their thoughts on open science and on how to keep building and maintaining the community. There are also more workshops coming in the future (stay tuned!).

Finally, if you take a look at the figures below from the above-mentioned Nature paper, you will see that, given how the science community is developing, most factors that contribute to irreproducible research (e.g., code/paper availability) can luckily be eradicated quite easily!

Remark from the author: I wasn’t involved in developing any of the workflows mentioned above. The people mentioned in this bite are some of the gems of the open science community, and they are mentioned solely for their work on reproducible science.

Astrobite edited by Jana Steuer

Featured image credit: Stanford Medicine

About Sabina Sagynbayeva

I'm a graduate student at Stony Brook University and my main research area is planet formation. I'm currently working on planetary migration using hydrodynamical simulations. I'm also interested in protoplanetary disks but nearly any topic related to planets is fascinating to me! In addition to doing research, I'm also a singer-songwriter. I LOVE writing songs, and you can find them on any streaming platform.
