Reproducible results are a foundational element of scientific progress. From primary school onward, students are taught the importance of a research notebook for documenting theories and experiments. As astronomical datasets grow in size and complexity, however, accurately documenting every version of an analysis becomes challenging. Critical details of an experiment can fade from the memories of the original investigators, making it difficult or even impossible to reproduce or verify a result. With most data manipulation now happening digitally, a handwritten research notebook alone may no longer be sufficient for curating reproducible science. The widespread adoption of software tools like Python (e.g., via Astropy) and TeX (e.g., via online editors like Overleaf) within the astronomical research community offers a new opportunity to promote open-science standards through those same tools. In today’s bite, we’re looking beyond the research notebook to discuss helpful documentation tools and good research habits.
The most powerful tool in the researcher’s software toolbox is version control. The most common example is Git, a version control system with a command-line interface for tracking changes to files. A user creates a “repository,” a directory whose files are managed together as a single collection. Then, as users create and edit files, they periodically “commit” their changes to the repository along with a short message describing them. Any committed version of the repository can be restored at a later time. Websites like GitHub host repositories online to facilitate multi-user collaboration and offer many additional features. Git is extensively documented online, with many resources for beginners, and tutorials for working with remotely hosted repositories are also available.
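To make the commit-and-restore cycle concrete, here is a toy model of version control in Python. It is not Git and does not use Git’s API; the class `ToyRepository` and its methods are hypothetical names chosen only for illustration. Each commit stores a full copy of the tracked files plus a message and is labeled with a short hash of its contents, loosely echoing how Git identifies commits.

```python
import hashlib
import json
from typing import Dict, List


class ToyRepository:
    """Hypothetical toy model of version control (NOT Git): each commit
    stores a full snapshot of the tracked files plus a short message."""

    def __init__(self) -> None:
        self.working: Dict[str, str] = {}  # file name -> current contents
        self.commits: List[dict] = []      # history, oldest first

    def write(self, name: str, text: str) -> None:
        """Create or edit a file in the working directory."""
        self.working[name] = text

    def commit(self, message: str) -> str:
        """Record the current state of all files and return a short ID."""
        snapshot = dict(self.working)  # copy, so later edits don't alter history
        parent = self.commits[-1]["id"] if self.commits else None
        payload = json.dumps(
            {"files": snapshot, "message": message, "parent": parent},
            sort_keys=True,
        )
        # Label the commit with a hash of its contents and its parent.
        commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:7]
        self.commits.append({"id": commit_id, "message": message, "files": snapshot})
        return commit_id

    def log(self) -> List[str]:
        """One line per commit, newest first."""
        return [f"{c['id']} {c['message']}" for c in reversed(self.commits)]

    def restore(self, commit_id: str) -> None:
        """Reset the working files to the snapshot saved in a given commit."""
        for c in self.commits:
            if c["id"] == commit_id:
                self.working = dict(c["files"])
                return
        raise KeyError(f"unknown commit {commit_id}")


repo = ToyRepository()
repo.write("analysis.py", "threshold = 5.0")
first = repo.commit("Initial detection threshold")
repo.write("analysis.py", "threshold = 3.0")
repo.commit("Lower threshold to 3 sigma")
repo.restore(first)
print(repo.working["analysis.py"])  # threshold = 5.0
```

In practice you would run the corresponding `git init`, `git add`, `git commit -m "..."`, `git log`, and `git checkout` commands instead of anything like this class; the sketch only illustrates the bookkeeping that happens behind them.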
Choosing writing and composition tools that keep a revision history is becoming increasingly important with the growing influence of AI. Several cases of AI-detection software failing blind tests have already made headlines, highlighting the need for students (and researchers alike) to be able to demonstrate that they composed their published work themselves. Many writing tools already include some form of version control, which makes adding it to a workflow more seamless than setting up Git. Collaborative web editors like Google Docs and Overleaf provide basic file history that can be used to demonstrate originality, unlike common offline editors such as Word and Pages. For less stringent version control, cloud-based file storage often keeps a basic file history and is frequently used as “hot” version control that syncs changes almost instantly (a mode that works well when there are only two or three collaborators). However, be aware that these services typically retain history for one year or less, so they are not a long-term solution.
Scientists are held to a particularly high standard of documentation by the peer review process. However, once a manuscript has been accepted for publication, data or experimental details are often forgotten or buried in a research notebook, which makes it difficult after the fact to investigate or reproduce the results, or even to retrieve the original data. As a result, a significant fraction of published manuscripts are no longer “standalone” and require input from the authors to supplement the published material. To combat this, tools like showyourwork have been developed to help researchers build a workflow that takes a reader through every stage of the analysis, from the raw observations to the final data products, figures, and compiled manuscript. Using Git, researchers can create a repository to be published alongside the manuscript. Tools like these are becoming more popular as journals require more transparency (e.g., MNRAS’s Data Availability statement).
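As a rough illustration of the idea behind such workflows (not of showyourwork’s own interface or configuration), the Python sketch below regenerates a data product from raw input using a fixed random seed and records SHA-256 checksums of the files it reads and writes. The simulated fluxes, file names, and functions are all hypothetical stand-ins for a real analysis. If two independent runs produce identical checksums, the result has been reproduced exactly.

```python
import hashlib
import json
import random
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file so any later change to it is detectable."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_raw_data(path: Path, seed: int) -> None:
    """Hypothetical stand-in for raw observations: simulated fluxes
    drawn from a seeded random generator, so the 'data' are repeatable."""
    rng = random.Random(seed)
    fluxes = [rng.gauss(10.0, 1.0) for _ in range(100)]
    path.write_text("\n".join(f"{f:.6f}" for f in fluxes))


def reduce_data(raw: Path, out: Path) -> None:
    """Hypothetical analysis step: summarize the raw fluxes."""
    values = [float(line) for line in raw.read_text().splitlines()]
    mean = sum(values) / len(values)
    out.write_text(json.dumps({"n": len(values), "mean_flux": round(mean, 6)}))


def run_pipeline(workdir: Path, seed: int = 42) -> dict:
    """Run every step from raw data to final product, then write a
    provenance record listing the seed and the checksum of each file."""
    raw = workdir / "raw_fluxes.txt"
    product = workdir / "summary.json"
    make_raw_data(raw, seed)
    reduce_data(raw, product)
    provenance = {
        "seed": seed,
        "inputs": {raw.name: sha256_of(raw)},
        "outputs": {product.name: sha256_of(product)},
    }
    (workdir / "provenance.json").write_text(json.dumps(provenance, indent=2))
    return provenance


# Two independent runs in separate directories should agree exactly.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    print(run_pipeline(Path(a)) == run_pipeline(Path(b)))  # True
```

Committing a script like this, together with its provenance file, to the repository published alongside a paper lets a reader confirm that rerunning the analysis reproduces the same products.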
Improving transparency can be simple. Even digitizing a handwritten research notebook in Google Docs is a straightforward step toward open science. This bite touches on only a few of the available practices and tools; many more are documented online, in many fields beyond astronomy. Transparency and reproducible research can only benefit and accelerate scientific progress.
Edited by Janette Suherli