This is the second of two posts on setting up a research project; find part 1 here.
The previous post in this guide covered some good housekeeping for research projects that I’ve learned (with difficulty!) over the course of my PhD: a clear directory structure, managing citations, and keeping a log file. In this post, I’m going to focus on ways to keep your code tidy. The right approach depends on which programming language you use, and most of the advice here is Python-focused. However, many of these strategies may still be useful whether you’re coding in IDL, C, Fortran or any other language!
Organise your directories (again)
One way to keep code organised is to keep your source code (the detailed, nuts-and-bolts functions and packages that you’ve written and use all the time) in one place, and to do your analysis, such as reading in data and making plots, somewhere else. If you use Python, Jupyter notebooks are a great tool for the analysis part: they let you see the output of your code as you go and present it neatly to colleagues or your supervisor, and you can import functions from your source code whenever you need them. Separating the source code from the analysis also makes debugging much easier, because significant problems in your core code can be identified much more quickly than if they were tangled up with superficial mistakes in your plotting.
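As a minimal sketch of this split, assume a hypothetical `src/` directory containing a module called `photometry.py` with a single function, `flux_to_magnitude` (all of these names are illustrative, and the zero point is a placeholder). The snippet writes that module to a temporary directory only so the example runs on its own, then imports it the way a notebook cell would.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Contents of a hypothetical source module, src/photometry.py.
# In a real project this file already exists; it is written out here
# only so that the example is self-contained.
SOURCE = '''
import math


def flux_to_magnitude(flux, zero_point=25.0):
    """Convert a flux to an apparent magnitude, m = zero_point - 2.5 log10(flux).

    The default zero point is a placeholder; use the value for your instrument.
    """
    return zero_point - 2.5 * math.log10(flux)
'''

src_dir = Path(tempfile.mkdtemp()) / "src"
src_dir.mkdir()
(src_dir / "photometry.py").write_text(SOURCE)

# --- Notebook cell: make src/ importable, then use the source code ---
sys.path.insert(0, str(src_dir))
photometry = importlib.import_module("photometry")

print(photometry.flux_to_magnitude(100.0))  # 25 - 2.5 * 2 = 20.0
```

In an actual notebook you would usually just write `sys.path.insert(0, "../src")` (adjusting the relative path to your own layout) followed by `from photometry import flux_to_magnitude`. IPython's `autoreload` extension (`%load_ext autoreload`, then `%autoreload 2`) will also pick up edits to the source file without restarting the kernel.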
Find a great text editor
You can then edit your source code using a text editor. At its most basic, this is a program that lets you write and edit unformatted text, but the most comprehensive editors add features like syntax highlighting that colours your code according to its structure, nifty debugging tools, and integrated version control (more on that later!). There are many options and the choice comes down to personal preference, but editors such as Sublime Text, PyCharm and VSCode come with all the bells and whistles that make writing code easier.
Comments are your friends. Especially if, after a year of working on a different project, you need to revisit the code you wrote in the first weeks of your PhD. Add a comment to explain any number that you use — whether it’s a physical constant, a number of times to iterate over a process, or a correction factor, there’s a good chance that if you re-use this code in the future or pass it on to someone else, these numbers won’t be clear even if they seem obvious now. Many students inherit code from supervisors, and the process of understanding how it works is infinitely less challenging if it is peppered with explanations. Learn to write useful comments, and you will thank yourself later.
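To make this concrete, here is a small, hypothetical example contrasting an unexplained number with a commented one. The function names and the choice of a redshift-to-velocity conversion are purely illustrative.

```python
# Hard to maintain: where does 299792.458 come from, and when is this valid?
def velocity_uncommented(z):
    return 299792.458 * z


# Speed of light in km/s. Exact, because the metre is defined so that
# c = 299 792 458 m/s.
SPEED_OF_LIGHT_KM_S = 299_792.458


def recession_velocity(z):
    """Return the recession velocity in km/s for a redshift z.

    Uses the non-relativistic approximation v = c * z, which is only
    reasonable for small redshifts (z much less than 1).
    """
    return SPEED_OF_LIGHT_KM_S * z


print(recession_velocity(0.01))  # roughly 2998 km/s
```

Both functions compute the same thing, but only the second tells a future reader (or you, a year later) what the number is and when the formula can be trusted. The same goes for iteration counts and correction factors: a one-line comment saying where the value came from is usually enough.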
Keeping track of versions of your code is extremely important, especially when code goes from working to broken, or needs to be modified to do something slightly different from its original purpose. A version control system lets you revert to the code that worked, or test alternative methods alongside the working version. Git is probably the most widely used version control system in astronomy, and GitHub is a platform that hosts Git repositories online. It’s free for students, and also comes in handy for publishing and sharing code. In my department, we use GitHub to share examples and host code reviews, as it allows us all to access and edit our own versions of the same code without overwriting each other’s changes. This one was a steep learning curve for me, but looking back at the horrendously complex lists of files in the disaster zone that was my previous “version control” solution, I can tell it is going to be really helpful.
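One small habit that ties version control back to your analysis is recording which commit produced a given result, so you can return to exactly that code later. The sketch below is an illustration rather than a standard tool: `current_commit` is a hypothetical helper that asks Git for the hash of the current commit via `git rev-parse HEAD`, and returns `None` if Git is not installed or the directory is not a repository.

```python
import subprocess


def current_commit(repo_dir="."):
    """Return the hash of the checked-out Git commit in repo_dir, or None.

    Hypothetical helper: it shells out to `git rev-parse HEAD`, so it needs
    the git command-line tool on your PATH.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not inside a repository
        # (or the repository has no commits yet)
        return None
    return result.stdout.strip()


commit = current_commit()
print(f"Results produced with commit: {commit or 'unknown (not a Git repository)'}")
```

You could print this at the top of a notebook or write it into the header of your output files; `git checkout <hash>` will later take you back to that exact version of the code.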
One of the reasons that Python is so widely used is that its users have written a multitude of useful packages: additional tools that you can install for specific purposes. NumPy and SciPy are examples of such packages that are indispensable if you’re doing anything involving maths and statistics, and Astropy can handle almost any task involving astronomical data. However, packages are updated as they are developed, so code written using one version of a package can stop working when it is run with a different version. If you want to share your code, or run it on different computers, a package and environment manager such as conda (which comes with the Anaconda distribution) can create an “environment” containing the correct version of every package, so that everything works as it should. Here’s a useful guide to getting set up.
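Before setting up an environment, it can help to see which package versions your code is currently running against. Here is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later); `installed_versions` is a hypothetical helper name and the package list is just an example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string.

    Packages missing from the current environment map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions(["numpy", "scipy", "astropy"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

Printing this at the top of a notebook records which versions produced your results. With conda, `conda env export > environment.yml` saves the full environment to a file, and `conda env create -f environment.yml` rebuilds it on another computer.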
The tools that you choose to use will depend very much on your project, but these are some of the strategies that I use to help keep track of my code, and organise my work better.
Thanks to all the advice from PhD students at the University of Hertfordshire in putting together this article!