I developed these notes and exercises as part of a tutorial on how to use the Kao Group’s computing cluster. Although some of the details are specific to this cluster, much of the material could be useful for anyone getting started in computational physics, so I thought I would share it here. The materials are posted on github.com/adazi/bootCampEx and the best place to start is by reading README.md.
I was flipping through the fourth edition of Landau and Binder’s excellent book on Monte Carlo for statistical physics and I came across this gem on p. 139:
We end this chapter by summarizing a few procedures which in our experience can be useful for reducing errors and making simulations studies more effective. These thoughts are quite general and widely applicable. While these ‘rules’ provide no ‘money-back’ guarantee that the results will be correct, they do provide a prudent guideline of steps to follow.
(1) In the very beginning, think.
What problem do you really want to solve and what method and strategy is best suited to the study. You may not always choose the best approach to begin with, but a little thought may reduce the number of false starts.
(2) In the beginning think small.
Work with small lattices and short runs. This is useful for obtaining rapid turnaround of results and for checking the correctness of a program. This also allows us to search rather rapidly through a wide range of parameter space to determine ranges with physically interesting behavior.
(3) Test the random number generator.
Find some limiting cases where accurate, or exact values of certain properties can be calculated, and compare your results of your algorithm with different random number sequences and/or different random number generators.
(4) Look at systematic variations with system size and run length.
Use a wide range of sizes and run lengths and then use scaling forms to analyze data.
(5) Calculate error bars.
Search for and estimate both statistical and systematic errors. This enables both you and other researchers to evaluate the correctness of the conclusions which are drawn from the data.
(6) Make a few very long runs.
Do this to ensure that there is not some hidden time scale which is much longer than anticipated.
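To make rule (5) a little more concrete, here is a minimal sketch of one common way to estimate a statistical error bar from correlated Monte Carlo data: a binning analysis. This is my own illustration, not something taken from the book, and the function name `binnedStandardError` is made up for this example. The idea is to average the samples in bins, then compute the standard error of the bin averages.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard error of the mean estimated from bin averages (binning analysis).
// Trailing samples that do not fill a whole bin are ignored.
double binnedStandardError(const std::vector<double>& samples, std::size_t binSize) {
    assert(binSize > 0);
    const std::size_t numBins = samples.size() / binSize;
    assert(numBins >= 2);  // need at least two bins to estimate a variance

    // Average the samples inside each bin.
    std::vector<double> binMeans(numBins, 0.0);
    for (std::size_t b = 0; b < numBins; ++b) {
        for (std::size_t i = 0; i < binSize; ++i) {
            binMeans[b] += samples[b * binSize + i];
        }
        binMeans[b] /= static_cast<double>(binSize);
    }

    // Mean of the bin means.
    double mean = 0.0;
    for (double m : binMeans) mean += m;
    mean /= static_cast<double>(numBins);

    // Unbiased sample variance of the bin means.
    double variance = 0.0;
    for (double m : binMeans) variance += (m - mean) * (m - mean);
    variance /= static_cast<double>(numBins - 1);

    // Standard error of the mean of numBins (approximately independent) bins.
    return std::sqrt(variance / static_cast<double>(numBins));
}
```

If the estimate keeps growing as you increase `binSize`, the bins are probably still shorter than the autocorrelation time, which ties in with rules (4) and (6).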
Documenting your code might seem like a time-wasting distraction. It’s very easy to convince yourself in the moment “I’ll totally remember how this works” or “it should be obvious what this line does.” I’ve done it myself, and then been totally confused by my own code when I go back to it a few months later. Comments within the code itself are an important form of documentation, but this post will discuss a different form: the README.
What is a README?
You’ve definitely already seen a README: they’re the text files that introduce a project and briefly explain how it works. Whenever you go to a git repo on GitHub or GitLab (like uni10 or Firefox) the README serves as the landing page, but you’ll also see README files almost any time you download or install software. The README is the first stop for anyone who will use your code. If you design it well, it might be the last stop too:
Your documentation is complete when someone can use your module without ever having to look at its code. … Remember: the documentation, not the code, defines what a module does.–Ken Williams (source)
Why do I need to write a README?
Code is only as useful as its documentation. As scientists, we have an extra responsibility to document our code because our code is effectively part of an experiment. It is crucial that your code is an accurate record of what you did, even if you think you are the only person who will ever use your program. In our group the primary concern is ensuring that your code is usable by someone else after you leave. Perhaps another student will need it, or we might need to recheck some calculation.
Of course, other people will often use your code, and a well-written README can help them understand how it works, what its limitations are, and how to use it. The selfish reason to write one is for yourself in the future. Perhaps you know there is some issue with your program, like the parameter J cannot be set to a number greater than 2. That might be fine right now, but will you remember that when you have to rerun your code a year from now? Another good example is dependencies, especially if you had to install anything special. If you write those down right away it will be easy to get your code running on a different machine or after an operating system upgrade.
How do I write a README?
There is no single best answer here. The most important thing is that you do write one. You might think that it’s obvious how to compile and run your code, but someone new might be totally lost. Little things like compiler flags can really trip a new user up.
What should it include?
In the following subsections I provide some examples of sections you may want to include in your README file.
There is no shortage of README templates out there. These are a good place to start, but computational physics codes have unique needs, so I posted a template on Gitlab you can use as a starting point if you like.
At the very top you should have a brief introduction that includes:
- Program name
- Name of author(s)
- Contact information (email, website)
- A brief description of what your program does
- What technique it uses
- The model it studies
- What it’s for (why would someone want to use this program?)
- Copyright notice. I’m not an expert on licenses, so I don’t have much advice here. To keep it simple you can just say “Copyright YOUR NAME YEAR”
- Links to more extensive documentation elsewhere (papers, examples, etc).
Instructions to quickly get started with your program, e.g. how to compile and install with default settings and how to run it. It’s useful to include an easy way for users to test whether they compiled and installed it correctly.
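One way to provide such a test is a small built-in self-check that compares a quick run against a value you know exactly. The sketch below is only an illustration: it uses a Monte Carlo estimate of pi as a stand-in for a real limiting case of your model, `std::mt19937` as the random number generator, and made-up names (`estimatePi`, `selfTest`) and tolerances that you would replace with your own.

```cpp
#include <cassert>
#include <cmath>
#include <initializer_list>
#include <random>

// Monte Carlo estimate of pi: the fraction of uniform points in the unit
// square that fall inside the quarter circle, times four.
double estimatePi(unsigned seed, long numSamples) {
    assert(numSamples > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    long inside = 0;
    for (long i = 0; i < numSamples; ++i) {
        const double x = uniform(rng);
        const double y = uniform(rng);
        if (x * x + y * y < 1.0) ++inside;
    }
    return 4.0 * static_cast<double>(inside) / static_cast<double>(numSamples);
}

// Returns true if every seed reproduces the exact value within a loose
// tolerance (0.05 is roughly ten standard errors for 100000 samples).
bool selfTest() {
    const double exact = std::acos(-1.0);  // pi
    for (unsigned seed : {1u, 2u, 3u}) {
        if (std::fabs(estimatePi(seed, 100000) - exact) > 0.05) return false;
    }
    return true;
}
```

The README can then tell users to run the self-check first (for example behind a hypothetical `--test` option) and say what a passing result looks like.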
What does your program assume about the system? You don’t necessarily need to test rigorously, but you can at least say what system you developed your code on and what compiler you used. Write these down as you write your code so you don’t have to remember them later. Other examples of dependencies:
- Libraries (especially anything you had to install, like MPI)
- Anything platform-dependent
- A specific compiler required
- Other external programs
List your source code files and any auxiliary files, where they are located and what they are for.
What inputs does your program require? Do you specify the parameters of the simulation as command line arguments or are they in a file? What is the format of the file? Which parameters are optional? What are the default values?
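As an illustration of the kind of interface this section of the README would document, here is a minimal sketch of a program that reads `key=value` pairs from the command line and falls back to defaults. Everything here is hypothetical: the `Params` struct, the parameter names `L` and `sweeps`, and the default values are just placeholders for whatever your code actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical parameter set; the names and defaults are illustrative only.
struct Params {
    double J = 1.0;      // coupling constant
    int L = 8;           // linear lattice size
    long sweeps = 1000;  // number of Monte Carlo sweeps
};

// Parse arguments of the form key=value, e.g. "./a.out J=0.5 L=16".
// Anything not given on the command line keeps its default value.
Params parseArgs(int argc, const char* const argv[]) {
    assert(argc >= 1 && argv != nullptr);  // argv[0] is the program name
    Params p;
    for (int i = 1; i < argc; ++i) {
        const std::string arg(argv[i]);
        const std::size_t eq = arg.find('=');
        if (eq == std::string::npos) {
            throw std::invalid_argument("expected key=value, got: " + arg);
        }
        const std::string key = arg.substr(0, eq);
        const std::string value = arg.substr(eq + 1);
        if (key == "J") {
            p.J = std::stod(value);
        } else if (key == "L") {
            p.L = std::stoi(value);
        } else if (key == "sweeps") {
            p.sweeps = std::stol(value);
        } else {
            throw std::invalid_argument("unknown parameter: " + key);
        }
    }
    return p;
}
```

With an interface like this, the README only needs a short table listing each key, its meaning, and its default, and unknown keys fail loudly instead of being silently ignored.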
If it runs properly your program probably produces data. Is that data displayed on screen? Written to disk? Returned by a function? What format is the data in? How is it normalized?
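One habit that makes the output section of a README much shorter is writing output that describes itself. The sketch below shows one possible convention, not a standard: comment lines starting with `#` record the parameters and the meaning of each column, followed by one row per measurement. The function name `formatResults`, the parameters, and the column names are all made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Build a plain-text table: '#' header lines with the run parameters and
// column meanings, then one row per bin.
// The parameter names (J, L) and the columns are hypothetical examples.
std::string formatResults(double J, int L,
                          const std::vector<double>& energy,
                          const std::vector<double>& energyError) {
    assert(energy.size() == energyError.size());
    std::ostringstream out;
    out << "# J = " << J << "\n";
    out << "# L = " << L << "\n";
    out << "# columns: bin  energy_per_site  standard_error\n";
    for (std::size_t i = 0; i < energy.size(); ++i) {
        out << i << " " << energy[i] << " " << energyError[i] << "\n";
    }
    return out.str();
}
```

The returned string can be sent to `std::cout` or written to a file. Either way, the normalization (here, energy per site) is stated in the file itself as well as in the README.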
This section definitely distinguishes computational physics code from ordinary programming. You don’t need to provide a full bibliography, but some links to papers that describe the method you’re using in detail, or papers you wrote using this program, would be great.
Is there anything about your code that doesn’t work? Maybe J cannot be set to zero, maybe there’s a small memory leak, or a segfault that occurs under specific conditions. Depending on the bug, you might not really need to fix it, but it’s definitely important to tell the user so they aren’t caught by surprise (in some cases the user may be you in the future).
Formatting with Markdown
You will probably want to write your README in Markdown. Markdown is a simple way to format a text document (as opposed to markUP, get it? 🤔). The idea of Markdown is to allow minimal formatting while remaining readable as plain text. You’re probably already familiar with Markdown since it is commonly used for simple text formatting on platforms like Slack. Both GitHub and GitLab support Markdown. If you use a Mac you can even get “Quick Look” (using the spacebar to view a preview of a file) to display correctly formatted Markdown text using my guide here.
I’m not going to do a Markdown tutorial here (instead see the links in the resource section), but as a very quick introduction:
Enclose *italic* text within asterisks and **bold** text within double asterisks. You can include inline code using single backticks `like this` or sections of code with triple backticks like so:
cout << x << endl;
A Markdown-formatted file will typically end in ‘.md’. A quick Google search will return countless examples of Markdown editors for any platform.
- My README template
- I drew heavily on this guide to making README files
- Art of README by noffle, with a list of good examples, etc.
- Awesome README (a bunch of examples of good README files)
- 10 minute Markdown tutorial
- Markdown quick-reference guide
- Github Markdown Guide
- Using Markdown with macOS Quick Look
The Xcode visual debugger is really nice and easy to use. It turns out it’s possible to get it to play well with MPI code (although it may not work for actually multithreaded processes). This seems to be pretty buggy, but it does work. I’m not sure how useful this guide is to others, but honestly I’m writing it to remember how to do this in the future.
`mpic++` is just a wrapper for the system C++ compiler, but on my system it defaults to `g++` and we want to switch to `c++` (the Apple C++ compiler). To start, we use the `--show-me:compile` flag with our usual compilation command:
mpic++ --show-me:compile *.cpp
This doesn’t actually compile the code; it just prints the command that the wrapper would have passed to the underlying compiler. In my case the output is:
g++ config.cpp main.cpp mtrand.cpp -I/usr/local/include -L/usr/local/lib -lmpi
We’re going to take all those flags and pass them to the Apple compiler, `c++`, like so:
c++ -g config.cpp main.cpp mtrand.cpp -I/usr/local/include -L/usr/local/lib -lmpi
Don’t forget to add the debugging flag `-g`. This command produces a binary file `a.out`. Before we can run it, we need to get Xcode ready.
Launching the Debugger
Open the Xcode project. On the menubar, select Debug–>Attach to Process by PID or Name…
In the resulting window, enter the name of the executable (in this case `a.out`) and click “Attach”.
Now we’re ready to actually run our simulation using the normal command in the terminal:
mpirun -np 1 ./a.out
When you switch back to Xcode, you should be able to use the debugging interface like normal. If it doesn’t come up automatically you may have to click on the debugging logo in the navigation bar on the left side of the window (circled below).
Trouble attaching to process?
For some reason, Xcode often fails to attach to the process using this procedure, with an error like “Message from debugger: error 1”. I haven’t been able to figure out exactly why this happens or how to avoid it altogether. My use case is way outside what Xcode is designed for (i.e. building iOS apps). Overall I found that using the stop button within Xcode to kill the process (rather than killing it from the terminal) seems to make this problem less likely, but it still happens.
Once this problem occurs, the only solution I have found is to recompile the binary under a different name using the `-o othername.out` flag (renaming an existing binary won’t work), attach the debugger to the new binary, and then run the new binary.
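A different approach that is commonly used when debugging MPI programs is to have the program print its process ID and wait at startup, so the debugger can be attached by PID to a process that is already running. This is not part of the procedure above and has not been checked against the Xcode attach error described here, so treat it as a sketch. The names `debuggerAttached` and `waitForDebugger` are made up for this example, and `getpid` assumes a POSIX system such as macOS or Linux.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <thread>
#include <unistd.h>  // getpid (POSIX)

// Set this to true from the debugger (for example with
// "expr debuggerAttached = true" in the lldb console) to let the program go on.
volatile bool debuggerAttached = false;

// Print the process ID, then wait until the flag above is flipped or the
// timeout expires. Returns true if the debugger released the process.
bool waitForDebugger(std::chrono::seconds timeout) {
    assert(timeout.count() >= 0);
    std::cerr << "PID " << getpid() << " waiting for debugger" << std::endl;
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!debuggerAttached) {
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    return true;
}
```

You would call something like `waitForDebugger(std::chrono::seconds(60))` at the top of `main`, attach to the printed PID, set a breakpoint, and then flip the flag. The timeout means a run without a debugger simply continues after a delay instead of hanging forever.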