I recently decided to take a leave of absence from kaChing to work with the Covert Systems Biology Lab at Stanford on building the world’s first “whole-cell” computational model. The team (size: 3) is extremely talented and packed with domain knowledge and experience, but they’ve never worked on a piece of software this large before. When I accepted the position, I had only a vague memory of using MATLAB once for a school project. I knew I’d be inheriting 30k+ untested lines of code in an unfamiliar language, but it was just the kind of challenge I was hungry for — a chance to apply the agile, test-driven development mentality and practices I’d adopted at Google and kaChing to further science. Here I’ll describe what we’ve done and how it’s worked for us.
Automated Testing in MATLAB
The first item on my agenda was helping the team understand the value of automated testing. They’d written the code, and they each felt pretty good about most of their own code, but how could a newcomer like me trust it? And how could any of us work in anyone else’s code with any confidence that we weren’t breaking something?
I searched the web for a MATLAB testing framework (you’d think one would come bundled with an IDE+runtime whose installer is 2GB), but I only found a few amateurish testing libraries. Really? Has unit testing really not caught on in the MATLAB world? Perhaps very few MATLAB programmers are software engineers. Maybe most MATLAB code is short-lived — written for an assignment and then discarded. Jury’s still out. We eventually found and settled on a testing library by Steve Eddins patterned after the xUnit family.
What does a MATLAB unit test look like? Here’s our first one:
If you’re unfamiliar with MATLAB, it’s a math-oriented scripting language that just got OOP support about 3 years ago. It has plenty of quirks, but it’s also quite powerful.
A Continuous Build
Next on my agenda was setting up Hudson to run our first unit test. Why have a continuous build for an academic project that’s never deployed? A few reasons:
- It establishes the repo as the master copy of our source code (not what’s on any one of our workstations).
- It’s a clean, neutral environment for running our tests.
- The emails keep us all aware of any tests that are failing.
- The extra visibility encourages us to fix failures faster, improving the stability of our project.
Since our testing library isn’t particularly Hudson-friendly, I wrote a simple script that starts MATLAB in batch mode, runs our tests, and then greps the output to determine whether any tests failed. We use tee to capture the output for examination while also streaming it to stdout (so it still shows up in Hudson in real time).
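# Assumption: the excerpt does not show how $fname is set; a temp file is used here.
fname="$(mktemp)"
echo "Starting matlab..."
# $1 is the MATLAB statement that runs the tests; sed strips the hyperlink tags MATLAB puts in error reports.
matlab -nodisplay -r "fprintf('\n'); try; $1; catch e; fprintf('%s\n', getReport(e)); end; exit" \
  | sed -E 's|</?a[^>]*>||g' | tee "$fname"
result="$(tail -2 "$fname")"
rm -f "$fname"
shopt -s nocasematch
# Keep the pattern in a variable: a quoted right-hand side of =~ is matched literally, not as a regex.
pattern='passed in [0-9. a-z]*[[:space:].]*$'
# Assumption: the original branch bodies are not shown; a nonzero exit status is what marks the build as failed.
if [[ "$result" =~ $pattern ]]; then exit 0; else exit 1; fi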
On my to-do list is extending our testing library to generate an xUnit-style XML report for Hudson.
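Until the testing library can emit such a report itself, one stopgap would be for the wrapper script to write a single-test-case JUnit-style XML file from the summary line it already parses. The sketch below is only an illustration under stated assumptions: the function names `write_junit_report` and `xml_escape`, the suite name `matlab-tests`, and the one-test-case layout are all hypothetical, and it assumes the last two lines of the log contain the "PASSED in ... seconds." summary that the script above looks for.

```shell
#!/usr/bin/env bash
# Sketch only: turn a captured MATLAB test log into a minimal JUnit-style
# XML report. Function and suite names are hypothetical, not from any library.

# Escape the characters that are special in XML text (reads stdin).
xml_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# write_junit_report LOGFILE REPORTFILE
# Returns 0 if the last two lines of LOGFILE contain a "passed in ..."
# summary (case-insensitive), 1 otherwise, and writes REPORTFILE either way.
write_junit_report() {
  local log="$1" report="$2"
  local summary lowered failures=0 status=0
  local pattern='passed in [0-9. a-z]*[[:space:].]*$'

  summary="$(tail -n 2 "$log")"
  # Lowercase a copy instead of toggling nocasematch in the caller's shell.
  lowered="$(printf '%s' "$summary" | tr '[:upper:]' '[:lower:]')"

  if [[ ! "$lowered" =~ $pattern ]]; then
    failures=1
    status=1
  fi

  {
    printf '<?xml version="1.0" encoding="UTF-8"?>\n'
    printf '<testsuite name="matlab-tests" tests="1" failures="%d">\n' "$failures"
    if (( failures )); then
      printf '  <testcase classname="matlab" name="all">\n'
      printf '    <failure message="no PASSED summary found">%s</failure>\n' \
        "$(printf '%s' "$summary" | xml_escape)"
      printf '  </testcase>\n'
    else
      printf '  <testcase classname="matlab" name="all"/>\n'
    fi
    printf '</testsuite>\n'
  } > "$report"

  return "$status"
}
```

The wrapper could call `write_junit_report "$fname" report.xml` just before deleting the log, and the Hudson job's JUnit report publisher could then be pointed at `report.xml`. Because the whole run is reported as one test case, this only gets the pass/fail status into Hudson's test view; per-test results would still need support from the testing library.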
Small, Medium, and Large Tests
To the credit of my team, the simulation has a modular structure that lends itself well to testing at multiple levels. I used this slide from a recent post on the Google Testing Blog to illustrate the concept of test size to my team:
In the simulation, each cell process is modeled very differently (flux-balance analysis, ODEs, Poisson processes, etc.), yet they all expose a common interface to the controller. To isolate a cell process for unit testing, we:
- Generate a small test fixture containing parameters that are normally loaded from a knowledge base.
- In each test method:
- Put the cell model in an interesting state.
- Execute the cell process for one or more time steps.
- Make assertions about the resulting state.
Here’s one such test:
m = this.module;
m.enzymes(:) = 0;
m.dnaAComplexSize(1) = 2;
m.dnaABoundSites(m.R1234dnaABoxIndexs(:,1)) = m.siteDnaAATPBound;
m.kd1ATP = 0; %disable dnaA-ATP unbinding
m.evolveState();
It verifies that when dnaA-ATP (a replication initiation factor) is bound at each of the R1-R4 sites in the OriC region of the Mycoplasma chromosome, we properly increment the dnaA complex size.
We’ve only just begun writing medium and large tests. Here are some examples of what they do:
- verify that related cell processes cooperate as expected
- verify that various properties of the cell follow expected trends over time
- verify the consistency of the knowledge base
When you have no users, no customers, and no business, how do you measure progress? (Hint: It’s in big letters in the Manifesto.) Working software. Our team’s medium-term goal is a computational model of Mycoplasma with enough predictive value to facilitate biological discovery. We decided up front to list the components of the model (about 30), define a test plan, and keep a color-coded shared spreadsheet to track the following milestones for each component:
- code done
- unit tests written
- unit tests passing
- code reviewed
- covered by medium test(s)
- test author(s)
It also has a list of deferred/unresolved issues off to the side. The spreadsheet, like a story/task board, helps us stay organized, aligned with our priorities, and cognizant of our progress. It also facilitates discussion during review sessions.
So, are we there yet? Not quite… but we’re close. In the meantime, it’s fulfilling to see Biology and Biophysics PhD candidates injecting dependencies, designing for testability, and committing new tests with fixes.
Anyone else using agile practices or TDD in an academic setting? We’d love to hear about it.