Sleuthing Git using Git, or, Regression testing done backwards
July 13th, 2011I’ve been using Git for a while now, and when I started at WorkSmart and found out that they were using Subversion, one of the first projects I volunteered for was migrating to a distributed version-control system. Distributed version-control systems (DVCSs) confer may benefits over centralized ones like Subversion. Git is probably most famous for its easy branching and smart merging. You can quickly and easily create a new branch to test out a new idea and make a few commits, then merge it back in if it’s successful. Compared to SVN, it’s also blazing fast, which is something we pay attention to at WSL (we use pretty powerful development machines, and we even hacked the Android build tools when we found they were running too slowly). Furthermore, since every developer’s machine is an entire repository with a full copy of the history, we don’t hurt as much if our central repository goes down (though, of course, it’s no replacement for off-site backups). Now, other distributed solutions, like Mercurial, confer some of these same benefits, but I’m not working in a void, and since I easily found Git replacements for our Rietveld tools, but none for Mercurial, I chose Git.
Anyway, it’s been my job for the past while to break in Git and sort out the teething problems before everybody else makes the switch. The git-svn tool has been extremely helpful so far in letting me sort out issues and get the workflow set up while everybody else continues working with our Subversion server. Unfortunately, I had been having problems with branches, one of the most important features Git has over SVN. Creating them wasn’t an issue, but checking them out was seriously hindering my workflow:
$ time git branch foo
git branch foo 0.01s user 0.00s system 77% cpu 0.016 total
$ time git checkout foo
Switched to branch 'foo'
git checkout foo 11.73s user 0.18s system 99% cpu 11.937 total
$ time git checkout master
Switched to branch 'master'
git checkout master 11.62s user 0.18s system 99% cpu 11.813 total
I usually have at least two feature branches on the go at a time, and then of course there are the typical interruptions – a coworker wants me to review some code with them, there’s a bug in the latest release which needs fixing right now (don’t forget your hat), or any of the other interruptions software developers might face on any given day. Pausing for twelve seconds or more just to create a hotfix branch or merge in a feature may seem like a small interruption, but it really breaks the flow. The problem resisted several attempts to solve it: one of my guinea pigs coworkers helping me break in Git wasn’t experiencing it, even though we both had the same repository checked out, so it probably wasn’t directly due to the fact that we have something like 90,000 files in our working tree. Somebody in #git suggested that I profile the checkout operation with the strace utility, which led me down a bit of a wild goose chase when I found out there were about 90,000 calls being made to lstat64. Ultimately, though, the problem was here:
$ git --version
git version 1.7.0.4
After upgrading to the latest version (1.7.5.4 at the time), the problem went away! Good enough for some, but I know about git bisect and I was really curious. Bisect is usually used to find out which commit introduced a bug (what’s usually called “regression testing”, but with some simple mental gymnastics, you can use it to find out which one fixed a bug instead. So let’s clone git.git and do some bisecting!
The key is that bisect performs a simple binary search: you tell it where something bad is happening (e.g., the build is broken, or your test suite is failing), and then you tell it a known good state (the build wasn’t broken; your tests passed), and Git will search through the commits, asking you at each step of the way whether the revision it chose was bad or good. There are a lot of fancy options; for example, if the revision Git chooses isn’t to your liking, you can skip it and choose a different one. If you’re hardcore, you can even write a script to check whether a revision is good or bad and tell Git to run it, and then the entire bisect operation will be automated.
Back to those gymnastics, though. Since git bisect was designed to find regressions, we need to flip the meanings of “bad” and “good” in order to use it to find a fix. Also keep in mind that we’re bisecting Git itself, here! I cloned the Git repository, and all of the bisect commands you see below are being run in that clone. Aside from that, this is a pretty straightforward bisection. Since the key operation is performing a checkout on a big repository, I used my checkout of our production repository, and a new test branch, and then every time I checked out and built a new revision of git.git, I would change into my “real” repository and do a checkout (using the freshly-built git, not the one on my path) to see how long it took.
$ git bisect start
$ git bisect bad # i.e., the fix has been introduced in the latest version.
$ git bisect good v1.7.0.4 # i.e., the bug still existed at this tag.
Bisecting: 2212 revisions left to test after this (roughly 11 steps)
[599b0bf438e387dbaf00dbadcbe41b2a0de90db1] msvc: opendir: fix malloc-failure
$ make
GIT_VERSION = 1.7.3.2.264.g599b0
[more output clipped]
$ cd /path/to/big/repo
$ time ~/git/git checkout -b testbranch
Switched to a new branch 'testbranch'
real 0m1.213s
user 0m0.948s
sys 0m0.256s
$ cd ~/git
$ git bisect bad
Bisecting: 1107 revisions left to test after this (roughly 10 steps)
[6296062285e07051a5a46cbbd74c67a10bf2ac16] Merge branch 'tr/rev-list-count'
[steps omitted]
$ time ~/git/git checkout testbranch
Switched to branch 'testbranch'
real 0m11.919s
user 0m11.693s
sys 0m0.192s
$ cd ~/git
$ git bisect good
Bisecting: 275 revisions left to test after this (roughly 8 steps)
[0925c02e212fc69dfd095ca1116e3d8999a95e44] Merge git://git.bogomips.org/git-svn
[and so on, until eventually...]
$ git bisect bad
e53e6b4433f264250c2e586167caf61721b0185c is the first bad commit
[commit's log message clipped]
Keep in mind that here, “good” means “not fixed yet”, and “bad” means “fixed”. I almost caught myself issuing the wrong git bisect command once or twice. Anyway, sure enough, the commit in question describes a fix for poor performance on large repositories involving an inefficient tree traversal algorithm which checked some index files over and over. It’s obvious that this might slow down some operations on a large repository. Mystery solved! Now I can get back to coding in peace. At least until the next time I have to do a hotfix for a bug in production…
Andrew Cameron, Software Engineering Intern


