Are Temp Files Slowing Your Builds Down?
Lately one of our Pulse agents has been bogged down, to the extent that some of our heavier acceptance tests started to genuinely time out. Tests failing due to environmental factors can lead to homicidal mania, so I’ve been trying to diagnose what is going on before someone gets hurt!
The box in question runs Windows Vista, and I noticed while poking around that some disk operations were very slow. In fact, deleting even a handful of files via Explorer took so long that I gave up (we’re talking hours here). About this time I fired up the Reliability and Performance Manager that comes with Vista (Control Panel > Administrative Tools). I noticed that there was constant disk activity, and a lot of it centered around C:\$MFT — the NTFS Master File Table.
I had already pared back the background tasks on this machine: the Recycle Bin was disabled, Search Indexing was turned off and Defrag ran on a regular schedule. So why was my file system so dog slow? The answer came when I looked into the AppData\Local\Temp directory for the user running the Pulse agent. The directory was filled with tens of thousands of entries, many of which were directories that themselves contained many files.
The junk that had built up in this directory was quite astounding. Although some of it can be explained by tests that don’t clean up after themselves, I believe a lot of the junk came from tests that had to be killed forcefully without time to clean up. It was also evident that every second component we were using was part of the problem: Selenium, JFreeChart, JNA, Ant and Ivy all joined the party.
So, how to resolve this? Of course any tests that don’t clean up after themselves should be fixed. But in reality this won’t always work — especially given the fact that Windows will not allow open files to be deleted. So the practical solution is to regularly clean out the temporary directory. In fact, it’s quite easy to set up a little Pulse project that will do just that, and let Pulse do the work of scheduling it via a cron trigger. With Pulse in control of the scheduling there is no risk the cleanup will overlap with another build.
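As a rough illustration of what such a cleanup project might run, here is a minimal Python sketch. It is not the script we use: the function name `clean_temp_dir` and the 24-hour age threshold are assumptions made for the example. It removes top-level entries that have not been modified recently and simply skips anything that cannot be deleted, such as files still held open on Windows.

```python
import os
import shutil
import tempfile
import time


def clean_temp_dir(temp_dir=None, max_age_hours=24.0):
    """Delete top-level entries of temp_dir older than max_age_hours.

    temp_dir defaults to the current user's temp directory.
    Returns a (removed, skipped) tuple of counts.
    """
    if temp_dir is None:
        temp_dir = tempfile.gettempdir()
    cutoff = time.time() - max_age_hours * 3600
    removed = skipped = 0
    for entry in os.scandir(temp_dir):
        try:
            if entry.stat(follow_symlinks=False).st_mtime > cutoff:
                continue  # modified recently; a running build may still need it
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.remove(entry.path)
            removed += 1
        except OSError:
            # Typically a file that is still open or otherwise locked.
            skipped += 1
    return removed, skipped
```

Calling `clean_temp_dir()` with no arguments targets the temp directory of whichever user runs the script, so the cleanup project would need to run as the same user as the agent. The age threshold is only a safety margin; with Pulse scheduling the job so that it cannot overlap another build, it could be set much lower.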
A more general solution is to start with a guaranteed-clean environment in the first place. After all, acceptance tests have a habit of messing with a machine in other ways too. Re-imaging the machine after each build, or using a virtual machine that can be restored to a clean state, is a more reliable way to avoid the junk. Pulse is actually designed to allow re-imaging or rebooting of agents to be done in a post-stage hook: the agent management code on the master allows agents to go offline at this point, and does not try to reuse them until their status has been confirmed by a later ping.
This entry was posted on Tuesday, November 24th, 2009 at 10:01 am and is filed under Agile, Continuous Integration, Java, Project Automation, Technology.

November 24th, 2009 at 10:22 am
At the beginning of every build I create a “_tmp” directory under the Pulse base dir and set TMP, TMPDIR, TEMP and TEMPDIR to point to that. With “clean checkout” or “clean update”, all the temp files will be deleted at the end of a build… that fixes almost all the problems I’ve seen so far with temporary files not being cleaned up.
The remaining problems are tools that hardcode temp dir paths, and tools that are so broken they can fill up an entire temp dir within the time of a single build!
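For readers who want to try the approach described in this comment, here is a minimal Python sketch. It is an illustration only, not the commenter's actual setup: the wrapper name `run_with_private_temp` is hypothetical, and the build's base directory is passed in as a plain argument because how Pulse exposes it is not shown here.

```python
import os
import subprocess

# The four variables named in the comment above.
TEMP_VARS = ("TMP", "TMPDIR", "TEMP", "TEMPDIR")


def run_with_private_temp(command, base_dir):
    """Run command with its temp variables pointed at base_dir/_tmp.

    Returns the exit code of the command.
    """
    tmp_dir = os.path.join(base_dir, "_tmp")
    os.makedirs(tmp_dir, exist_ok=True)
    env = dict(os.environ)  # copy, so the parent environment is untouched
    for name in TEMP_VARS:
        env[name] = tmp_dir
    return subprocess.run(command, cwd=base_dir, env=env).returncode
```

Because `_tmp` lives inside the base directory, whatever cleans that directory also removes the temp files. Tools that do not consult these variables are unaffected; JVM-based tools, for example, read the `java.io.tmpdir` system property and may need `-Djava.io.tmpdir` set as well.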
November 24th, 2009 at 11:16 pm
Hi Rohan,
That’s a neat idea: piggy-back on Pulse’s default cleanup to blow away the temp files when the build completes. I might give that a go — then we could also add warnings to our builds if stray junk is found in those directories.