Archive for November, 2009
We’ve reached another significant milestone in the Pulse 2.1 beta: the release of 2.1.9. This latest build rolls up a stack of fixes, improvements and new features. Some of the much-anticipated improvements include:
- Support for NAnt in the form of a command and post-processor.
- Support for reading NUnit XML reports.
- Support for reading QTestlib XML reports.
- The ability to mark unstable tests as “expected” failures: they still look ugly (so fix them!) but won’t fail your build.
- Better visibility of what is currently building on an agent.
- New refactoring actions to “pull up” and “push down” configuration in the template hierarchy.
- The ability to specify Perforce client views directly in Pulse.
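The rule behind expected failures can be shown in a few lines. The sketch below is a hypothetical model, not Pulse’s implementation: the `CaseResult` type and `build_status` function are invented for illustration. A failing test only fails the build if it is not on the expected-failures list. Expected failures are still returned so they can be shown (and stay ugly).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    """Hypothetical record of one test case outcome."""
    name: str
    passed: bool


def build_status(results, expected_failures):
    """Return (status, unexpected_failures, expected_failures_seen).

    Hypothetical model: only failures NOT in expected_failures
    cause the build to fail; expected ones are still reported.
    """
    failed = [r.name for r in results if not r.passed]
    unexpected = [n for n in failed if n not in expected_failures]
    expected = [n for n in failed if n in expected_failures]
    status = "failure" if unexpected else "success"
    return status, unexpected, expected
```

For example, `build_status([CaseResult("testLogin", True), CaseResult("testFlakyUpload", False)], {"testFlakyUpload"})` returns `("success", [], ["testFlakyUpload"])`.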
I’ll expand on some of these in later posts. In addition, we’ve made great progress on the new project dependencies support, which should be both easier to use and more reliable in this build.
We’d love you to download Pulse 2.1 and let us know what you think!
Lately one of our Pulse agents has been bogged down, to the extent that some of our heavier acceptance tests started to genuinely time out. Tests failing due to environmental factors can lead to homicidal mania, so I’ve been trying to diagnose what is going on before someone gets hurt!
The box in question runs Windows Vista, and while poking around I noticed that some disk operations were very slow. In fact, deleting even a handful of files via Explorer took so long that I gave up (we’re talking hours here). Around this time I fired up the Reliability and Performance Manager that comes with Vista (Control Panel > Administrative Tools). It showed constant disk activity, much of it centered on C:\$MFT, the NTFS Master File Table.
I had already pared back the background tasks on this machine: the Recycle Bin was disabled, Search Indexing was turned off and Defrag ran on a regular schedule. So why was my file system so dog slow? The answer came when I looked into the AppData\Local\Temp directory for the user running the Pulse agent. The directory was filled with tens of thousands of entries, many of which were directories that themselves contained many files.
The junk that had built up in this directory was quite astounding. Although some of it can be explained by tests that don’t clean up after themselves, I believe a lot of it came from tests that had to be killed forcefully, with no time to clean up. It was also evident that every second component we were using was part of the problem: Selenium, JFreeChart, JNA, Ant and Ivy all joined the party.
So, how to resolve this? Of course, any tests that don’t clean up after themselves should be fixed. But in practice this won’t always be enough, especially since Windows will not allow open files to be deleted. So the practical solution is to regularly clean out the temporary directory. In fact, it’s quite easy to set up a little Pulse project that does just that, and let Pulse schedule it via a cron trigger. With Pulse in control of the scheduling, there is no risk that the cleanup will overlap with another build.
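The cleanup step itself could be a short script run by such a project. The following is a minimal sketch, assuming Python is available on the agent; the function name `clean_temp_dir` and the 24-hour age threshold are illustrative choices, not part of Pulse. It removes top-level entries in the running user’s temp directory that have not been modified recently, and skips (rather than aborts on) anything Windows refuses to delete because it is still open.

```python
import os
import shutil
import tempfile
import time


def clean_temp_dir(temp_dir=None, max_age_hours=24.0):
    """Delete top-level entries in temp_dir older than max_age_hours.

    Hypothetical helper. Entries that cannot be removed (for example,
    files still held open by another process on Windows) are skipped
    and reported instead of stopping the cleanup.
    Returns (removed, skipped) lists of paths.
    """
    # Defaults to the current user's temp dir (TEMP/TMP on Windows).
    temp_dir = temp_dir or tempfile.gettempdir()
    cutoff = time.time() - max_age_hours * 3600
    removed, skipped = [], []
    with os.scandir(temp_dir) as entries:
        for entry in entries:
            try:
                # Leave recently touched entries alone as an extra safety margin.
                if entry.stat(follow_symlinks=False).st_mtime > cutoff:
                    continue
                if entry.is_dir(follow_symlinks=False):
                    shutil.rmtree(entry.path)
                else:
                    os.remove(entry.path)
                removed.append(entry.path)
            except OSError:
                # Open or locked files land here; try again on the next run.
                skipped.append(entry.path)
    return removed, skipped
```

Calling `clean_temp_dir()` as the agent’s user cleans that user’s temp directory. Note that the age check looks only at each top-level entry’s own modification time, which is a heuristic: a directory’s timestamp does not change when files nested deeper inside it are modified.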
A more general solution is to start with a guaranteed-clean environment in the first place. After all, acceptance tests have a habit of messing with a machine in other ways too. Re-imaging the machine after each build, or using a virtual machine that can be restored to a clean state, is a more reliable way to avoid the junk. Pulse is actually designed to allow agents to be reimaged or rebooted in a post-stage hook: the agent management code on the master tolerates agents going offline at this point, and will not reuse them until a later ping confirms their status.
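The reuse rule in that last sentence can be pictured as a small state machine on the master. This is a hypothetical model for illustration only; `AgentState`, `AgentTracker` and their methods are invented names, not Pulse’s actual agent management code. After a stage completes, the agent is parked until a ping says it is back, so a reboot or reimage in the post-stage hook cannot cause work to be sent to a machine that is not ready.

```python
from enum import Enum


class AgentState(Enum):
    IDLE = "idle"
    BUILDING = "building"
    AWAITING_PING = "awaiting ping"  # post-stage hook may be rebooting/reimaging
    OFFLINE = "offline"


class AgentTracker:
    """Hypothetical master-side view of a single agent (not Pulse code)."""

    def __init__(self):
        self.state = AgentState.IDLE

    @property
    def is_available(self):
        # Only an agent confirmed idle by a ping may receive new work.
        return self.state is AgentState.IDLE

    def assign_build(self):
        if not self.is_available:
            raise RuntimeError(f"agent not available: {self.state.value}")
        self.state = AgentState.BUILDING

    def stage_complete(self):
        # The agent may go offline now, so do not reuse it yet.
        self.state = AgentState.AWAITING_PING

    def ping_result(self, reachable):
        if self.state is AgentState.BUILDING:
            return  # this sketch ignores pings while a build is running
        self.state = AgentState.IDLE if reachable else AgentState.OFFLINE
```

In this model, an agent that is rebooting simply fails pings and stays `OFFLINE`; the first successful ping afterwards makes it available again.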
You are currently browsing the a little madness blog archives for November, 2009.