Archive for March, 2008
As our new acceptance test suite for Pulse 2.0 has grown, naturally it has taken longer to execute. More problematically, however, several tests towards the end of the suite were taking longer and longer to run. This clearly highlighted a scalability issue: as more data was piled into the testing installation, tests took longer to execute. The situation became so bad that a single save in the configuration UI could take tens of seconds to complete.
A simple way to solve this would be to blow away the data after each test (or at regular intervals) so that we don’t see the build up. This would improve the performance of the test suite, but would also hide a clear performance issue in the application. Better to keep on eating our own dog food and fix the underlying problem itself. This way not only would the tests execute more quickly, but (more importantly) the product would also be improved.
Basic profiling identified the major bottleneck: many repeated calls to Introspector.getBeanInfo. We knew this call was slow, but according to the documentation the results should be cached internally. It turns out that the cache is ignored when a certain argument is used (quite common!), so we were triggering a large amount of reflection many times over. After we added a simple cache on top of this method, it practically disappeared from the profiles, and the time taken to run our test suite was more than halved! More importantly, the latency in the UI decreased several-fold, to an acceptable level.
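For illustration, a cache layered over Introspector.getBeanInfo might look something like the sketch below. This is not the actual Pulse code: the class name BeanInfoCache is hypothetical, and it assumes the uncached call is the overload that takes a stop class, which the post does not spell out.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers BeanInfo results so reflection runs once per class pair. */
final class BeanInfoCache {
    // Keyed on (bean class, stop class); a List provides equals/hashCode for the pair.
    private static final Map<List<Class<?>>, BeanInfo> CACHE = new ConcurrentHashMap<>();

    private BeanInfoCache() {
    }

    static BeanInfo getBeanInfo(Class<?> beanClass, Class<?> stopClass) throws IntrospectionException {
        List<Class<?>> key = Arrays.asList(beanClass, stopClass);
        BeanInfo cached = CACHE.get(key);
        if (cached != null) {
            return cached;
        }
        // Slow path: the reflective lookup happens only on a cache miss.
        BeanInfo info = Introspector.getBeanInfo(beanClass, stopClass);
        // If another thread won the race, keep its result so callers share one instance.
        BeanInfo previous = CACHE.putIfAbsent(key, info);
        return previous != null ? previous : info;
    }
}
```

Callers would swap Introspector.getBeanInfo(type, Object.class) for BeanInfoCache.getBeanInfo(type, Object.class). A static map like this holds on to classes forever, so an application that reloads classes would want some form of eviction.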
Naturally optimisation is not always this easy. The more tuned your application, the harder it becomes to find such dramatic improvements. However, it pays to keep an eye on your test suite performance, and profile if things don’t feel right. You might just find a simple win-win situation like the one described.
The first thing I noticed about CUnit was the clean website and full documentation. This is a rarity amongst the C-based libraries I have seen, and very welcome to a first-time user. The structure supported by the library is a fairly standard suite-based grouping of individual test functions. Notably, CUnit comes with a few reporting interfaces out of the box – including the all-important XML reports (typically the easiest way to integrate into a continuous build).
Setup on Windows
I built the library using Visual C++ 2008 Express. Although the provided VC files were from an earlier version, 2008 converted and built the solution without issues. The static library appeared at:
Note: I first tried my luck with an unsupported configuration by building the library under Cygwin using ftjam. This didn’t work, although I suspect porting the build would not be hard.
The First Test
Next I started with a contrived test program to see how it all fits together. The program has a single suite with setup and teardown functions, with one passing and one failing case.
gFoo = strdup("i can has cheezburger");
if(gFoo == NULL)
CU_ASSERT_STRING_EQUAL(gFoo, "i can has cheezburger");
CU_ASSERT_STRING_EQUAL(gFoo, "no soup for you!");
CU_pSuite pSuite = NULL;
if(CU_initialize_registry() != CUE_SUCCESS)
pSuite = CU_add_suite("First Suite", setup, teardown);
if(pSuite == NULL)
if(CU_add_test(pSuite, "Test Pass", testPass) == NULL)
if(CU_add_test(pSuite, "Test Fail", testFail) == NULL)
The code illustrates how to write test functions with assertions, as well as how to assemble a test suite and run it with the built-in basic UI. The output from this program is shown below:
     CUnit - A Unit testing framework for C - Version 2.1-0
     http://cunit.sourceforge.net/

Suite: First Suite
  Test: Test Pass ... passed
  Test: Test Fail ... FAILED
    1. c:\projects\cunitplay\firsttest\main.cpp:30 - CU_ASSERT_STRING_EQUAL(gFoo,"no soup for you!")

--Run Summary: Type      Total     Ran  Passed  Failed
               suites        1       1     n/a       0
               tests         2       2       1       1
               asserts       2       2       1       1
Generating XML Reports
Generating an XML report required a very small change. Rather than using the basic UI, I switched to the automated UI and set the output filename prefix:
/* Test functions the same as previous example. */
/* All setup code remains the same */
if(CU_add_test(pSuite, "Test Fail", testFail) == NULL)
Running this program produces two XML output files: CUnit-Listing.xml and CUnit-Results.xml. The former contains data about the test suites and cases themselves; the latter contains the results of running the tests. This latter report is useful for integration with other systems; e.g. it is the report that we now post-process to pull test results into Pulse.
Note that these XML files come with a DTD and stylesheet, which is useful, but also awkward. As the DTD and XSL file are not present alongside the reports where they are generated, opening the files will fail.
Starting from this very basic test setup leaves plenty of room for improvement. For example, if you look at the sample code above, you will notice that a lot of time is spent assembling the suite and checking for errors. Thankfully the CUnit authors have noticed the same and offer simplifications:
- There are shortcuts for managing tests. For a non-trivial suite this method of describing the test suites in nested arrays and registering in one hit would be a lot simpler.
- You can optionally tell CUnit to abort on error. Since any error at the CUnit level is likely to be fatal, this should reduce the need for error handling.
Another improvement that could perhaps be added in the future is better reporting for assertion failures. Looking back at the assertion failure output:
  Test: Test Fail ... FAILED
    1. c:\projects\cunitplay\firsttest\main.cpp:30 - CU_ASSERT_STRING_EQUAL(gFoo,"no soup for you!")
you can see that it pulls out the assertion expression as-is. However, given that I am using a specific string equals assertion, CUnit should also be able to show me the actual values that were compared at runtime, and even the difference. This can avoid the need to run the test again under a debugger to see the values.
Overall I would rate CUnit as a decent choice for C unit testing. In terms of features all the basics are covered, and important improvements have been made. There is not much new, however, and there is still room for improvement. Where this library really shines is in the out-of-the-box experience, thanks to excellent documentation and multiple included UIs.
If you’re looking for a plain C unit testing solution this is the best open source alternative I have seen. While you’re at it, why not go for a full continuous integration setup with the CUnit support now in Pulse ;).
Checked exceptions are one of those Java features that tend to work up a lively discussion. So forgive me for rehashing an old debate, but I am dismayed to see continued rejection of them as a failed feature, and even more so to see proposals to remove them from the language (not that I ever think this will happen).
In the Beginning…
Most would agree that mistakes were made in the way checked exceptions were used in the early Java years. Some libraries even go so far as doing something about it. However, an argument against the feature itself this does not make. Like anything new, people just needed time in the trenches to learn how to get the most out of checked exceptions. A key point here is that checked exceptions are available but not compulsory. This leads to a reasonable conclusion, as made in Effective Java:
Use checked exceptions for recoverable conditions and runtime exceptions for programming errors
The key point here is recoverable. If you can and should handle the exception, and usually close to the source, then a checked exception is exactly what you need. This will encourage proper handling of an error case – and make it impossible to not realise the case exists. For unrecoverable problems, handling is unlikely to be close to the source, and handling (if any) is likely to be more generic. In this case an unchecked exception is used, to avoid maintenance problems and the leakage of implementation details.
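As a small sketch of that guideline (the Settings class and SettingNotFoundException are hypothetical names made up for this example, not taken from any real library): a missing setting is something a caller can sensibly recover from, so it is checked; a null key is a programming error, so it is unchecked.

```java
import java.util.HashMap;
import java.util.Map;

/** Checked: a caller can reasonably recover, e.g. by falling back to a default. */
class SettingNotFoundException extends Exception {
    SettingNotFoundException(String key) {
        super("No setting named '" + key + "'");
    }
}

/** Hypothetical settings store used only to illustrate the two kinds of exception. */
final class Settings {
    private final Map<String, String> values = new HashMap<>();

    void put(String key, String value) {
        values.put(key, value);
    }

    String get(String key) throws SettingNotFoundException {
        if (key == null) {
            // Programming error: unchecked, since no caller should need to plan for this.
            throw new IllegalArgumentException("key must not be null");
        }
        String value = values.get(key);
        if (value == null) {
            // Recoverable condition: checked, so the caller cannot overlook the case.
            throw new SettingNotFoundException(key);
        }
        return value;
    }

    /** Handles the checked exception close to its source. */
    String getOrDefault(String key, String fallback) {
        try {
            return get(key);
        } catch (SettingNotFoundException e) {
            return fallback;
        }
    }
}
```

Because the checked exception is dealt with one frame above where it is raised, it never appears in the signatures of code further out, while the IllegalArgumentException is left to propagate to whatever generic handler exists.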
None of the above is new, and to me it is not controversial. But the continued attacks on checked exceptions indicate that some still disagree. Let’s take a look at some common arguments against checked exceptions that still get airtime:
Some argue that the proliferation of code that either swallows checked exceptions (catches them without handling them) or just rethrows them all (“throws Exception”) is evidence that the feature is flawed. To me this argument carries no weight. If we look at why exception handling is abused in this way, it boils down to two possibilities:
- The programmer is too lazy to consider error handling properly, so just swallows or rethrows everything. A bad programmer doesn’t imply a bad feature.
- The programmer is frustrated by a poorly-designed API that throws checked exceptions when it shouldn’t. In this case the feature is an innocent tool in the hands of a bad API designer. Granted it took some time to discover when checked exceptions were appropriate, so many older APIs got it wrong. However, this is no reason to throw out the feature now.
You need well-designed APIs and good programmers to get quality software, whether you are using checked exceptions or not.
A common complaint is how changes to the exception specification for a method bubble outwards, creating a maintenance nightmare. There are two responses to this:
- The exceptions thrown by your method are part of the public API. Just because this is inconvenient (and when is error handling ever convenient?) doesn’t make it false! If you want to change an API, then there will be maintenance issues – full stop.
- If checked exceptions are kept to cases that are recoverable, and usually close to the point when they are raised, the specification will not bubble far.
Switching every last exception to unchecked won’t make maintenance problems go away; it will just make it easier to ignore them – to your own detriment.
Some argue that all the try/catch blocks that checked exceptions force on us clutter up the code. Is the alternative, however, to elide the error handling? If you need to handle an error, the code needs to go somewhere. In an exception-based language, that is a try/catch block, whether your exception is checked or not. At least the exception mechanism lets you take the error handling code out of line from your actual functionality. You could argue that Java’s exception handling syntax and abstraction capabilities make error handling more repetitive than it should be – and I would agree with you. This is orthogonal to the checked vs unchecked issue, though.
Testing Is Better
Some would argue that instead of the compiler enforcing exception handling in a rigid way, it would be better to rely on tests catching problems at runtime. This is analogous to the debate re: static vs dynamic typing, and is admittedly not a clear-cut issue. My response to this is that earlier and more reliable feedback (i.e. from the compiler) is clearly beneficial. Measuring the relative cost of maintaining exception specifications versus maintaining tests is more difficult. In cases where an exception almost certainly needs to be recovered from (i.e. the only case where a checked exception should be used), however, I would argue that the testing would be at least as expensive, and less reliable.
I am yet to hear an argument against checked exceptions that I find convincing. To me, they are a valuable language feature that, when used correctly, makes programs more robust. As such, it is nonsense to suggest throwing them out. Instead, the focus should be on encouraging their appropriate use.
Ben Rady’s recent blog post Continuous Integration Is A Hack has a title that was always going to attract the attention of someone working on said “hack”. One core part of Ben’s argument is that:
Agility is a set of values and principles…and no more. Practices depend on technology and technique, which can (and should) be constantly evolving. Values, on the other hand, address the fundamental issues raised when humans work together to create something, and that is truly what makes Agility worthwhile.
This I completely agree with, and it also makes me sceptical of many “certifications”. However, Ben’s attack on CI (while making for a good headline) is off the mark in many ways. Let’s take a look at what he actually says:
CI is one of many useful practices that Agile developers employ today, but fundamentally, it is a hack. That’s because although it’s very useful to know that I’ve introduced a bug 20 minutes after creating it, what I really want is to know the very second that I type the offending line in my editor. CI is just the best that we can do right now, given the technology that is at our disposal.
This shows Ben’s leaning towards Continuous Testing. However, the criticism is wrong in several ways:
- CI is not just (or even primarily) about automated testing; it is about integration. Running tests as you edit your code does not take integration with your team members into account!
- Running tests continuously on your local machine doesn’t help detect cross-platform or machine-specific issues – you need builds running in an independent environment.
- Some tests will never run fast enough to be run continuously, but they still need to be run as frequently as possible.
I could go on, but there is a more fundamental problem here: Ben is boxing the definition of CI into the technical boundaries that he sees today. CI is a practice, not an implementation. I find it ironic that Ben is railing against the definition of Agile as a static set of practices, but then defines CI by today’s implementations! As time passes tools will improve and will take CI to new levels. A key area of improvement has always been reducing feedback time: so rather than being at odds with CI, continuous testing is one part of the larger practice!
Speed up your tests by writing more tests? No, I haven’t gone insane: there is a clear distinction between the two types of tests:
- Acceptance tests verify the functionality of your software at the user’s level. In our case these tests run against a real Pulse package which we automatically unpack, set up and exercise via the web and XML-RPC interfaces.
- Unit tests verify functionality of smaller “units” of code. The definition of a unit varies, but for our purposes a single unit test exercises a small handful of methods (at most).
From a practical standpoint, acceptance tests are much slower than unit tests. This is due to their high-level nature: each test will typically exercise a lot more code. Add to this the considerable overhead of driving the application via a real interface (e.g. using Selenium), and each acceptance test can take several seconds to execute. All this time stacks up and can considerably slow a full build and test.
The huge difference between the time taken to execute a typical unit and acceptance test leads to a simple goal for speeding up your build as a whole: prefer unit tests where possible. The important thing is to do so without compromising the quality of your test suite. Remember: at the end of the day what happens at the user’s level is all that matters, so you can’t compromise your acceptance testing. However, there are many examples where you can push tests down from the acceptance to the unit level.
A recent example in Pulse is testing of cloning functionality. In Pulse 2.0, the new configuration UI allows cloning of any named element. Under the hood the clone is quite powerful: it needs to deal with configuration inheritance, references both inside and outside the element being cloned, cloning multiple items at once and so on. However, none of this needs to be tested at the acceptance level. Instead, the acceptance tests focus on testing the UI (forms, validation and so on), and an example of each basic type of clone. Underneath, the unit tests are used to exercise the various combinations to ensure that the back end works as expected. As the line between the front and back ends is clear, it is relatively easy to decide which tests belong where.
So, next time you’re creating some acceptance tests for your software, remember to ask yourself if you can push some tests down to a lower level. Do all the combinations need to be tested at a high level, or are they all equivalent from the UI perspective? A little forethought can shave considerable time off your build.
Our dog food Pulse server, which spends all day building itself, is a headless box. This presented no challenge for the Pulse 1.x series of releases, as our builds are all scripted and don’t require any GUI tools. With Pulse 2.0, however, things changed. As mentioned in my previous post, this new release has a whole new acceptance test suite based on Selenium. The problem is: Selenium runs in a real browser, and that browser requires a display.
Fortunately, the dog food server is also a Linux box. Thus there is no need to add in a full X setup with requisite hardware just to have Selenium running. This is thanks to the magic of Xvfb – the X virtual framebuffer. Basically, this is a stripped back X server that maintains a virtual display in memory. Hence no actual video hardware or driver is needed, and we can keep things simple.
Setting things up is straightforward. First, install Xvfb:
on Debian/Ubuntu; or
on Fedora/RedHat. Then, choose a display number that is unlikely to ever clash (even if you add a real display later) – something high like 99 should do. Run Xvfb on this display, with access control off [1]:
Now you need to ensure that your display is set to 99 before running the Selenium server (which itself launches the browser). The easiest way to do this is to export DISPLAY=:99 into the environment for Selenium. First, make sure things are working from the command line like so:
Firefox should launch without error, and stay running (until you kill it with Control-C or similar). You won’t see anything of course! If things go well, then you need to modify the way you launch the Selenium server to ensure the DISPLAY is set. There are many ways to do this; in our Pulse setup we actually use a resource, as it is a convenient way to modify the build environment.
An alternative which may suit some setups is to use the xvfb-run wrapper to launch your Selenium server. I opted against this as Pulse has resources to modify the environment and this way I do not need to change the actual build scripts at all. However, if you are not using Pulse (shame! 🙂 ) then you may want to look into it.
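If the Selenium server happens to be launched from Java code rather than from a script, one of those many ways is to set DISPLAY on the child process environment. The sketch below is only an illustration: SeleniumLauncher is a hypothetical class, and the jar name in the usage note is a placeholder rather than a path from our setup.

```java
/** Hypothetical launcher helper: prepares a child process that sees the Xvfb display. */
final class SeleniumLauncher {
    private SeleniumLauncher() {
    }

    static ProcessBuilder withDisplay(String display, String... command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Only the child process environment changes; the parent's DISPLAY is left alone.
        builder.environment().put("DISPLAY", display);
        // Forward the child's output to our own console so failures are visible.
        builder.inheritIO();
        return builder;
    }
}
```

A call such as SeleniumLauncher.withDisplay(":99", "java", "-jar", "selenium-server.jar").start() would then launch the server, and any browser it spawns inherits the same DISPLAY.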
[1] If you are worried about access on your network, then using -ac long-term is not a good idea. Once you have things working, I would suggest tightening things up by turning access control on.
You are currently browsing the a little madness blog archives for March, 2008.