Archive for the ‘Testing’ Category
CITCON Amsterdam 2008 Retrospective
Over the weekend I attended the latest CITCON in Amsterdam. Like the last CITCON I made it to (in Sydney), this was a great event. There is no doubt that the conference formula of:
- free
- focused
- open spaces
- on the weekend
works well. I attended a bunch of interesting sessions, my favourite probably being Ivan Moore’s on “Flickering Builds”. Our own Selenium-based acceptance tests have brought a world of flickering pain, which we only eliminated with a great deal of effort. Even though we may know how to test an asynchronous UI, mistakes are still made and they are always more difficult to diagnose when they are timing-dependent. The important message from this session was that unexplained failed builds cannot be tolerated: every failure should be investigated, and underlying bugs in the software, tests or environment resolved.
I also gave a short demo of Pulse 2.0 in the “CI Server Smackdown” session. Due to a limited amount of time and a high amount of overlap among the feature sets of all tools, I opted to focus primarily on one of our new killer features: templated configuration. In Pulse 2.0, there is no need to duplicate configuration among similar projects, or for different builds of the same project. Instead, you define a template with all the shared configuration, and inherit concrete projects from that. After all, you don’t tolerate duplication in your codebase — so why tolerate it in your CI server configuration? The demo seemed to go down well from the feedback I got afterwards. In fact I was really humbled by the feedback from people who had tried Pulse at the conference — it was all really positive!
Thanks go out to PJ, JtF and the CITCON volunteers for putting on another great event. Looking forward to CITCON Europe next year…
Easier Stubbing With Mockito
Ever need to stub just a method or two of some interface to nicely isolate a unit to test? In the past I have used classic mock object libraries to help, but these suffered from a few problems:
- The use of strings to specify method names to stub, which breaks refactoring.
- Awkward syntax to specify parameter values to match.
- Compulsory verification of the order and number of calls to the stub.
Some time ago EasyMock came along and solved the first two problems. By using a new record-replay interface for defining expected calls, EasyMock tests were both more intuitive and more maintainable. However, the third problem remained: sometimes I just want to stub the methods, not verify that they are called in any specific order or a specific number of times. Some constraints can be relaxed with EasyMock, but it always seemed a pain to do extra work when I just wanted the library to do less!
Well, last week I discovered Mockito, a more recent library inspired in part by EasyMock. The difference with Mockito is that stubbing and verification are separated. Instead of recording expectations, you:
- Stub the methods of interest.
- Run your test.
- Selectively verify the calls to your stub as you choose.
This separation works brilliantly when you don’t want to verify everything by default. When the test is unconcerned with exactly how the stub is called, there is no need to verify at all.
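To make this concrete, here is a minimal sketch (the RateSource interface and PriceConverter class are invented for the example, and I am assuming the when/thenReturn and verify methods of a reasonably recent Mockito release):

import static org.mockito.Mockito.*;

import junit.framework.TestCase;

public class PriceConverterTest extends TestCase
{
    /* A hypothetical collaborator we want to isolate the test from. */
    interface RateSource
    {
        double rateFor(String currency);
    }

    /* A hypothetical unit under test. */
    static class PriceConverter
    {
        private final RateSource rates;

        PriceConverter(RateSource rates) { this.rates = rates; }

        double convert(double amount, String currency)
        {
            return amount * rates.rateFor(currency);
        }
    }

    public void testConvertUsesRateFromSource()
    {
        // 1. Stub only the method of interest; unstubbed methods return defaults.
        RateSource rates = mock(RateSource.class);
        when(rates.rateFor("AUD")).thenReturn(0.95);

        // 2. Run the test.
        assertEquals(95.0, new PriceConverter(rates).convert(100, "AUD"), 0.001);

        // 3. Selectively verify only the calls this test actually cares about.
        verify(rates).rateFor("AUD");
    }
}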
You can read more about how Mockito came about on the author’s blog. I highly recommend checking it out!
Selenium on Ubuntu Hardy Heron
Ubuntu 8.04, the Hardy Heron, was released last week. I eagerly upgraded to see if my previous wireless networking issues had been resolved, and joy of joys they had! This allowed me to finally switch back to Ubuntu as my primary development environment.
Most of the switch went without a hitch, however I did encounter one problem. The acceptance tests for Pulse rely heavily on Selenium. When trying to launch Selenium on Hardy, Selenium RC always got stuck at:
Preparing Firefox profile…
The default version of Firefox installed with Hardy is Firefox 3 beta 5 – as opposed to Firefox 2 in previous releases. So it seems that Selenium and Firefox 3 don’t agree with each other – or at least not when using the current release of Selenium (1.0-b1). My best guess is that Selenium is trying to install its extensions into Firefox and failing because those extensions don’t declare compatibility with Firefox 3. A better diagnostic would be nice!
In any case, the easiest resolution was to install Firefox 2, which can sit alongside Firefox 3 on Hardy. Just install the firefox-2 package:
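$ sudo apt-get install firefox-2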
Then make sure Selenium uses this version. One way to do this is to create a link named firefox-bin in your PATH that points to the Firefox 2 binary. However, I prefer to leave my system clean and make an exception just for Selenium. To allow this, our tests support an optional SELENIUM_BROWSER environment variable that specifies, in Selenium’s browser-string format, which browser to use (this comes in handy in other cases too). So before running tests, I ensure that SELENIUM_BROWSER is set:
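$ export SELENIUM_BROWSER="*firefox /usr/bin/firefox-2"

(The binary path above is an assumption; point it at wherever the firefox-2 package installs its launcher on your system.)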
To make this my default with no extra effort, I have actually just added it to my .bashrc.
On Fast Feedback
A major focus of agile development is fast feedback – of many kinds. As this idea is taken to its extreme, it is worthwhile asking if feedback can be too fast. That is, is there a point at which shrinking the feedback loop becomes counterproductive?
The benefit of feedback is knowing if you are heading in the right direction. The faster the feedback, the less time you spend heading towards dead ends. So the benefit increases as feedback gets faster. But what of the cost? It is useful to split the cost into two categories:
- Setup cost: how hard is it to create the system to produce the feedback? Is it more difficult to create faster feedback?
- Ongoing cost: what does it cost to receive the feedback? Do you pay this every time?
Setup cost is not usually a big issue except in extreme cases, as it can be paid off over time. More interesting is the ongoing cost of receiving and processing feedback. Naturally receiving, understanding and acting on feedback takes time. The important part is distinguishing when this time is productive versus wasted. Processing feedback can be wasteful in various ways:
- False alarms: if the feedback is incorrect, at best it is an irritation. At worst, significant time is wasted chasing a wild goose.
- Distraction: if feedback arrives when you are in the middle of something, it can interrupt your flow. Even if the feedback is useful, its benefit can be outweighed by the cost of losing your train of thought.
- Repetition: if feedback tells you nothing new, then it has no benefit. Any time spent acknowledging it is wasted.
How do these problems relate to the speed of feedback? The cost of false alarms is roughly inversely proportional to the length of the feedback cycle. As the relationship is linear, false alarms are not compounded by making feedback faster. In any case, accuracy of feedback is always paramount: false alarms must be eliminated.
More interesting is the relationship between feedback speed and the other two potential problems: distraction and repetition. In both these cases, shrinking the feedback cycle may produce a disproportionate increase in the occurrence of problems. Thus, when shrinking the feedback cycle, care must be taken to address both of these areas.
Distraction can be addressed in several ways:
- Receipt of feedback should be optional. Users must be able to pause the feedback mechanism (or switch it off altogether), so that when they need time for deep thought they can ensure they are not interrupted.
- The method of delivery should be in keeping with both the importance and the frequency of the message. For example, an emergency that requires immediate attention may warrant a bold, hard-to-ignore mechanism, whereas a small status update should be less intrusive. Continuous feedback must be modeless: think of the real-time spell checking in many word processors.
- Feedback should be categorised to allow smart filtering. Different users often require different levels of feedback: what is important to some is a trivial detail to others.
Repetition is usually simpler to address: feedback can be suppressed when there is no change. Note that in some cases a change for one user may not represent a change for another user. Thus, as above, filtering that is based on individual users may be necessary.
Conclusion
The value of feedback, and indeed fast feedback, is unquestioned. Taking this principle to the extreme, it follows that faster feedback is better. I feel this is true provided that the crucial issues of distraction and repetition are addressed.
Speeding Up Acceptance Tests: Optimise Your Application
As our new acceptance test suite for Pulse 2.0 has grown, it has naturally taken longer to execute. More problematically, however, several tests towards the end of the suite were taking longer and longer to run. This clearly highlighted a scalability issue: as more data was piled into the testing installation, tests took longer to execute. The situation became so bad that a single save in the configuration UI could take tens of seconds to complete.
A simple way to solve this would be to blow away the data after each test (or at regular intervals) so that we don’t see the build up. This would improve the performance of the test suite, but would also hide a clear performance issue in the application. Better to keep on eating our own dog food and fix the underlying problem itself. This way not only would the tests execute more quickly, but (more importantly) the product would also be improved.
Basic profiling identified the major bottleneck: many repeated calls to Introspector.getBeanInfo. We already knew this call was slow, but according to the documentation the results should be cached internally. It turns out that the cache is ignored when a certain argument is used (which is quite common!), so we were triggering a large amount of reflection over and over. By adding a simple cache on top of this method, it practically disappeared from the profiles and the time taken to run our test suite was better than halved! More importantly, the latency in the UI dropped by a factor of several, to an acceptable level.
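To illustrate the shape of the fix (a sketch only, not Pulse’s actual code; it assumes the two-argument getBeanInfo overload is the uncached variant described above):

import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BeanInfoCache
{
    private static final Map<Class<?>, BeanInfo> cache = new ConcurrentHashMap<Class<?>, BeanInfo>();

    public static BeanInfo getBeanInfo(Class<?> clazz) throws IntrospectionException
    {
        BeanInfo info = cache.get(clazz);
        if (info == null)
        {
            // The two-argument form bypasses the Introspector's internal cache,
            // so remember the result ourselves.  An occasional duplicate
            // computation under contention is harmless for a cache like this.
            info = Introspector.getBeanInfo(clazz, Object.class);
            cache.put(clazz, info);
        }
        return info;
    }
}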
Naturally optimisation is not always this easy. The more tuned your application, the harder it becomes to find such dramatic improvements. However, it pays to keep an eye on your test suite performance, and profile if things don’t feel right. You might just find a simple win-win situation like the one described.
C Unit Testing with CUnit
As a part of our continued efforts to support every unit testing library we can in Pulse, I have been looking into CUnit, the imaginatively named framework for plain C.
Initial Impressions
The first thing I noticed about CUnit was the clean website and full documentation. This is a rarity amongst the C-based libraries I have seen, and very welcome to a first-time user. The structure supported by the library is a fairly standard suite-based grouping of individual test functions. Notably, CUnit comes with a few reporting interfaces out of the box – including the all-important XML reports (typically the easiest way to integrate into a continuous build).
Setup on Windows
I built the library using Visual C++ 2008 Express. Although the provided VC files were from an earlier version, 2008 converted and built the solution without issues. The static library appeared at:
Note: I first tried my luck with an unsupported configuration by building the library under Cygwin using ftjam. This didn’t work, although I suspect porting the build would not be hard.
The First Test
Next I started with a contrived test program to see how it all fits together. The program has a single suite with setup and teardown functions, and one passing and one failing case.
#include "CUnit\Basic.h"
char *gFoo;
int setup(void)
{
gFoo = strdup(“i can has cheezburger”);
if(gFoo == NULL)
{
return 1;
}
return 0;
}
int teardown(void)
{
free(gFoo);
return 0;
}
void testPass(void)
{
CU_ASSERT_STRING_EQUAL(gFoo, “i can has cheezburger”);
}
void testFail(void)
{
CU_ASSERT_STRING_EQUAL(gFoo, “no soup for you!”);
}
int main()
{
CU_pSuite pSuite = NULL;
if(CU_initialize_registry() != CUE_SUCCESS)
{
return CU_get_error();
}
pSuite = CU_add_suite(“First Suite”, setup, teardown);
if(pSuite == NULL)
{
goto exit;
}
if(CU_add_test(pSuite, “Test Pass”, testPass) == NULL)
{
goto exit;
}
if(CU_add_test(pSuite, “Test Fail”, testFail) == NULL)
{
goto exit;
}
CU_basic_set_mode(CU_BRM_VERBOSE);
CU_basic_run_tests();
exit:
CU_cleanup_registry();
return CU_get_error();
}
The code illustrates how to write test functions with assertions, as well as how to assemble a test suite and run it with the built-in basic UI. The output from this program is shown below:
CUnit - A Unit testing framework for C - Version 2.1-0
http://cunit.sourceforge.net/
Suite: First Suite
Test: Test Pass ... passed
Test: Test Fail ... FAILED
1. c:\projects\cunitplay\firsttest\main.cpp:30 -
CU_ASSERT_STRING_EQUAL(gFoo,"no soup for you!")
--Run Summary: Type      Total     Ran  Passed  Failed
               suites        1       1     n/a       0
               tests         2       2       1       1
               asserts       2       2       1       1
Generating XML Reports
Generating an XML report required a very small change. Rather than using the basic UI, I switched to the automated UI and set the output filename prefix:
#include "CUnit\Automated.h"
/* Test functions the same as previous example. */
…
int main()
{
/* All setup code remains the same */
…
if(CU_add_test(pSuite, “Test Fail”, testFail) == NULL)
{
goto exit;
}
CU_set_output_filename(“CUnit”);
CU_automated_run_tests();
CU_list_tests_to_file();
exit:
CU_cleanup_registry();
return CU_get_error();
}
Running this program produces two XML output files: CUnit-Listing.xml and CUnit-Results.xml. The former contains data about the test suites and cases themselves, while the latter contains the results of running them. The results report is the useful one for integration with other systems; for example, it is the report that we now post-process to pull test results into Pulse.
Note that these XML files come with a DTD and stylesheet, which is useful, but also awkward: because the DTD and XSL files are not present alongside the reports where they are generated, opening the reports directly will fail.
Improvements
Starting from this very basic test setup leaves plenty of room for improvement. For example, if you look at the sample code above, you will notice that a lot of it is devoted to assembling the suite and checking for errors. Thankfully the CUnit authors have noticed the same and offer simplifications:
- There are shortcuts for registering tests: suites are described in nested arrays and registered in one hit, which for a non-trivial suite would be a lot simpler (see the sketch after this list).
- You can optionally tell CUnit to abort on error. Since any error at the CUnit level is likely to be fatal, this should reduce the need for error handling.
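To give a feel for both of these, here is a rough sketch of the array-based registration (assuming the CU_SuiteInfo/CU_TestInfo layout and the CU_register_suites and CU_set_error_action calls of the CUnit 2.1 release used above):

#include "CUnit\Basic.h"

/* Fixture and test functions exactly as defined in the first example. */
int setup(void);
int teardown(void);
void testPass(void);
void testFail(void);

/* Describe everything up front in nested arrays... */
static CU_TestInfo tests[] = {
    { "Test Pass", testPass },
    { "Test Fail", testFail },
    CU_TEST_INFO_NULL,
};

static CU_SuiteInfo suites[] = {
    { "First Suite", setup, teardown, tests },
    CU_SUITE_INFO_NULL,
};

int main()
{
    if(CU_initialize_registry() != CUE_SUCCESS)
    {
        return CU_get_error();
    }

    /* Abort on any CUnit-level error, removing the need to check each call. */
    CU_set_error_action(CUEA_ABORT);

    /* ...and register the lot in one hit. */
    CU_register_suites(suites);

    CU_basic_set_mode(CU_BRM_VERBOSE);
    CU_basic_run_tests();

    CU_cleanup_registry();
    return CU_get_error();
}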
Another improvement that could perhaps be added in the future is better reporting for assertion failures. Looking back at the assertion failure output:
Test: Test Fail ... FAILED
1. c:\projects\cunitplay\firsttest\main.cpp:30 -
CU_ASSERT_STRING_EQUAL(gFoo,"no soup for you!")
you can see that it pulls out the assertion expression as-is. However, given that I am using a specific string equals assertion, CUnit should also be able to show me the actual values that were compared at runtime, and even the difference. This can avoid the need to run the test again under a debugger to see the values.
Conclusion
Overall I would rate CUnit as a decent choice for C unit testing. In terms of features all the basics are covered, and important improvements have been made. There is not much new, however, and there is still room for improvement. Where this library really shines is the out-of-the-box experience, thanks to excellent documentation and multiple included UIs.
If you’re looking for a plain C unit testing solution, this is the best open source alternative I have seen. While you’re at it, why not go for a full continuous integration setup with the CUnit support now in Pulse.
Re: Continuous Integration Is A Hack
Ben Rady’s recent blog post Continuous Integration Is A Hack has a title that was always going to attract the attention of someone working on said “hack”. One core part of Ben’s argument is that:
Agility is a set of values and principles…and no more. Practices depend on technology and technique, which can (and should) be constantly evolving. Values, on the other hand, address the fundamental issues raised when humans work together to create something, and that is truly what makes Agility worthwhile.
This I completely agree with, and it also makes me sceptical of many “certifications”. However, Ben’s attack on CI (while making for a good headline) is off the mark in many ways. Let’s take a look at what he actually says:
CI is one of many useful practices that Agile developers employ today, but fundamentally, it is a hack. That’s because although it’s very useful to know that I’ve introduced a bug 20 minutes after creating it, what I really want is to know the very second that I type the offending line in my editor. CI is just the best that we can do right now, given the technology that is at our disposal.
This shows Ben’s leaning towards Continuous Testing. However, the criticism is wrong in several ways:
- CI is not just (or even primarily) about automated testing; it is about integration. Running tests as you edit your code does not take integration with your team members into account!
- Running tests continuously on your local machine doesn’t help detect cross-platform or machine-specific issues – you need builds running in an independent environment.
- Some tests will never run fast enough to be run continuously, but they still need to be run as frequently as possible.
I could go on, but there is a more fundamental problem here: Ben is boxing the definition of CI into the technical boundaries that he sees today. CI is a practice, not an implementation. I find it ironic that Ben is railing against the definition of Agile as a static set of practices, but then defines CI by today’s implementations! As time passes tools will improve and will take CI to new levels. A key area of improvement has always been reducing feedback time: so rather than being at odds with CI, continuous testing is one part of the larger practice!
Speeding Up Acceptance Tests: Write More Unit Tests
Speed up your tests by writing more tests? No, I haven’t gone insane; there is a clear distinction between the two types of tests:
- Acceptance tests verify the functionality of your software at the user’s level. In our case these tests run against a real Pulse package which we automatically unpack, set up and exercise via the web and XML-RPC interfaces.
- Unit tests verify functionality of smaller “units” of code. The definition of a unit varies, but for our purposes a single unit test exercises a small handful of methods (at most).
From a practical standpoint, acceptance tests are much slower than unit tests. This is due to their high-level nature: each test will typically exercise a lot more code. Add to this the considerable overhead of driving the application via a real interface (e.g. using Selenium), and each acceptance test can take several seconds to execute. All this time stacks up and can considerably slow a full build and test.
The huge difference between the time taken to execute a typical unit test and a typical acceptance test leads to a simple goal for speeding up your build as a whole: prefer unit tests where possible. The important thing is to do so without compromising the quality of your test suite. Remember: at the end of the day, what happens at the user’s level is all that matters, so you can’t compromise your acceptance testing. However, there are many cases where you can push tests down from the acceptance to the unit level.
A recent example in Pulse is testing of cloning functionality. In Pulse 2.0, the new configuration UI allows cloning of any named element. Under the hood the clone is quite powerful: it needs to deal with configuration inheritance, references both inside and outside the element being cloned, cloning multiple items at once and so on. However, none of this needs to be tested at the acceptance level. Instead, the acceptance tests focus on testing the UI (forms, validation and so on), and an example of each basic type of clone. Underneath, the unit tests are used to exercise the various combinations to ensure that the back end works as expected. As the line between the front and back ends is clear, it is relatively easy to decide which tests belong where.
So, next time you’re creating some acceptance tests for your software, remember to ask yourself if you can push some tests down to a lower level. Do all the combinations need to be tested at a high level, or are they all equivalent from the UI perspective? A little forethought can shave considerable time off your build.
Running Selenium Headless
Our dog food Pulse server, which spends all day building itself, is a headless box. This presented no challenge for the Pulse 1.x series of releases, as our builds are all scripted and don’t require any GUI tools. With Pulse 2.0, however, things changed. As mentioned in my previous post, this new release has a whole new acceptance test suite based on Selenium. The problem is: Selenium runs in a real browser, and that browser requires a display.
Fortunately, the dog food server is also a Linux box. Thus there is no need to add in a full X setup with requisite hardware just to have Selenium running. This is thanks to the magic of Xvfb – the X virtual framebuffer. Basically, this is a stripped back X server that maintains a virtual display in memory. Hence no actual video hardware or driver is needed, and we can keep things simple.
Setting things up is straightforward. First, install Xvfb:
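$ sudo apt-get install xvfb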
on Debian/Ubuntu; or
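$ yum install xorg-x11-server-Xvfb    # package name varies with release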
on Fedora/RedHat. Then, choose a display number that is unlikely to ever clash (even if you add a real display later) – something high like 99 should do. Run Xvfb on this display, with access control off [1]:
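$ Xvfb :99 -ac &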
Now you need to ensure that your display is set to 99 before running the Selenium server (which itself launches the browser). The easiest way to do this is to export DISPLAY=:99 into the environment for Selenium. First, make sure things are working from the command line like so:
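$ export DISPLAY=:99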
$ firefox
Firefox should launch without error, and stay running (until you kill it with Control-C or similar). You won’t see anything, of course! If things go well, then you need to modify the way you launch the Selenium server (which itself launches the browser) to ensure the DISPLAY is set. There are many ways to do this; in our Pulse setup we use a resource, as it is a convenient way to modify the build environment.
An alternative which may suit some setups is to use the xvfb-run wrapper to launch your Selenium server. I opted against this as Pulse has resources to modify the environment and this way I do not need to change the actual build scripts at all. However, if you are not using Pulse (shame!) then you may want to look into it.
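For example, something like the following would do the job (a sketch only; substitute however you actually launch the Selenium server):

$ xvfb-run -a java -jar selenium-server.jar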
–
[1] If you are worried about access on your network, then using -ac long-term is not a good idea. Once you have things working, I would suggest tightening things up by turning access control on.
Acceptance Testing: Getting to the Gravy Phase
Pulse 2.0 has a completely overhauled configuration UI. The current 1.2 UI uses standard web forms, whereas the new UI is ExtJS-based and makes heavy use of AJAX. This makes for a huge improvement in usability, unfortunately at the cost of making all of our existing acceptance tests for the UI redundant! So, we needed to go back to square one with our acceptance test suite. It was a painful process, but going through it again reminded me of something important:
The setup cost of your first few acceptance tests is high, but after that it is all gravy.
The process from zero to gravy is similar for most projects, so I’ve summarised the phases we experienced below.
Phase 1: Choosing the Technology
The first step was to switch out jWebUnit for Selenium. Although jWebUnit can execute much of the Javascript in our new UI, it falls down in two key ways:
- Executing only “much” or even “most” of the Javascript is not good enough – it makes writing some important tests impossible.
- It does nothing to test real-world differences in the various Javascript engines that are the source of many bugs.
As Selenium drives the actual browser, you can test anything that will run in the browser and can also test compatibility. The main (well known) drawback is that Selenium is slow. As important as speed can be, however, accuracy is far more important.
Phase 2: Scripting Setup and Teardown
This is where the slog started. The most accurate way to acceptance test is to start with your actual release artifact (installer, tarball, WAR file, whatever). In our case, this is the Pulse package, which comes in various forms. Before we can actually test the UI, we need to get the Pulse server unpacked, started and setup. When the tests are complete, we also need scripts to stop the server and clean up. This way we can integrate the tests into our build and, of course, run the acceptance tests using our own Pulse installation!
So, a day of scripting later, and we don’t have a single test. Sigh.
Phase 3: The First Test
Now things got serious: writing the first test was painfully hard. I constantly hit snags:
- How do I use this newfangled Selenium thingy?
- How do I test a UI that is asynchronous?
- How can I verify the state of ExtJS forms?
The important thing here was to keep pushing. The amount of effort to get to the first test case is not worth it for that test alone, but I knew it was an investment in knowledge that would help us develop further tests.
Phase 4: Abstraction
As I continued to write the first few test cases, repetition crept in. Verifying the state of a form is similar for all forms. Many tests navigate over the same pages, examining the state in similar ways. The code was ripe for refactoring! I started abstracting away the actual clicks, keystrokes and inspections and building up a model of the application. The classic way to do this is to represent pages and forms as individual classes that provide high-level interfaces. Over time, our tests changed from something like:
type("password", "admin")
goTo("/admin/projects/")
waitForPageToLoad()
click("add.project")
waitForCondition("formLoaded")
type("name", "p1")
type("description", "test")
click("ok")
waitForElement("projects")
assertTrue(isElementPresent("project.p1"))
to:
projectsPage.goTo()
wizard = projectsPage.clickAdd()
wizard.next(name, description)
projectsPage.waitFor()
assertTrue(projectsPage.isProjectPresent(name))
and then to:
addProject(name, description)
assertProjectPresent(name)
Phase 5: Gravy
With the right abstractions in place, adding acceptance tests is now much easier. The tests themselves are also easier to understand and maintain due to their declarative nature. Now we can reap the rewards of those painful early stages!
Conclusion
The process of setting up an acceptance test suite is quite daunting, and initially painful. But if you persevere and constantly look for useful abstractions, you’ll reap the rewards in the long run.
