Re: Continuous Integration Is A Hack

March 7th, 2008

Ben Rady’s recent blog post Continuous Integration Is A Hack has a title that was always going to attract the attention of someone working on said “hack”. One core part of Ben’s argument is that:

Agility is a set of values and principles…and no more. Practices depend on technology and technique, which can (and should) be constantly evolving. Values, on the other hand, address the fundamental issues raised when humans work together to create something, and that is truly what makes Agility worthwhile.

This I completely agree with, and it also makes me sceptical of many “certifications”. However, Ben’s attack on CI (while making for a good headline) is off the mark in many ways. Let’s take a look at what he actually says:

CI is one of many useful practices that Agile developers employ today, but fundamentally, it is a hack. That’s because although it’s very useful to know that I’ve introduced a bug 20 minutes after creating it, what I really want is to know the very second that I type the offending line in my editor. CI is just the best that we can do right now, given the technology that is at our disposal.

This shows Ben’s leaning towards Continuous Testing. However, the criticism is wrong in several ways:

  • CI is not just (or even primarily) about automated testing: it is about integration. Running tests as you edit your code does not take integration with your team members into account!
  • Running tests continuously on your local machine doesn’t help detect cross-platform or machine-specific issues - you need builds running in an independent environment.
  • Some tests will never run fast enough to be run continuously, but they still need to be run as frequently as possible.

I could go on, but there is a more fundamental problem here: Ben is boxing the definition of CI into the technical boundaries that he sees today. CI is a practice, not an implementation. I find it ironic that Ben is railing against the definition of Agile as a static set of practices, but then defines CI by today’s implementations! As time passes tools will improve and will take CI to new levels. A key area of improvement has always been reducing feedback time: so rather than being at odds with CI, continuous testing is one part of the larger practice!

Speeding Up Acceptance Tests: Write More Unit Tests

March 6th, 2008

Speed up your tests by writing more tests? No, I haven’t gone insane, there is a clear distinction between the two types of tests:

  • Acceptance tests verify the functionality of your software at the user’s level. In our case these tests run against a real Pulse package which we automatically unpack, set up and exercise via the web and XML-RPC interfaces.
  • Unit tests verify functionality of smaller “units” of code. The definition of a unit varies, but for our purposes a single unit test exercises a small handful of methods (at most).

From a practical standpoint, acceptance tests are much slower than unit tests. This is due to their high-level nature: each test will typically exercise a lot more code. Add to this the considerable overhead of driving the application via a real interface (e.g. using Selenium), and each acceptance test can take several seconds to execute. All this time stacks up and can considerably slow a full build and test.

The huge difference between the time taken to execute a typical unit test and a typical acceptance test leads to a simple goal for speeding up your build as a whole: prefer unit tests where possible. The important thing is to do so without compromising the quality of your test suite. Remember: at the end of the day what happens at the user’s level is all that matters, so you can’t compromise your acceptance testing. However, there are many examples where you can push tests down from the acceptance to the unit level.

A recent example in Pulse is testing of cloning functionality. In Pulse 2.0, the new configuration UI allows cloning of any named element. Under the hood the clone is quite powerful: it needs to deal with configuration inheritance, references both inside and outside the element being cloned, cloning multiple items at once and so on. However, none of this needs to be tested at the acceptance level. Instead, the acceptance tests focus on testing the UI (forms, validation and so on), and an example of each basic type of clone. Underneath, the unit tests are used to exercise the various combinations to ensure that the back end works as expected. As the line between the front and back ends is clear, it is relatively easy to decide which tests belong where.
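To make the split concrete, here is a minimal Python sketch of what "exercising the combinations at the unit level" can look like. It is a toy model, not Pulse code: the clone_elements function, the flat dictionary layout and the rule that references to co-cloned elements are redirected to their clones are all assumptions made up for illustration.

```python
import copy


def clone_elements(config, renames):
    """Clone each element named in `renames` under its new name.

    Hypothetical rule for this sketch: a reference to another element that is
    being cloned in the same operation is redirected to that element's clone;
    any other reference (and the parent) is kept as-is.
    """
    for new_name in renames.values():
        if new_name in config:
            raise ValueError("name already in use: " + new_name)
    result = copy.deepcopy(config)
    for old_name, new_name in renames.items():
        clone = copy.deepcopy(config[old_name])
        clone["refs"] = [renames.get(ref, ref) for ref in clone["refs"]]
        result[new_name] = clone
    return result


# A tiny made-up configuration: "a" and "b" both inherit from "base".
CONFIG = {
    "base": {"parent": None, "refs": []},
    "a": {"parent": "base", "refs": ["b", "base"]},
    "b": {"parent": "base", "refs": []},
}

# Each case: (elements to clone, clone to inspect, references it should hold).
CASES = [
    ({"a": "a2"}, "a2", ["b", "base"]),              # single clone
    ({"a": "a2", "b": "b2"}, "a2", ["b2", "base"]),  # cloned together
    ({"b": "b2"}, "b2", []),                         # no references at all
]

for renames, clone_name, expected_refs in CASES:
    cloned = clone_elements(CONFIG, renames)
    assert cloned[clone_name]["refs"] == expected_refs
    assert cloned[clone_name]["parent"] == "base"   # inheritance preserved
    assert clone_name not in CONFIG                 # original left untouched
```

Adding another combination here costs one line in the table and a few microseconds at run time, whereas the same case driven through a browser would cost seconds.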

So, next time you’re creating some acceptance tests for your software, remember to ask yourself if you can push some tests down to a lower level. Do all the combinations need to be tested at a high level, or are they all equivalent from the UI perspective? A little forethought can shave considerable time off your build.

Running Selenium Headless

March 5th, 2008

Our dog food Pulse server, which spends all day building itself, is a headless box. This presented no challenge for the Pulse 1.x series of releases, as our builds are all scripted and don’t require any GUI tools. With Pulse 2.0, however, things changed. As mentioned in my previous post, this new release has a whole new acceptance test suite based on Selenium. The problem is: Selenium runs in a real browser, and that browser requires a display.

Fortunately, the dog food server is also a Linux box. Thus there is no need to add in a full X setup with requisite hardware just to have Selenium running. This is thanks to the magic of Xvfb - the X virtual framebuffer. Basically, this is a stripped back X server that maintains a virtual display in memory. Hence no actual video hardware or driver is needed, and we can keep things simple.

Setting things up is straightforward. First, install Xvfb:

# apt-get install xvfb

on Debian/Ubuntu; or

# yum install xorg-x11-Xvfb

on Fedora/RedHat. Then, choose a display number that is unlikely to ever clash (even if you add a real display later) - something high like 99 should do. Run Xvfb on this display, with access control off [1]:

# Xvfb :99 -ac

Now you need to ensure that your display is set to 99 before running the Selenium server (which itself launches the browser). The easiest way to do this is to export DISPLAY=:99 into the environment for Selenium. First, make sure things are working from the command line like so:

$ export DISPLAY=:99
$ firefox

Firefox should launch without error, and stay running (until you kill it with Control-C or similar). You won’t see anything of course! If things go well, then you need to modify the way you launch the Selenium server to ensure the DISPLAY is set. There are many ways to do this. In our Pulse setup we actually use a resource, as it is a convenient way to modify the build environment.
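If your build is driven by a script rather than by Pulse resources, the same effect can be had by setting DISPLAY only for the process you launch. The following is a minimal Python sketch under that assumption; the function names are made up for illustration, and the command passed to launch_on_display is whatever you normally use to start your Selenium server.

```python
import os
import subprocess


def env_for_display(display_number, base_env=None):
    """Return a copy of the environment with DISPLAY pointing at the given X display."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = ":{}".format(display_number)
    return env


def launch_on_display(command, display_number):
    """Start `command` (a list of arguments) so that it sees only the virtual display."""
    return subprocess.Popen(command, env=env_for_display(display_number))
```

For example, launch_on_display(["firefox"], 99) would repeat the command-line check above without touching the DISPLAY of the calling shell.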

An alternative which may suit some setups is to use the xvfb-run wrapper to launch your Selenium server. I opted against this as Pulse has resources to modify the environment, and this way I do not need to change the actual build scripts at all. However, if you are not using Pulse (shame! :) ) then you may want to look into it.


[1] If you are worried about access on your network, then using -ac long-term is not a good idea. Once you have things working, I would suggest tightening things up by turning access control on.

Acceptance Testing: Getting to the Gravy Phase

February 29th, 2008

Pulse 2.0 has a completely overhauled configuration UI. The current 1.2 UI uses standard web forms, whereas the new UI is ExtJS-based and makes heavy use of AJAX. This makes for a huge improvement in usability, unfortunately at the cost of making all of our existing acceptance tests for the UI redundant! So, we needed to go back to square one with our acceptance test suite. It was a painful process, but going through it again reminded me of something important:

The setup cost of your first few acceptance tests is high, but after that it is all gravy.

The process from zero to gravy is similar for most projects, so I’ve summarised the phases we experienced below.

Phase 1: Choosing the Technology

The first step was to switch out jWebUnit for Selenium. Although jWebUnit can execute much of the Javascript in our new UI, it falls down in two key ways:

  1. Executing much or even most of the Javascript is not good enough - it makes writing important tests impossible.
  2. It does nothing to test real-world differences in the various Javascript engines that are the source of many bugs.

As Selenium drives the actual browser, you can test anything that will run in the browser and can also test compatibility. The main (well known) drawback is that Selenium is slow. As important as speed can be, however, accuracy is far more important.

Phase 2: Scripting Setup and Teardown

This is where the slog started. The most accurate way to acceptance test is to start with your actual release artifact (installer, tarball, WAR file, whatever). In our case, this is the Pulse package, which comes in various forms. Before we can actually test the UI, we need to get the Pulse server unpacked, started and set up. When the tests are complete, we also need scripts to stop the server and clean up. This way we can integrate the tests into our build and, of course, run the acceptance tests using our own Pulse installation!

So, a day of scripting later, and we don’t have a single test. Sigh.

Phase 3: The First Test

Now things got serious: writing the first test was painfully hard. I constantly hit snags:

  • How do I use this newfangled Selenium thingy?
  • How do I test a UI that is asynchronous?
  • How can I verify the state of ExtJS forms?

The important thing here was to keep pushing. The amount of effort to get to the first test case is not worth it for that test alone, but I knew it was an investment in knowledge that would help us develop further tests.
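On the asynchronous question in particular, the usual answer is to poll for a condition with a timeout instead of sleeping for a fixed period. Here is a minimal Python sketch of that idea. The wait_for helper and the SlowForm stand-in are hypothetical names invented for this example, not part of Selenium or of the Pulse tests.

```python
import time


def wait_for(condition, timeout=10.0, interval=0.1):
    """Poll `condition` until it returns true or `timeout` seconds have passed.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class SlowForm:
    """Stand-in for an AJAX form that only reports itself loaded after a few polls."""

    def __init__(self, polls_until_loaded):
        self.remaining = polls_until_loaded

    def is_loaded(self):
        self.remaining -= 1
        return self.remaining <= 0


form = SlowForm(polls_until_loaded=3)
assert wait_for(form.is_loaded, timeout=2.0, interval=0.01)
```

The test only waits as long as the UI actually takes, and a genuine failure shows up as a timeout rather than as a hang.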

Phase 4: Abstraction

As I continued to write the first few test cases, repetition crept in. Verifying the state of a form is similar for all forms. Many tests navigate over the same pages, examining the state in similar ways. The code was ripe for refactoring! I started abstracting away the actual clicks, keystrokes and inspections and building up a model of the application. The classic way to do this is to represent pages and forms as individual classes that provide high-level interfaces. Over time, our tests changed from something like:

type("username", "admin")
type("password", "admin")
goTo("/admin/projects/")
waitForPageToLoad()
click("add.project")
waitForCondition("formLoaded")
type("name", "p1")
type("description", "test")
click("ok")
waitForElement("projects")
assertTrue(isElementPresent("project.p1"))

to:

loginAsAdmin()
projectsPage.goTo()
wizard = projectsPage.clickAdd()
wizard.next(name, description)
projectsPage.waitFor()
assertTrue(projectsPage.isProjectPresent(name))

and then to:

loginAsAdmin()
addProject(name, description)
assertProjectPresent(name)
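As a rough illustration of the layers behind that last version, here is a minimal Python sketch of the page-object idea. Everything in it is hypothetical: FakeBrowser is an in-memory stand-in for a real browser session so that the example runs on its own, and the class and function names are invented for illustration rather than taken from the Pulse test suite. Login and the explicit waits are left out to keep it short.

```python
class FakeBrowser:
    """In-memory stand-in for a browser driver (an assumption of this sketch)."""

    def __init__(self):
        self.fields = {}
        self.elements = set()

    def go_to(self, path):
        self.fields = {}  # navigating away discards any half-filled form

    def type(self, field, value):
        self.fields[field] = value

    def click(self, element_id):
        if element_id == "ok":
            # Submitting the add-project form makes a project row appear.
            self.elements.add("project." + self.fields["name"])

    def is_element_present(self, element_id):
        return element_id in self.elements


class AddProjectWizard:
    """Knows which fields and buttons make up the add-project form."""

    def __init__(self, browser):
        self.browser = browser

    def next(self, name, description):
        self.browser.type("name", name)
        self.browser.type("description", description)
        self.browser.click("ok")


class ProjectsPage:
    """Knows the URL and element ids of the projects page."""

    PATH = "/admin/projects/"

    def __init__(self, browser):
        self.browser = browser

    def go_to(self):
        self.browser.go_to(self.PATH)

    def click_add(self):
        self.browser.click("add.project")
        return AddProjectWizard(self.browser)

    def is_project_present(self, name):
        return self.browser.is_element_present("project." + name)


def add_project(browser, name, description):
    """Highest-level helper: what the test means, not how it clicks."""
    page = ProjectsPage(browser)
    page.go_to()
    page.click_add().next(name, description)
    return page


page = add_project(FakeBrowser(), "p1", "test")
assert page.is_project_present("p1")
```

The element ids and paths live in exactly one class each, so a change to the UI means touching one page object rather than every test that visits the page.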

Phase 5: Gravy

With the right abstractions in place, adding acceptance tests is now much easier. The tests themselves are also easier to understand and maintain due to their declarative nature. Now we can reap the rewards of those painful early stages!

Conclusion

The process of setting up an acceptance test suite is quite daunting, and initially painful. But if you persevere and constantly look for useful abstractions, you’ll reap the rewards in the long run.

Q: What Sucks More Than Java Generics?

February 27th, 2008

A: Java without generics. Yes, there are many problems with the Java generics implementation. And I seriously hope that the implementation is improved in Java 7. However, it occurs to me that the problems can generally be worked around by either:

  1. Not using generics where the limitations make it too difficult; or
  2. Using unsafe casts.

In both cases, you are no worse off than you were pre-generics, where you had no choice! Despite their limitations, generics have done a lot to reduce painfully repetitive casting and mysterious interfaces (List of what now?). They also enable abstraction of many extremely common functions without losing type information. In these ways I benefit from generics every day, enough to outweigh the frustrations of erasure.

Refactoring vs Merging

February 22nd, 2008

Do The Simplest Thing That Could Possibly Work, then Refactor Mercilessly: two well known Extreme Rules. It’s hard to argue with the virtues of refactoring, but if you’ve ever had to manage parallel codelines then you might have a different perspective on “mercilessly”. Many types of refactoring are at odds with parallel codelines because they make merging - an already difficult task - more difficult. In particular, changes that physically move content around amongst files tend to confuse SCMs which take a file-based view of the world (i.e. the majority of them). How can we alleviate this tension? A few possibilities come to mind:

  1. Don’t do parallel development: if you don’t have to merge, you don’t have a problem. Since the act of merging itself is pure overhead to the development process, this is a tempting idea. However, reality (typically of the business kind), dictates that parallel development is necessary to some degree. How do you support customers on older versions? How do you take on large and risky subprojects without derailing the development trunk? These are scenarios where branching is valuable enough to justify the overhead of merging. It is reasonable therefore to try and minimise parallel development, but rarely possible to avoid it completely.
  2. Only refactor on the trunk (or wherever the majority of development is done). This helps to alleviate the problem to some degree, by containing the main divergence to a single codeline. It is also usually the codeline where you would naturally do most of your refactoring, as it is the bleeding edge. However, even merging a small bug fix from a maintenance branch may prove difficult due to code movements on the trunk. And this solution is not much help for active development branches that run parallel to the trunk for some time.
  3. Another Extreme Rule: Integrate Often. It is no coincidence that Perforce-speak for “merge” is “integrate”. Merging is a type of integration, and like any it is less painful if done regularly, before codelines have diverged too far. This is, however, a way to mitigate problems rather than solve them.
  4. Avoid problematic refactoring, such as moving code between files. In some cases it is clear that a certain refactoring is best avoided due to anticipated merging. However, avoiding all problematic refactoring is not a workable long-term strategy. At some point, the maintenance benefits of cleaning up the code will outweigh the merging cost.

In reality we use a combination of these ideas and a dose of common sense to reduce merging problems. However, the tension still exists and can be painful. A technology-based solution to this would of course be ideal, i.e. an SCM that understood refactoring and could take account of it when merging. Unfortunately I know of no such existing SCM, and there are significant barriers to creating one:

  1. Many popular SCMs of today struggle enough with the file-based model that it is hard to see them moving on to such a high level.
  2. File-based, text-diff change management applies generically to a wide range of tasks. Any tool with deeper knowledge of actual file contents would likely trade off some of this genericity.
  3. Any “smart” merging algorithm will still have many corner cases where the right choice is ambiguous, and making these comprehensible to the poor human that needs to resolve them is more difficult when the algorithm is more complicated.
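To see why a file-based, text-diff view struggles with moved code, consider this small Python sketch using the standard difflib module. The file names and contents are made up, and line_changes is a helper invented for the example. It stands in for the general idea of a per-file line diff, not for the algorithm of any particular SCM.

```python
import difflib


def line_changes(old_lines, new_lines):
    """Return (removed, added) lines as a per-file, line-based diff sees them."""
    removed, added = [], []
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


HELPER = ["def helper():", "    return 42"]

# A refactoring that moves helper() from a.py to b.py without changing it.
before = {
    "a.py": HELPER + ["", "def main():", "    return helper()"],
    "b.py": ["def other():", "    return 0"],
}
after = {
    "a.py": ["def main():", "    return helper()"],
    "b.py": ["def other():", "    return 0", ""] + HELPER,
}

for name in sorted(before):
    removed, added = line_changes(before[name], after[name])
    print(name, "removed:", removed, "added:", added)
```

Each file is compared only with its own earlier version, so the move shows up as a deletion in one file and an unrelated addition in the other. Nothing records that the two are the same code, which is why a bug fix made to helper() on another branch has nowhere obvious to land.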

On a more positive note, there has been some welcome innovation in the SCM space recently. Perhaps I have overlooked solutions that exist now, or perhaps the growing competition will spur innovation in this area. Either way, if you have any ideas, let me know!

Zutubi London Office

February 13th, 2008

Wondering why everything has gone quiet over here? Well, all should become clear now: I have just completed a move from rainy Sydney to sunny London [1]. Combine the Christmas break with an overseas move and you have a recipe for zero blog posts!

The good news is that I am back in action in a new home in central London (Baker Street area). So, for now at least, Zutubi is operating in both Sydney (Daniel - GMT+11) and London (me - GMT) - the company never sleeps!

I intend to travel around quite a bit of the UK and continental Europe while we are living here. This will hopefully give me the opportunity to meet some of our European customers. Let me know if you are in the area, and perhaps we can arrange to meet up at some point over the coming months.


[1] Yes, it really has been quite sunny and mild since we got here, quite to our surprise! Reports from back home tell of plenty more rain down that way.

Phrases You Should Never See in an FAQ

December 5th, 2007

Today’s phrase of choice is “you don’t need”, with the word of the day being “never”. Consider this entry in the Hibernate FAQ:

The hibernate.hbm2ddl.auto=update setting doesn’t create indexes

SchemaUpdate is activated by this configuration setting. SchemaUpdate is not really very powerful and comes without any warranties. For example, it does not create any indexes automatically. Furthermore, SchemaUpdate is only useful in development, per definition (a production schema is never updated automatically). You don’t need indexes in development.

What’s the problem with this? The FAQ answer is saying (in a roundabout way) that you should not ask this question, as you don’t need an answer. Telling your users what they need is a good way to alienate them. If this question is common enough to warrant an entry in the FAQ in the first place, aren’t your users telling you that they do need this functionality? This doesn’t mean you have to jump to implement it - just don’t patronise your users by telling them you understand their needs better than they do.

Personally, I was looking at this entry because we use Hibernate for persistence in Pulse. In Pulse, there is a built-in upgrade framework that updates the schema automatically for you when you install a new version. So much for the assertion above that “a production schema is never updated automatically”. While recently adding an upgrade that required new indices, I also certainly did “need indexes in development” because I want my development environment to match production as closely as possible (not to mention the fact that they saved hours in testing time against large data sets).

The most interesting underlying thing is that the existence of this (and similar) FAQs seems to be indicating to the Hibernate team that the simple SchemaUpdate code could actually be the beginnings of an extremely useful tool. Too few applications have decent upgrade capabilities: our users are often pleasantly surprised by what we have been able to build with considerable help from SchemaUpdate to simplify their upgrades. Maybe the Hibernate team are underestimating the potential of their own tool?

Re: Groovy Or JRuby

November 30th, 2007

Martin Fowler makes an interesting claim in GroovyOrJRuby:

A strong reason to prefer Ruby is the fact that it lives in multiple implementations.

Fowler is alluding to the fact that Groovy runs on the JVM only, whereas “Ruby can run directly on mainstream operating systems with a C runtime, and is starting to run on .NET’s CLR”. I don’t see this as a strong advantage for Ruby at all, for two main reasons:

  1. If you need portability across mainstream operating systems, you already have it with the JVM (to a similar enough extent as with Ruby).
  2. If you need tight integration with a platform, I don’t think either the JVM or CRuby implementation has a clear advantage.

When you start a project, you need to decide what tradeoff to make between portability and depth of platform integration. If you favour portability, you can just take the JVM then choose whichever language you please. If you favour depth of integration, your platform will often dictate the language. If you want both, implement the portable part on the JVM and go native for the rest.

Continuous Integration Done Quick

November 30th, 2007

OK, so it’s not quite the same as Quake Done Quick, but Chris has put together a few demo videos that show just how easy it is to set up Pulse and get building. In less than 5 minutes a server is set up with a first project and build. Another 5 minutes and you have integrated tests, SCM-triggered builds and RSS notifications. Sweet.