a little madness

A man needs a little madness, or else he never dares cut the rope and be free. – Nikos Kazantzakis

First impressions of StAX

I have recently been working on this issue, converting some of Pulse's XML processing from using DOM to using StAX. Whilst the DOM API is simpler to work with, it is not so memory friendly and was becoming a problem for some of our customers.

I chose to try StAX rather than SAX because it was developed as a middle ground between DOM and SAX: it has the memory efficiency of a streaming parser like SAX, whilst retaining a simpler API closer to DOM's.

A quick note: This is not an exhaustive analysis of the pros and cons of the various XML APIs as this has been done before. Rather, this is a comparison of some of the things I made a mental note of whilst doing the conversion.

DOM

  • Works with an in-memory tree representation of the XML document, and therefore has high memory requirements for large documents.
  • Easy-to-use API that allows you to navigate the XML document in whatever way is most appropriate to your task. You can process an element as often as you like, move forwards and backwards, and search through the document.

This is what our code looked like before the conversion.

protected void processSuites(Element root)
{
    String suiteElement = getConfig().getSuiteElement();

    // Ask the in-memory tree directly for just the child elements we care about.
    Elements suiteElements = root.getChildElements(suiteElement);
    for (int i = 0; i < suiteElements.size(); i++)
    {
        processSuite(suiteElements.get(i));
    }
}

Note that using DOM is simply manipulating an in-memory tree.

SAX

  • Implementations manage parse state in the form of instance variables. This works well for simple documents, but becomes harder to manage as the document gets more complex.
  • Processing of an element typically occurs when you encounter the element's end tag, as only then is all of the element's content available. Until you reach an end tag, you have no real idea of how far through an element you are.
  • You only need to respond to elements that are of interest; i.e. when you receive a callback for an element you don't care about, just do nothing and return. (See the sketch after this list.)
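
For contrast, here is a minimal, hypothetical sketch of the SAX style (the SuiteHandler class and the suite element name are invented, not our actual code). It shows how parse state ends up in instance variables spread across callbacks:

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SuiteHandler extends DefaultHandler
{
    // Parse state lives in instance variables, updated by callbacks.
    private boolean inSuite = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
    {
        if ("suite".equals(qName))
        {
            inSuite = true;
            // remember the suite's attributes for later
        }
        // Callbacks for elements of no interest simply fall through.
    }

    @Override
    public void endElement(String uri, String localName, String qName)
    {
        // Only at the end tag is all of the element's content available.
        if ("suite".equals(qName))
        {
            // process the completed suite, then reset the state
            inSuite = false;
        }
    }
}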

StAX

  • Implementations typically manage state on the execution stack, with a new method call for each element that is encountered. This makes the code pretty easy to read, as it is self-documenting.
  • You need to process each and every (unfiltered) tag and event in the XML document. This is rather low level, and without care can lead to confusion and complications.

And this is what the code looked like after the conversion:

protected void processSuites(XMLStreamReader reader) throws XMLStreamException
{
    // expectStartElement, isElement and nextElement are our own helpers
    // built on top of the raw StAX API.
    expectStartElement(ELEMENT_SUITES, reader);
    reader.nextTag();

    while (reader.isStartElement())
    {
        if (isElement(getConfig().getSuiteElement(), reader))
        {
            processSuite(reader);
        }
        else
        {
            // Not an element we are interested in: skip past it.
            nextElement(reader);
        }
    }
    expectEndElement(ELEMENT_SUITES, reader);
}

My use of StAX is a little more regimented. I begin and end each method with an assertion that I am at the element I expect to be at (this has the advantage of documenting the implementation). The rest of the implementation is similar to its DOM counterpart, except that rather than simply asking for the elements I want to process, I need to loop over all the elements, skipping those that are not of interest.
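
Out of interest, here is a simplified sketch of how helpers like expectStartElement, isElement and nextElement can be built on the raw StAX API. This is illustrative only, not Pulse's actual implementation:

// Assumes: import javax.xml.stream.XMLStreamConstants;
//          import javax.xml.stream.XMLStreamException;
//          import javax.xml.stream.XMLStreamReader;

private void expectStartElement(String name, XMLStreamReader reader) throws XMLStreamException
{
    if (!reader.isStartElement() || !reader.getLocalName().equals(name))
    {
        throw new XMLStreamException("Expected start of element '" + name + "'", reader.getLocation());
    }
}

private boolean isElement(String name, XMLStreamReader reader)
{
    return reader.isStartElement() && reader.getLocalName().equals(name);
}

private void nextElement(XMLStreamReader reader) throws XMLStreamException
{
    // Skip the current element, including any children, leaving the reader
    // on the next start tag (or on the enclosing end tag).
    int depth = 1;
    while (depth > 0)
    {
        switch (reader.next())
        {
            case XMLStreamConstants.START_ELEMENT:
                depth++;
                break;
            case XMLStreamConstants.END_ELEMENT:
                depth--;
                break;
        }
    }
    reader.nextTag();
}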

Summary

Overall, I am happy with the way the conversion has turned out. A couple of things were unexpected, though. The first was that the StAX API does not include any higher-level utility functions for moving around at the element level, only between the end tag of one element and the start tag of the next. The other was that it took a fair amount of effort to write the code so that it was resilient to unexpected data in the reports; every tag has to be processed, after all.

CITCON Paris 2009: Mocks, CI Servers and Acceptance Testing

Following up on my previous post about CITCON Paris, I thought I’d post a few points about each of the other sessions I attended.

Mock Objects

I went along to this session as a chance to hear about mock objects from the perspective of someone involved in their development, Steve Freeman. If you've read my Four Simple Rules for Mocking, you'll know I'm not too keen on setting expectations, or even on verification. I mainly use mocking libraries for stubbing. Martin Fowler's article Mocks Aren't Stubs had made me think that Steve would hold the opposite view:

The classical TDD style is to use real objects if possible and a double if it’s awkward to use the real thing. So a classical TDDer would use a real warehouse and a double for the mail service. The kind of double doesn’t really matter that much.

A mockist TDD practitioner, however, will always use a mock for any object with interesting behavior. In this case for both the warehouse and the mail service.

So my biggest takeaway from this topic was that Steve’s view was more balanced and pragmatic than Fowler’s quote suggests. At a high level he explained well how his approach to design and implementation leads to the use of expectations in his tests. I still have my reservations, but was convinced that I should at least take a look at Steve’s new book (which is free online, so I can try a chapter or two before opting for a dead tree version).

A few more concrete pointers can be found in the session notes. A key one for me is to not mock what you don’t own, but to define your own interfaces for interacting with external systems (and then mock those interfaces).
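
To make that pointer concrete, here is a small hypothetical sketch (the names are mine, not from the session): rather than mocking a third-party mail API directly, define an interface you own and mock that in your tests:

// An interface we own, expressing only what our code needs from the
// external mail system.
public interface MailSender
{
    void send(String to, String subject, String body);
}

// The production implementation adapts the third-party API (elided here).
// Tests mock or stub MailSender rather than the external library itself.
public class SmtpMailSender implements MailSender
{
    public void send(String to, String subject, String body)
    {
        // delegate to the real mail library
    }
}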

The Future of CI Servers

I wasn't too keen on this topic, but since it is my business, I felt compelled to attend. I actually proposed a similar topic at my first CITCON back in Sydney and found it a disappointing session then, so my expectations were low. Apart from some less interesting probing of features already on the market, the conversation did wander onto the more interesting challenge of scaling development teams.

The agile movement recognises that the two main challenges (and opportunities) in software development are people and change. So it was interesting to hear this recast as a desire to return to our "hacker roots" — where we could code away in a room without the challenges of communication, integration and so on. Ideas such as using information radiators to bring a "small team" feel to large and/or distributed teams were mentioned. A less tangible thought was some kind of frequent but subtle feedback about potential integration issues: most of the time you could code away happily, while in the background your tools constantly kept an eye out for potential problems. What I like about this is the subtlety angle: given the benefits, it's easy to think that more feedback is always better, without considering the cost (e.g. interruption of flow).

Acceptance Testing

This year it seemed like every other session involved acceptance testing somehow. Not terribly surprising, I guess, since it is a very challenging area both technically and culturally. As I missed most of these sessions, they are probably better captured by other attendees' posts.

One idea I would call attention to is growing a custom, targeted solution for your project. I believe it was Steve Freeman who drew attention to an example in the Eclipse MyFoundation Portal project. If you drill down, you can see use cases represented in a custom swim-lane layout.

Water Cooler Discussions

Of course a great aspect of the conference is the random discussions you fall into with other attendees. One particular discussion (with JtF) has given me a much-needed kick up the backside. We were talking about the problems with trying to use acceptance tests to make up for a lack of unit testing. This is a tempting approach on projects that don’t have a testable design and infrastructure in place — it’s just easier to start throwing tests on top of your external UI.

Even though I knew all the drawbacks of this approach, I had to confess that this is essentially what has happened with the JavaScript code in Pulse. We started adding AJAX to the Pulse UI in bits and pieces without putting the infrastructure in place to test this code in isolation. Fast forward to today and we have a considerable amount of JavaScript code which is primarily tested via Selenium. So we’re now going to get serious about unit testing this code, which will simultaneously improve our coverage and reduce our build times.

Conclusion

To wrap up, after returning from Paris I plan to:

  1. Give expectations a fair hearing, by reading Steve’s book.
  2. Look for ways to improve our own information radiators to help connect Zutubi Sydney and London.
  3. Get serious about unit testing our JavaScript code.
  4. Get PJ and JtF to swap the dates for CITCON Asia/Pacific and Europe next year so I can get to both instead of neither! 😉

If I succeed at 4 (sadly not likely!) then I’ll certainly be back next year!

CITCON Paris 2009

As mentioned, Daniel and I both attended CITCON Paris the weekend before last. I've not had a chance to post a follow-up yet, as we also took the opportunity to eat the legs off every duck in France (well, we tried).

Firstly a huge thanks to PJ, Jeff, Eric and all the other volunteers for another great conference. Thanks again to Eric and Guillaume for acting as local guides on Saturday night. As always, the open spaces format and mix of attendees delivered a great day. It was also great to see a few familiar faces from the year before in Amsterdam (and a familiar shirt thanks to Ivan 🙂 ).

This year I proposed and facilitated a single topic: Distributed SCM in the Corporate World. I finally added a full write-up on the conference wiki earlier in the week for those who are interested. For the impatient, here are my takeaways from the session:

  1. Distributed SCMs have not gained much traction in the corporate world just yet, although git appears to have gained a foothold. (Obviously our sample size is small, but I also expect CITCON attendees to be closer to the edge than the average team.)
  2. Where distributed SCMs are used, the topology still resembles the centralised model. However, the ability to easily clone and move changes between repositories presents opportunities to work around issues like painful networks (contrast this with the special proxy servers needed in similar scenarios with centralised SCMs).
  3. The people using git liked it primarily for its more flexible workflow and better merging. It’s conceivable to have this in the centralised model too, but no single centralised contender was mentioned.
  4. So far the use of distributed SCMs didn’t seem to have practical implications for CI – probably due to the use of a centralised topology.

Looks like we're still waiting to see more creative use of distributed SCMs in corporate projects – perhaps it is something worth revisiting at future conferences. I hope to post on some of the other sessions I attended at a later date.