Archive for the ‘Technology’ Category

Refactoring vs Merging

Friday, February 22nd, 2008

Do The Simplest Thing That Could Possibly Work, then Refactor Mercilessly: two well-known Extreme Programming rules. It’s hard to argue with the virtues of refactoring, but if you’ve ever had to manage parallel codelines then you might have a different perspective on “mercilessly”. Many types of refactoring are at odds with parallel codelines because they make merging – an already difficult task – even harder. In particular, changes that physically move content around amongst files tend to confuse SCMs that take a file-based view of the world (i.e. the majority of them). How can we alleviate this tension? A few possibilities come to mind:

  1. Don’t do parallel development: if you don’t have to merge, you don’t have a problem. Since the act of merging itself is pure overhead to the development process, this is a tempting idea. However, reality (typically of the business kind) dictates that parallel development is necessary to some degree. How do you support customers on older versions? How do you take on large and risky subprojects without derailing the development trunk? These are scenarios where branching is valuable enough to justify the overhead of merging. It is reasonable therefore to try to minimise parallel development, but rarely possible to avoid it completely.
  2. Only refactor on the trunk (or wherever the majority of development is done). This helps to alleviate the problem to some degree, by containing the main divergence to a single codeline. It is also usually the codeline where you would naturally do most of your refactoring, as it is the bleeding edge. However, even merging a small bug fix from a maintenance branch may prove difficult due to code movements on the trunk. And this solution is not much help for active development branches that run parallel to the trunk for some time.
  3. Another Extreme Rule: Integrate Often. It is no coincidence that Perforce-speak for “merge” is “integrate”. Merging is a type of integration, and like any integration it is less painful if done regularly, before codelines have diverged too far. This is, however, a way to mitigate problems rather than solve them.
  4. Avoid problematic refactoring, such as moving code between files. In some cases it is clear that a certain refactoring is best avoided due to anticipated merging. However, avoiding all problematic refactoring is not a workable long-term strategy. At some point, the maintenance benefits of cleaning up the code will outweigh the merging cost.

In reality we use a combination of these ideas and a dose of common sense to reduce merging problems. However, the tension still exists and can be painful. A technology-based solution to this would of course be ideal, i.e. an SCM that understood refactoring and could take account of it when merging. Unfortunately I know of no such existing SCM, and there are significant barriers to creating one:

  1. Many popular SCMs of today struggle enough with the file-based model that it is hard to see them moving up to such a high-level one.
  2. File-based, text-diff change management applies generically to a wide range of tasks. Any tool with deeper knowledge of actual file contents would likely trade off some of this generality.
  3. Any “smart” merging algorithm will still have many corner cases where the right choice is ambiguous, and making these comprehensible to the poor human that needs to resolve them is more difficult when the algorithm is more complicated.

On a more positive note, there has been some welcome innovation in the SCM space recently. Perhaps I have overlooked solutions that exist now, or perhaps the growing competition will spur innovation in this area. Either way, if you have any ideas, let me know!

Zutubi London Office

Wednesday, February 13th, 2008

Wondering why everything has gone quiet over here? Well, all should become clear now: I have just completed a move from rainy Sydney to sunny London1. Combine the Christmas break with an overseas move and you have a recipe for zero blog posts!

The good news is that I am back in action in a new home in central London (Baker Street area). So, for now at least, Zutubi is operating in both Sydney (Daniel – GMT+11) and London (me – GMT) – the company never sleeps!

I intend to travel around quite a bit of the UK and continental Europe while we are living here. This will hopefully give me the opportunity to meet some of our European customers. Let me know if you are in the area, and perhaps we can arrange to meet up over the coming months.


1 Yes, it really has been quite sunny and mild since we got here, much to our surprise! Reports from back home tell of plenty more rain down that way.

Phrases You Should Never See in an FAQ

Wednesday, December 5th, 2007

Today’s phrase of choice is “you don’t need”, with the word of the day being “never”. Consider this entry in the Hibernate FAQ:

The hibernate.hbm2ddl.auto=update setting doesn’t create indexes

SchemaUpdate is activated by this configuration setting. SchemaUpdate is not really very powerful and comes without any warranties. For example, it does not create any indexes automatically. Furthermore, SchemaUpdate is only useful in development, per definition (a production schema is never updated automatically). You don’t need indexes in development.

What’s the problem with this? The FAQ answer is saying (in a roundabout way) that you should not ask this question, as you don’t need an answer. Telling your users what they need is a good way to alienate them. If this question is common enough to warrant an entry in the FAQ in the first place, aren’t your users telling you that they do need this functionality? This doesn’t mean you have to jump to implement it – just don’t patronise your users by telling them you understand their needs better than they do.

Personally, I was looking at this entry because we use Hibernate for persistence in Pulse. Pulse has a built-in upgrade framework that updates the schema automatically for you when you install a new version. So much for the assertion above that “a production schema is never updated automatically”. While recently adding an upgrade that required new indices, I also certainly did “need indexes in development”, because I wanted my development environment to match production as closely as possible (not to mention the fact that the indices saved hours of testing time against large data sets).

The most interesting thing is that the existence of this (and similar) FAQ entries should be telling the Hibernate team that the simple SchemaUpdate code could actually be the beginnings of an extremely useful tool. Too few applications have decent upgrade capabilities: our users are often pleasantly surprised by what we have been able to build, with considerable help from SchemaUpdate, to simplify their upgrades. Maybe the Hibernate team are underestimating the potential of their own tool?

The Wrong Reason To Choose Open Source

Friday, September 21st, 2007

I see it time and time again in comparisons of software products where one or more of the options is open source. Someone chimes in with a comment along the lines of:

Why pay for product X when I can get open source Y for free?

My response to this is simple: X != Y. Price is a very primitive way to choose software. Differences in price, even when one price is zero, are often insignificant when compared to other factors. Unfortunately, these other factors aren’t so easily boiled down to a single number, so the lazy consumer does not pay them full heed.

If we could boil the other factors down to a single number, what would it look like? Well, allow me to present a gross simplification. Suppose you are a company comparing two tools: one open source, and a commercial alternative that costs $500/seat/year. The end goal of both tools is to make the user more productive, i.e. to save them time. For argument’s sake, say the average user’s time costs the company $100,000/year. For the commercial software to be worth the cost, it needs to improve the user’s productivity (relative to the open source tool) by about 0.5% overall. Put another way, it needs to save the user around 12 minutes each week. This is a small ask, particularly if the tool is something the user relies on heavily to get their job done.
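As a sanity check, the break-even sum above fits in a few lines (the figures are the illustrative ones from this paragraph, not real data):

```python
# Illustrative figures from the paragraph above, not real data.
licence_cost = 500       # $/seat/year for the commercial tool
user_cost = 100_000      # $/year for the average user's time

# Productivity gain needed for the licence to pay for itself:
break_even = licence_cost / user_cost          # 0.005, i.e. 0.5%

# The same gain expressed as time saved, assuming a 40-hour week:
minutes_saved_per_week = 40 * 60 * break_even

print(f"{break_even:.1%} overall, or about {minutes_saved_per_week:.0f} minutes/week")
```

Tweak the two input figures for your own situation; the conclusion is not very sensitive to them.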

Sure, these numbers are pulled out of the air, but they are not that far removed from reality. Skilled workers can easily cost more than $100,000/year, and a heck of a lot of software is priced under $500/seat/year. The real point is that an increase in productivity is far more important than saving on licensing costs (except perhaps for software priced “by the enterprise, for the enterprise”).

There is also a sad flip side to this: open source has a whole lot more going for it than the price. Factors such as transparency, community and extensibility are usually far more important than saving a few dollars up front. Not to mention the fact that the most productive tool for you may be an open source alternative. By no means am I saying to avoid open source. But do yourself a favour: put in a bit more effort and make your choices for the right reasons.

Would You Like Some Tech News With Those Job Ads?

Wednesday, September 19th, 2007

Yes, I’m whining, but I can’t be the only person to notice that job ads are taking over the internet. More specifically, every tech news site seems to have its own job ads these days.

I guess the reason is that good people are hard to find, and our industry is no exception. I do wonder at the viability of these boards; it will be interesting to see if this spreads even further. Are the advantages of a centralised board offset by the differentiating factors offered by each site (in particular the ability to target a niche audience)? Perhaps a combination of the two, à la the Google AdWords Content Network, is our final destination?

Speeding Up Acceptance Tests: Use a Remote API

Wednesday, September 12th, 2007

Server applications can often benefit from exposing a remote API. Such an API enables users to both automate tasks and integrate with other systems. In the case of Pulse, we expose an XML-RPC remote API that allows remote control and monitoring of the Pulse server.

Apart from being a great feature, a remote API has another significant benefit: it enables us to control Pulse through a (relatively) fast interface during acceptance testing. The primary Pulse UI is accessed via a web browser. Although it is possible to automate web UI testing using tools such as JWebUnit and Selenium RC, the resulting tests are slow. Where we are not testing the web UI, or are testing only an isolated part of it, the overhead of driving the UI causes a huge blowout in testing time. This leads to slow builds: the enemy of continuous integration.

Thus, in many of our acceptance tests, a lot of the peripheral work such as setting up suitable data is done using the remote API. We also use the reporting functionality of the remote API to assert the current state of Pulse where possible. Tests only drive the web UI when they are testing the operation of the UI itself. The resulting tests are a lot quicker, enabling us to run our acceptance test suite more frequently. In Pulse 2.0, where the remote API has been extended to support full configuration, we are beginning to enjoy even faster acceptance tests.
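To sketch the pattern (the method names and the in-process stub server below are invented for illustration; this is not Pulse’s actual remote API), driving test setup over XML-RPC looks something like this in Python:

```python
from threading import Thread
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client

# A toy stand-in for the server under test; in real acceptance tests
# this would be the application's own remote API, not a stub.
projects = {}
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(
    lambda name: projects.setdefault(name, "idle") == "idle", "createProject")
server.register_function(lambda name: projects[name], "getProjectState")
Thread(target=server.serve_forever, daemon=True).start()

# The test drives the fast remote API instead of clicking through a UI:
port = server.server_address[1]
api = xmlrpc.client.ServerProxy(f"http://localhost:{port}/")
created = api.createProject("acceptance-demo")   # peripheral setup
state = api.getProjectState("acceptance-demo")   # check server state
server.shutdown()
```

Each call here is a single HTTP round trip, versus the page loads, rendering and element polling that a browser-driven test pays for the same setup.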

So, if you are struggling with slow acceptance tests, consider adding a remote API to your application. As a bonus, your users get a great new feature!

The Key Thing In Python’s Favour

Wednesday, September 5th, 2007

I recently ran across this post advocating Python. This made me think about why I prefer Python over the rest of the scripting crowd. And I realised that there is just one key reason:

Python is easier to read

That’s it. For all the joys of programming in dynamically-typed languages, I think there is one major problem: in many ways these languages favour writers over readers. I covered one example of this problem back in my post on Duck Typing vs Readability. This is why it is so important that a dynamic language is designed with readability in mind.

In my opinion, Python really shines readability-wise. And this is no accident — if you follow design discussions about Python you will see that Guido is always concerned with code clarity. A feature is not worth adding just because it makes code more compact; it must also make the code clearer. Features have even been removed from Python because they were seen to encourage compact but difficult-to-comprehend code (reduce being a classic example). Even controversial design choices like significant whitespace and the explicit use of “self” are actually good for readability.
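As a concrete illustration of the reduce point (my example, not one from the post; in today’s Python, reduce has in fact been moved out of the builtins into functools, partly on readability grounds):

```python
from functools import reduce

numbers = [3, 1, 4, 1, 5]

# Compact, but the reader must unpack the fold in their head:
product_compact = reduce(lambda acc, n: acc * n, numbers, 1)

# The explicit loop is a touch longer yet reads at a glance:
product_clear = 1
for n in numbers:
    product_clear *= n

assert product_compact == product_clear == 60
```

Both compute the same product; only one of them can be skimmed.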

The philosophy behind Python also extends beyond the language design. Clever tricks and one-liners are not encouraged by the community. Rather, the community aims for Pythonic code:

To be Pythonic is to use the Python constructs and datastructures with clean, readable idioms.

This is surely refreshing compared to the misguided boasting about how much can be crammed into a few lines of [insert other scripting language]. Certainly, if I ever have to pick up and maintain another person’s dynamically-typed code, I hope it is Python. And since I know I will have to maintain my own code, when I go dynamic I always pick Python.

Unit Tests and Gambling

Tuesday, August 7th, 2007

Today I ran across this post which quotes a testing-related analogy:

Checking code into source-control without unit-tests is like betting on a poker hand without looking at the cards. — Anon.

A nice theme, which brought to mind another gambling analogy:

Developing software without unit tests is like playing the pokies1.

How so? In a few ways:

  • Poker machines are a mug’s game: in the long term, you always lose. Just so with a lack of unit testing: it will come back to haunt you as you struggle to fix bugs discovered late and lose the confidence to refactor.
  • The really insidious thing with poker machines is that they are designed to keep you playing by paying out small wins reasonably frequently. Likewise, developers get addicted to the short-term feeling of productivity when they cut reams of code without tests.
  • Gambling on pokies can become a downward spiral: “just one more spin and I’m done … ok, another one, I’m due for a win …”. Sounds scarily like the development team that will put better testing in place “just after we hit this important deadline”. There’s always another spin, and there’s always another deadline.

The key thing is to think long term. Don’t play a game you are set up to lose.


1 That’s poker machines/slot machines/fruit machines depending on where in the world you are throwing away your money.

The Problem With Maven

Wednesday, August 1st, 2007

Various times during the recent CITCON, the discussion turned to build tools. With lots of Java developers present, Maven turned up in most of these discussions. Like every build tool, there are things that people love and hate about Maven. The interesting trend in every Maven discussion is:

  • The people who love Maven love the theory. Maven defines and implements a highly-standardised build process so that if you are willing to follow that process you get a lot of stuff for free (plus the inherent consistency). This is a Good Thing, somewhat reminiscent of the in vogue “convention over configuration” mindset. Smart teams standardise their build process anyway, so why not take advantage by sharing as much of the implementation as possible?
  • The people who hate Maven hate the reality. The implementation just leaves a lot to be desired. Granted, taking on a larger scope than most build tools makes an ideal implementation more difficult. However, there are a couple of key implementation issues that cause headache after headache. First, there is a lack of flexibility. Small tweaks to the standardised model can be infuriatingly difficult to apply — frequently the only answer is to write a plugin. Second, there are major stability issues with the plugins that are available. Some core plugins have historically been left in an unstable state for months.

Of course this split also creates many users that have a love/hate relationship with Maven: they love it when it works, and loathe it when it doesn’t. Users also spend a great deal of effort controlling their own Maven repositories to avoid some of the implementation headaches.

I don’t want this post to be taken as pure Maven bashing. The theory behind Maven is a valid one, and with the implementation issues resolved the project would attract and keep a lot more happy users. In my opinion, those issues need to be addressed immediately. Some ideas that come to mind:

  • A process needs to be put in place to control the stability of plugins. This must be a lightweight process that allows problems to be fixed and new versions delivered more quickly than in the past. Users also need more protection from unstable versions. If a new version is found to have a bug, wind back to a stable version so that the default plugin downloaded actually works.
  • The project could consider providing stable Maven “distributions” as pre-packaged downloads, built from battle-tested plugin versions.
  • Introduce at least one level of flexibility between “this is a simple config option” and “you’ll have to write a plugin”. I don’t pretend to have a magical answer for this problem: it is no easy task. But when a new user is trying Maven and finds some tweak will require them to write a plugin, it is a major turn-off.

Make the reality smoother, and more people will buy into the theory.

CITCON Retrospective

Tuesday, July 31st, 2007

Well, CITCON Asia/Pacific 2007 is done. On the whole, the event was a great success. There was a strong turnout, lots of interesting discussion, and all the organisation went smoothly. Thanks go to both organisers, Jeffrey Fredrick and Paul Julius, for pulling it off.

Looking back on the conference, both Daniel and I gained a lot:

  • A first experience of an OpenSpace conference. Going in, I liked the theory of OpenSpace a lot, and in practice I have to say that despite some challenges it is a great format. Sessions are a lot more interactive, and you can get more out of a session by putting more in yourself. Of course, not all sessions work out as you may have hoped. From my observation, smaller sessions on more focused topics are more likely to succeed. A larger session on a more general topic needs an experienced facilitator. The one topic I proposed was more of an all-in discussion, so I didn’t really feel the need to do much “facilitation”.
  • Of course, we got plenty out of the sessions themselves. My personal favourite was a small session about continuous integration with distributed SCMs. This is an interest of mine that I hope to follow up soon. We also got great insights into the problems that our customers (and potential customers ;) ) face. Daniel made the observation that people reported many problems just managing their builds over time, even before getting to automated testing and continuous integration. This is an area that is of obvious importance to us as you need a good build to apply Pulse most effectively.
  • We got to meet a bunch of people from various software backgrounds, all interested in continuous integration. It was great to see such a turnout and enthusiasm for this area, and it really reinforces that this is a boom time for improving software build and test practices. As noted by many attendees, having food put on right at the venue really helped keep discussions going. The times between sessions were just as valuable as the sessions themselves.
  • We got to meet some more of our competitors, to whom we can naturally relate. There is a great spirit between the competitors I have met in our field. It’s great to be able to have a chat about the common problems we all face, and the opportunities in the future.
  • Beer. And curry. ‘Nuff said.

And all this was free. If only there were more conferences organised in this spirit.