a little madness

A man needs a little madness, or else he never dares cut the rope and be free – Nikos Kazantzakis


Archive for the ‘SCM’ Category

Pulse Continuous Integration Server 2.5 Alpha!

Cobwebs may be creeping over the blog, but not over Pulse: we’ve just published the first Pulse 2.5 alpha build! The Pulse 2.5 series is focused (even more so than usual) on customer feedback. We’ve got plenty of great ideas from the Pulse community, and we’re folding in some of the best ones in this release series. Updates so far include:

  • Browse and dashboard view filtering (by project health).
  • A new MSTest post-processor plugin.
  • Increased visibility of build (and other) comments.
  • Comments on agents, to communicate with other users.
  • Properties on agents, for simpler parameterisation.
  • The ability to configure SCMs using resources.
  • A resource repository on the Pulse master.
  • A new option to attach build logs to notification emails.
  • Pre-stage hooks, which run just before the stage is dispatched to the agent.
  • SCM inclusion filters, complementing existing exclusion filters.
  • Inverse resource requirements (i.e. requiring the absence of a resource).
  • Smart label renaming.
  • A separate permission for the ability to clean up build directories.
  • Support for Perforce streams.

There’s plenty more to come in subsequent alpha builds, too. In fact, we’ve already started laying the groundwork in 2.5.0 for some larger changes.

To grab yourself a fresh-from-the-oven 2.5 alpha build, or just to find out more, head over to our alpha program page.

Pulse Sample and Community Plugins

Since version 2.0, Pulse has supported plugins to extend integration with external tools. Whenever customers have asked for help with plugin implementation, we’ve pointed them to samples as the best starting point: seeing a complete plugin project, with build and packaging support, is the easiest way to get a new plugin up and running.

Based on this experience, we’ve decided to start maintaining a repository of open source Pulse plugins. These will act both as samples and, in some cases, as early versions of plugins that may migrate into the core Pulse distribution. The sample plugins are hosted in a Mercurial repository with an hgweb interface:

http://hg.zutubi.com/

So far we have provided a sample JUnit test post-processor (simplified from the version shipped with Pulse), and the beginnings of a Subversion SCM plugin (wrapping the svn command line). Provided you have Mercurial installed, you can easily clone the samples over HTTP, e.g.:

$ hg clone http://hg.zutubi.com/com.zutubi.pulse.core.postprocessors.junits

The samples have README files to get you started.

You may notice there is a third sample: a plugin that provides basic SCM integration with Bazaar by wrapping the bzr command line. I’m happy to report that this sample was generously donated by a member of the community, Michiel van Slobbe. Michiel is the first to try getting Pulse and Bazaar working together, and we appreciate the effort! Hopefully it will provide inspiration for other community members.

Although we are happy to provide hosting for such community plugins, you may also choose your own path. Another member of the community, Orvid, is using Pulse for C#/Mono projects. Orvid has taken the initiative of writing a command plugin to integrate XBuild, the Mono equivalent of Microsoft’s MSBuild, with Pulse. You can find the XBuild plugin on GitHub:

https://github.com/Orvid/XBuildIntegration

You might also notice that Orvid has been working on a .NET wrapper for the Pulse remote API:

https://github.com/Orvid/Zutubi.Pulse.Api

These efforts are a great contribution that we are most thankful for!

Refactoring vs Merging

Do The Simplest Thing That Could Possibly Work, then Refactor Mercilessly: two well-known Extreme Rules. It’s hard to argue with the virtues of refactoring, but if you’ve ever had to manage parallel codelines you might have a different perspective on “mercilessly”. Many types of refactoring are at odds with parallel codelines because they make merging – an already difficult task – harder still. In particular, changes that physically move content between files tend to confuse SCMs that take a file-based view of the world (i.e. the majority of them). How can we alleviate this tension? A few possibilities come to mind:

  1. Don’t do parallel development: if you don’t have to merge, you don’t have a problem. Since the act of merging itself is pure overhead to the development process, this is a tempting idea. However, reality (typically of the business kind) dictates that parallel development is necessary to some degree. How do you support customers on older versions? How do you take on large and risky subprojects without derailing the development trunk? These are scenarios where branching is valuable enough to justify the overhead of merging. It is therefore reasonable to try to minimise parallel development, but rarely possible to avoid it completely.
  2. Only refactor on the trunk (or wherever the majority of development is done). This alleviates the problem to some degree by containing the main divergence to a single codeline, which is usually also where you would naturally do most of your refactoring, as it is the bleeding edge. However, even merging a small bug fix from a maintenance branch may prove difficult due to code movements on the trunk, and this solution is little help for active development branches that run parallel to the trunk for some time.
  3. Another Extreme Rule: Integrate Often. It is no coincidence that Perforce-speak for “merge” is “integrate”. Merging is a type of integration, and like any integration it is less painful when done regularly, before codelines have diverged too far. This is, however, a way to mitigate the problem rather than solve it.
  4. Avoid problematic refactorings, such as moving code between files (a concrete sketch of this clash follows the list). In some cases it is clear that a certain refactoring is best avoided because of anticipated merging. However, avoiding all problematic refactoring is not a workable long-term strategy: at some point, the maintenance benefits of cleaning up the code will outweigh the merging cost.

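To make the clash in point 4 concrete, here is a minimal sketch using Subversion; the paths, URL and revision numbers are all hypothetical:

# On trunk: a refactoring that moves content between files.
$ svn move src/Util.java src/TextUtil.java
$ svn commit -m "Rename Util to TextUtil"

# Meanwhile, on a maintenance branch: a small fix to the old file
# (after editing src/Util.java).
$ svn commit -m "Fix null check in Util" src/Util.java

# Back on trunk, merging the branch fix (committed as r42):
$ svn merge -r 41:42 http://svn.example.com/repo/branches/1.0 .
# Skipped missing target: 'src/Util.java'
# The merge targets a file that no longer exists on trunk, so the fix
# must be found and reapplied to TextUtil.java by hand.
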
In reality we use a combination of these ideas and a dose of common sense to reduce merging problems. However, the tension still exists and can be painful. A technology-based solution would of course be ideal, i.e. an SCM that understood refactoring and could take account of it when merging. Unfortunately I know of no such SCM today, and there are significant barriers to creating one:

  1. Many of today’s popular SCMs struggle enough with the file-based model that it is hard to see them moving to a higher-level one.
  2. File-based, text-diff change management applies generically to a wide range of tasks. Any tool with deeper knowledge of actual file contents would likely trade away some of this genericity.
  3. Any “smart” merging algorithm will still hit many corner cases where the right choice is ambiguous, and the more complicated the algorithm, the harder it is to make those cases comprehensible to the poor human who has to resolve them.

On a more positive note, there has been some welcome innovation in the SCM space recently. Perhaps I have overlooked solutions that exist now, or perhaps the growing competition will spur innovation in this area. Either way, if you have any ideas, let me know!

Finally: Merge Tracking in Subversion

If you’re going to build a “compelling replacement for CVS”, then you need to do two key things:

  1. Make commits atomic, with proper changesets
  2. Handle merging properly (e.g. support repeated, bi-directional and cherry-picked merges)

Sure, there are other things to improve on, and there is further room for innovation, but these are the two most fundamental features that CVS lacks. This is why I mentioned in my last post that I have been disappointed with the progress of Subversion. I am a Subversion user, and do prefer it to CVS. However, consider that the Subversion project was started in June 2000, and version 1.0 was released in February 2004. Proper merging support is long, long overdue. Thankfully, though, it is finally coming:

http://merge-tracking.open.collab.net/

Current estimates target a 1.5 release with merge tracking in “late summer”, which I guess means August (people need to remember that there are two hemispheres…). I do hope that the new merging support is worth the wait.
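
For the curious, here is a rough sketch of what repeated merges are expected to look like once merge tracking lands, based on the published 1.5 plans (the command details may yet change, and the URL is hypothetical):

# Merge all trunk changes not yet merged into this branch working copy;
# Subversion records what was merged in the svn:mergeinfo property.
$ svn merge http://svn.example.com/repo/trunk .
$ svn commit -m "Sync branch with trunk"

# Running the same command later picks up only new trunk revisions –
# no manual revision bookkeeping required.
$ svn merge http://svn.example.com/repo/trunk .

# Ask which trunk revisions are still eligible for merging.
$ svn mergeinfo --show-revs eligible http://svn.example.com/repo/trunk .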

Linus Makes a git of Himself

Recently, Linus gave a Google tech talk on git, the SCM he wrote to manage Linux kernel development after the falling out with BitKeeper. In the talk, Linus lambasts both CVS and Subversion for a variety of reasons, and argues why he believes git is far superior. In true Linus fashion, he frequently uses hyperbole and false(?) egotism as he lays out his arguments. This in itself could all be in good fun, but Linus takes it a bit too far. Even worse, in my opinion, is that Linus confidently peddles arguments that are either specious or poorly focused. In the end, he just loses a lot of respect.

Now both CVS and Subversion are far from perfect. In fact, I personally dislike CVS and am disappointed with the progress of Subversion. Git clearly includes some advanced features that empower the user when compared with either of these tools. However, the way Linus presents the superiority of git is very misleading. Firstly, he throws out several insults towards CVS/Subversion that are plain wrong, such as:

  1. Tarballs and patches (the way kernel development was managed way, way back) are superior to using CVS. This almost has to be a joke, but sadly I do not think it was meant as one. CVS may lack many powerful features, but this statement ignores the many extremely useful features that CVS does have.
  2. There is no way to do CVS “right”. This holds only if you buy the argument that decentralisation is the only way, of which Linus does not convince me one bit.
  3. It is not possible to do repeated merges in CVS/Subversion. It may suck that the tools do not support this directly, but everyone knows how to do repeated merges with a small amount of extra effort on the user’s part (see the sketch after this list). I dislike the extra effort, but that is very different from it being impossible.
  4. To merge in CVS you end up preparing for a week and setting aside a whole day. Only a broken process, one not adapted to the tool being used, would ever lead to such a ridiculous situation. A more accurate criticism would be that CVS limits your ability to parallelise development due to its weak merging capabilities.

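To make point 3 concrete: a common convention for repeated merges in Subversion (before merge tracking) is to record the merged revision range in the commit message, then start the next merge from there. A minimal sketch, with a hypothetical URL and revision numbers:

# First merge from trunk into the branch working copy.
$ svn merge -r 100:150 http://svn.example.com/repo/trunk .
$ svn commit -m "Merged trunk r100:150"

# Next time, look up the last merged revision in the log...
$ svn log --limit 5 .
# ...and continue from there, so no change is applied twice.
$ svn merge -r 150:175 http://svn.example.com/repo/trunk .
$ svn commit -m "Merged trunk r150:175"
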
Ignoring the insults, the very core of Linus’ arguments does not make sense. He claims the absolute superiority of the decentralised model, but the key advantages he uses to back this up are not unique to decentralised systems. Almost all of them are much more about the ability to handle complex branching and merging than about decentralisation. The live audience was not fooled, and one member even questioned Linus on pretty much this exact point a little over halfway through the talk. In response, Linus heads off on a complete tangent about geographically distributed teams and network latency, leaving the question totally unanswered. The closest he gets is making some very weak points about namespace issues with branches in centralised systems (as if that would somehow be difficult to solve).

The inability to answer (or perhaps the ability to avoid) important questions is in fact a recurring theme throughout the talk. Another prime example comes when an audience member asks an insightful question about whose responsibility it is to merge changes. In a centralised system, the author is naturally tasked with merging their own new work into the shared code base. The advantage is that they know their own changes and are thus well equipped to merge them. In a decentralised system, people pull changes from others and thus end up merging the new work of others into their own repositories. Linus is so happy with this question that he quips he had paid to be asked it. Such a shame, then, that he fails to answer it in any meaningful way. Instead, he waffles about the powerful merging capabilities of git (again, not unique to distributed systems). He then hypothesises a case where he pulls a change that conflicts and, instead of merging it himself, asks the author of the change to do the merging for him. He concludes proudly that this is so “natural” in the distributed model. But hang on a second: the original point was that this would have been more natural in the centralised model, where the author would have merged their own change and Linus need never have been bothered. Honestly, this is as close as you can get to stand-up comedy in a talk about an SCM!

All in all, a very disappointing talk. Distributed SCMs are a great thing, and bring many advantages. The powerful merging that the distributed model has brought (through necessity) is something other SCM vendors should be taking a good look at. This is no way to promote the distributed model, however. A more thoughtful view of the pros and cons of distributed versus centralised models, including what each can take from the other, would have been far more informative. My feeling is that centralisation has many benefits for certain projects and, like most things, the best solution would adapt the good parts of each model to the project at hand.