a little madness

A man needs a little madness, or else he never dares cut the rope and be free -Nikos Kazantzakis

Zutubi

Refactoring vs Merging

Do The Simplest Thing That Could Possibly Work, then Refactor Mercilessly: two well-known Extreme Rules. It’s hard to argue with the virtues of refactoring, but if you’ve ever had to manage parallel codelines then you might have a different perspective on “mercilessly”. Many types of refactoring are at odds with parallel codelines because they make merging, an already difficult task, more difficult. In particular, changes that physically move content between files tend to confuse SCMs that take a file-based view of the world (i.e. the majority of them). How can we alleviate this tension? A few possibilities come to mind:

  1. Don’t do parallel development: if you don’t have to merge, you don’t have a problem. Since the act of merging itself is pure overhead to the development process, this is a tempting idea. However, reality (typically of the business kind) dictates that parallel development is necessary to some degree. How do you support customers on older versions? How do you take on large and risky subprojects without derailing the development trunk? These are scenarios where branching is valuable enough to justify the overhead of merging. It is therefore reasonable to try to minimise parallel development, but rarely possible to avoid it completely.
  2. Only refactor on the trunk (or wherever the majority of development is done). This helps to alleviate the problem to some degree, by containing the main divergence to a single codeline. It is also usually the codeline where you would naturally do most of your refactoring, as it is the bleeding edge. However, even merging a small bug fix from a maintenance branch may prove difficult due to code movements on the trunk. And this solution is not much help for active development branches that run parallel to the trunk for some time.
  3. Another Extreme Rule: Integrate Often. It is no coincidence that Perforce-speak for “merge” is “integrate”. Merging is a type of integration, and like any integration it is less painful if done regularly, before codelines have diverged too far. This is, however, a way to mitigate problems rather than solve them.
  4. Avoid problematic refactoring, such as moving code between files. In some cases it is clear that a certain refactoring is best avoided due to anticipated merging. However, avoiding all problematic refactoring is not a workable long-term strategy. At some point, the maintenance benefits of cleaning up the code will outweigh the merging cost.
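To make the underlying problem concrete, here is a minimal sketch of what a file-based tool sees when a function is moved from one file to another. It uses Python's standard difflib as a stand-in for an SCM's per-file text diff; the file names (util.py, parser.py) and their contents are made up for illustration, and no real SCM is claimed to work exactly this way.

```python
import difflib

def per_file_changes(before, after):
    """Return {filename: (lines_removed, lines_added)} from a per-file line diff.

    `before` and `after` map file names to lists of lines (hypothetical snapshots).
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, [])
        new = after.get(name, [])
        removed = added = 0
        for line in difflib.ndiff(old, new):
            if line.startswith("- "):
                removed += 1
            elif line.startswith("+ "):
                added += 1
        changes[name] = (removed, added)
    return changes

# Hypothetical refactoring: parse() moves from util.py to parser.py, unchanged.
before = {
    "util.py": ["def parse(s):", "    return s.split(',')",
                "def fmt(x):", "    return str(x)"],
    "parser.py": [],
}
after = {
    "util.py": ["def fmt(x):", "    return str(x)"],
    "parser.py": ["def parse(s):", "    return s.split(',')"],
}

changes = per_file_changes(before, after)
print(changes)  # {'parser.py': (0, 2), 'util.py': (2, 0)}
```

Nothing in that result says the two lines added to parser.py are the same two lines removed from util.py. A bug fix to parse() made on another branch still targets util.py, where the function no longer exists, so the merge conflicts or has to be redone by hand.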

In reality we use a combination of these ideas and a dose of common sense to reduce merging problems. However, the tension still exists and can be painful. A technology-based solution to this would of course be ideal, i.e. an SCM that understood refactoring and could take account of it when merging. Unfortunately I know of no such existing SCM, and there are significant barriers to creating one:

  1. Many popular SCMs of today struggle enough with the file-based model that it is hard to see them moving on to such a high level.
  2. File-based, text-diff change management applies generically to a wide range of tasks. Any tool with deeper knowledge of actual file contents would likely trade off some of this genericity.
  3. Any “smart” merging algorithm will still have many corner cases where the right choice is ambiguous, and making these comprehensible to the poor human who needs to resolve them is more difficult when the algorithm is more complicated.
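As a rough illustration of the very simplest step such a tool might take, the sketch below pairs blocks of lines removed from one file with identical blocks added to another. It is only a toy under stated assumptions: the function names (removed_and_added, find_moves) and the sample files are hypothetical, matching is exact text only, and it is not how any existing SCM is known to work.

```python
import difflib

def removed_and_added(old, new):
    """Split a line diff into contiguous removed blocks and added blocks."""
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.append(tuple(old[i1:i2]))
        if tag in ("insert", "replace"):
            added.append(tuple(new[j1:j2]))
    return removed, added

def find_moves(before, after):
    """List (source_file, block, candidate_target_files) for each removed block
    that reappears verbatim as an added block in some other file."""
    removed_by_file, added_by_file = {}, {}
    for name in set(before) | set(after):
        removed, added = removed_and_added(before.get(name, []),
                                           after.get(name, []))
        removed_by_file[name] = removed
        added_by_file[name] = added

    moves = []
    for src in sorted(removed_by_file):
        for block in removed_by_file[src]:
            targets = sorted(dst for dst, blocks in added_by_file.items()
                             if dst != src and block in blocks)
            if targets:
                moves.append((src, block, targets))
    return moves

# Hypothetical refactoring: parse() moves from util.py to parser.py, unchanged.
before = {
    "util.py": ["def parse(s):", "    return s.split(',')",
                "def fmt(x):", "    return str(x)"],
    "parser.py": [],
}
after = {
    "util.py": ["def fmt(x):", "    return str(x)"],
    "parser.py": ["def parse(s):", "    return s.split(',')"],
}

moves = find_moves(before, after)
for src, block, targets in moves:
    print(src, "->", targets, "(%d lines)" % len(block))
```

Even this toy runs into the third barrier straight away: if the same block turns up as an addition in two files, the list of candidate targets has two entries and a human has to decide which one is the "real" move. It also misses any move where the code was edited along the way, which is the common case in a real refactoring.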

On a more positive note, there has been some welcome innovation in the SCM space recently. Perhaps I have overlooked solutions that exist now, or perhaps the growing competition will spur innovation in this area. Either way, if you have any ideas, let me know!
