Archive for the ‘Technology’ Category

Bash Tip: Exit on Error

Wednesday, May 24th, 2006

Back in my post Your Next Programming Language I mentioned I would post occasional tips about bash scripting. As soon as I started writing my next script, it occurred to me: the first thing I always do when writing a new bash script is set the errexit option:

set -e

This option makes your script bail out when it detects an error (a command exiting with a non-zero exit code). Without this option the script will plough on, and mayhem often ensues. In all the noise generated it can be a pain to find the root cause of the problem. So I make it a rule to set this option and fail as early as possible.


Into continuous integration? Want to be? Try pulse.

Continuous Integration: Not Just for Large Teams

Thursday, May 11th, 2006

Recently, a beta tester of pulse asked a fair question:

I can see the benefits of automated build software for large software teams. Would pulse considerably benefit a development team of 1-5?

Now, I am surely biased, but my answer is most certainly yes. Firstly, there are many benefits that apply to both, such as frequent testing, testing outside of the developers’ environments, early detection of integration problems, isolating changes that introduce problems, and so on.

It is also fair to say that some of the benefits for large teams do not apply to the same degree for a team of 1-5. For example, as the team grows the chance of an integration problem increases even more rapidly due to potential semantic conflicts between changes committed by various developers. Even when the developers are all disciplined enough to update and run tests locally (which we all are, right? ;) ), there is still a greater chance of a submit race. For large teams the continuous integration server can also be a great way to communicate recent project activity, as developers are notified of builds, and the web UI may allow you to see recent project changes.

Smaller teams do not have such problems with submit races and typically find communication much easier. However, some of the benefits of continuous integration are more important to small teams than to large ones. A prime example is the increased frequency of testing. Without continuous integration, tests are only run as frequently as the developers run them. The fewer developers, the less frequent the testing. Problems take longer to show up, especially intermittent bugs (those nasty blighters that only show up once every 364 test runs). If your tests run every fifteen minutes on your continuous integration server, even those intermittent bugs can’t hide for long. Further, small teams necessarily have a smaller number of environments that they are naturally testing on during development. Adding another one, even a single continuous integration server, is significant. Lastly, a smaller team, with naturally limited resources, needs to apply automation even more aggressively than a large team. Dedicating one or more engineers to release and integration management may not seem like a big deal to a large team, but for a small team that may be a third of the manpower!

Although they benefit in different ways, I wouldn’t say a large team needs CI more than a small one, or vice-versa. Experience has taught me that once a team employs continuous integration, whether it be 2 or 200 strong, the engineers will never want to do without it again.

——-

Your Next Programming Language

Tuesday, May 9th, 2006

Many people talk about how, as software developers, we should learn new programming languages frequently. I couldn’t agree more: the broader perspective improves your skills and opens your eyes to the dark corners of the language you are currently using. It strikes me, however, that many developers are missing out on a class of languages that are extremely useful every day. People learn high-level languages like Java and C++, and often a scripting language or two like Perl or Python. Maybe they will even dabble in a functional language to get a really different take on the world. But for me, the single programming language I use most frequently day-to-day, alongside my primary language, is bash scripting. Yep, plain old hackish shell scripts.

Why? Because like most programmers, I’m lazy. I don’t like to do anything I can make a computer do for me, and there are a whole raft of such things that are easily achieved via a shell script. Often it will just be a one-liner to perform a batch operation on a bunch of files. A find/exec/sed sure beats the pants off changing 200 files by hand, and is even quicker than writing a Perl script. Shell scripting is also a boon for project automation. Is packaging your project a headache? Need to pull in a bunch of resources, munge a few files, run some tests and squeeze it all together? Build tools such as Ant or make may get you part of the way, but they are not designed to write scripts. I often use a script to do all the gathering and munging, and call out to those scripts from my build file.

So, no excuses! Even those of you more inclined to the Windows way of life have easy access to bash (and other shells) via Cygwin. Get a taste and you won’t look back. There’s something quite gratifying about replacing an arduous, multi-step task with a script that you can run without breaking a sweat. You’ll never have to work again!

——-

Wallet friendly RSS feeds

Thursday, May 4th, 2006

I have been subscribing to RSS feeds for some time now, but it was not until I had to implement one that I realised that there was more to it than just a structured XML response.

Below is a quick dissection of what I would consider the absolute minimum implementation for an RSS feed. I’m calling it “wallet friendly” not because it makes you money, but because it will save your users in bandwidth costs by not spitting out a full feed for every request. Whilst the example is in Java, using Rome and WebWork, the details apply equally well to other frameworks and languages.

This example is an extension of the WebWorkResultSupport class.


protected void doExecute(String format, ActionInvocation action....
{
    HttpServletResponse resp = ServletActionContext.getResponse();
    HttpServletRequest req = ServletActionContext.getRequest();
    OgnlValueStack stack = actionInvocation.getStack();  

Firstly, what do we do if there is no feed to display? The answer will depend on what that means in your system. If it’s an error, then it’s time for a 500 (Internal Server Error) response. If, on the other hand, it means that the feed request is invalid, it’s best to return a 410 (GONE) so that whoever is making the request knows that they should stop.

         
    SyndFeed feed = (SyndFeed) stack.findValue("feed");
    if (feed == null)
    {
        resp.sendError(HttpServletResponse.SC_GONE);
        return;
    }

Now set the content type and disposition. This will help the browser to deal with the response appropriately, and give it a friendly name if the user decides to ‘save as’.


    resp.setContentType("application/rss+xml; charset=UTF-8");
    // The header value starts with a disposition type, followed by the file name.
    resp.setHeader("Content-Disposition", "inline; filename=rss.xml");

When handling an RSS request, you should not return the feed unless it has changed since it was last requested. You can find a very good discussion on this by Charles in his fishbowl, and by Randy at the kbcafe. (For those in a hurry, the relevant details are that the “Last-Modified” and “ETag” headers are returned in the following request as “If-Modified-Since” and “If-None-Match” respectively)

There are two steps to this. The first is to always set the “ETag” and “Last-Modified” response headers. The “Last-Modified” details can be taken from the feed like so:


    // A happy default here is the If-Modified-Since header.
    // If we don't have any feed entries, then this will result
    // in a 304 Not modified
    Date lastModified =
            new Date(req.getDateHeader("If-Modified-Since"));
    List entries = feed.getEntries();
    if (entries.size() > 0)
    {
        // Get the latest feed entry - assuming the latest is
        // at the top and that you set a published/updated
        // date on the feed entries.
        SyndEntry entry = (SyndEntry) entries.get(0);
        lastModified = entry.getPublishedDate();
        Date updated = entry.getUpdatedDate();
        if (updated != null && lastModified.compareTo(updated) < 0)
        {
            lastModified = updated;
        }
    }

The ETag should uniquely identify this feed (read: the latest item in this feed). Unless you expect feed entries to be created at exactly the same time (or your database does not provide a high degree of accuracy in the time field) it’s sufficient to use the last modified timestamp for the ETag. If this is not unique enough in your case, you will need to create a unique hash from the content.


    String etag = Long.toString(lastModified.getTime());
    resp.setHeader("ETag", etag);
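
For the case where the timestamp is not unique enough, a content hash can stand in for it. The following is only a minimal sketch: the class name ContentEtag and its method are hypothetical (they are not part of Rome or WebWork), and it assumes the rendered feed is available as a string. It uses the standard java.security.MessageDigest with SHA-256, though any stable digest would do.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives an ETag value from the rendered feed text.
final class ContentEtag {

    // Returns the SHA-256 digest of the feed as a lowercase hex string.
    static String of(String feedXml) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(feedXml.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Identical content gives an identical tag; any change gives a new one.
        System.out.println(of("<rss>one entry</rss>"));
        System.out.println(of("<rss>two entries</rss>"));
    }
}
```

The trade-off is that the feed has to be rendered before the tag can be computed, so this saves bandwidth but not the work of generating the feed.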

Before you can use the last modified date for the header, you may need to drop the milliseconds since they are not part of the date format used by HTTP.


    Calendar cal = Calendar.getInstance();
    cal.setTime(lastModified);
    cal.set(Calendar.MILLISECOND, 0);
    lastModified = cal.getTime();
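
To see why the milliseconds matter, here is a small self-contained sketch of a date making the round trip through an HTTP header. The class and method names are hypothetical, and the formatter pattern is my own rendering of the fixed-length HTTP date format, not something taken from the servlet API. It assumes non-negative timestamps.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical illustration of an HTTP date round trip.
final class HttpDateRoundTrip {

    // The date format used in HTTP headers has no millisecond field.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    // What goes out in a header such as Last-Modified.
    static String format(long epochMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    // What comes back when the client echoes the value in If-Modified-Since.
    static long parse(String headerValue) {
        return LocalDateTime.parse(headerValue, HTTP_DATE)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    // Same effect as clearing Calendar.MILLISECOND.
    static long truncateToSeconds(long epochMillis) {
        return (epochMillis / 1000L) * 1000L;
    }

    public static void main(String[] args) {
        long stored = 1148467200123L; // a timestamp with 123 ms on the end
        long echoed = parse(format(stored));
        System.out.println(stored == echoed);                    // false
        System.out.println(truncateToSeconds(stored) == echoed); // true
    }
}
```

Without the truncation, an equality check between the stored timestamp and the echoed header value can never succeed for a timestamp that carries milliseconds.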

Now we can set the “Last-Modified” header.


    // always set
    resp.setDateHeader("Last-Modified", lastModified.getTime());

That completes the first step (setting the “Last-Modified” and “ETag” headers on every response). The second step is to check the “If-None-Match” and “If-Modified-Since” on the request (remembering that they ‘should’ contain what you sent out in the previous response). If they match the “ETag” and “Last-Modified” values we just set on the response then we do not need to return the feed. A 304 Not Modified will suffice.


    // Check the headers to determine whether or not a response
    // is required.
    if (TextUtils.stringSet(req.getHeader("If-None-Match")) ||
        TextUtils.stringSet(req.getHeader("If-Modified-Since")))
    {
        if (etag.equals(req.getHeader("If-None-Match")) &&
            lastModified.getTime() ==
                         req.getDateHeader("If-Modified-Since"))
        {
            // If response is not required, send 304 Not Modified.
            // setStatus is used rather than sendError because a 304
            // is not an error and should not carry an error page body.
            resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
            return;
        }
    }
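
The decision itself can be pulled out into a plain function, which makes it easy to test without a servlet container. This is a sketch only: the class ConditionalGet and its method are hypothetical names, and it mirrors the check described here, where both validators must match. It assumes a missing If-Modified-Since header is passed as -1, which is what HttpServletRequest.getDateHeader returns when the header is absent.

```java
// Hypothetical helper mirroring the check described in the post.
final class ConditionalGet {

    // Returns true when a 304 Not Modified is enough.
    // ifNoneMatch is null when the header is absent;
    // ifModifiedSince is -1 when the header is absent.
    static boolean notModified(String ifNoneMatch, long ifModifiedSince,
                               String etag, long lastModified) {
        boolean hasEtag = ifNoneMatch != null && !ifNoneMatch.isEmpty();
        boolean hasDate = ifModifiedSince != -1L;
        if (!hasEtag && !hasDate) {
            // First request from this client: send the full feed.
            return false;
        }
        // Both validators have to match what we would send now.
        return etag.equals(ifNoneMatch) && lastModified == ifModifiedSince;
    }

    public static void main(String[] args) {
        System.out.println(notModified("1000", 1000L, "1000", 1000L)); // true
        System.out.println(notModified(null, -1L, "1000", 1000L));     // false
        System.out.println(notModified("999", 1000L, "1000", 1000L));  // false
    }
}
```

Requiring both headers is the conservative choice: a client that sends only one of them always gets the full feed.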

Now, let’s generate the feed data and send it on its way.


    // Render the feed in the requested format.
    WireFeed outFeed = feed.createWireFeed(format);
    outFeed.setEncoding("UTF-8");
    new WireFeedOutput().output(outFeed, resp.getWriter());
    resp.flushBuffer();

Oh, and one last thing. If you want to return error details when things go wrong, do the RSS reader a favour and format the error response in valid RSS format. But be sure to set appropriate Last-Modified and ETag headers. For example, set an error token in the ETag header that you can check next time round. If your current error token matches the ETag in the request, respond with the 304.

——-

10 Ways To Improve Wikis; 2

Wednesday, May 3rd, 2006

Sorry readers, I know you have all been waiting with bated breath for the final installment in the wiki series :) . Well, here goes:

  1. Unreadable syntax for tables

    The problem is actually more generic: the simple wiki syntax works well for inline elements, but poorly for structured elements. Why? Mainly because whitespace is significant. This is convenient most of the time, but gives no way to format a structured entity such that it can be easily read. A possible solution is to allow insignificant whitespace so that structured elements can be formatted with indentation. Naturally, for regular paragraphs, significant whitespace would have to stay. I think the semantics for whitespace can still be clear if the insignificant whitespace is only allowed “outside” of the actual content, in the space between whatever syntax elements are used to delimit the table (or whatever it happens to be). Or you could just use a WYSIWYG editor, but you won’t catch me doing it :) .

  2. Editing in a browser text area sucks

    A couple of readers actually pointed out some neat solutions for this one, such as the mozex extension for Firefox, which lets you use an external program for editing instead of the browser text area. Nice idea, but still a little awkward, as you can’t avoid interacting with the web UI for other actions (this is a wiki after all!). I think the real answer is the slow, painful progression towards browsers as a richer application platform. Many have tried to create the ubiquitous thin client technology for the web, but none can compete with the inertia of the trusty old web browser. Unfortunately for us users, this means we’ll be waiting a long time for browser technology to evolve to a state where web UIs are actually enjoyable to use. Just look at how excited we all get when AJAX lets us emulate two-decade-old desktop technology in our web apps.

  3. Poor support for versioning

    Let’s start simple: a wiki that doesn’t have at least RCS-level versioning support (simple revisions, diffs, history) should not even be allowed to exist. A wiki that doesn’t have some form of merging support (with conflict resolution) isn’t worth using. Personally, my requirements are even more strict. I maintain documentation for software products, and these products have multiple release streams. I manage these streams in my code base using branches, and ideally would do the same with the documentation. I don’t know of a wiki that supports this. This seems a shame, because the version control technology has been around for a long time. Rather than reinventing a weaker versioning scheme, I don’t understand why more wikis don’t use a full-blown version control system (like Subversion) in the back end. There is a wealth of existing tools and technologies that just aren’t being leveraged here.

  4. Lost edits due to browser crashes

    First off, for those wikis with autosave, this is much less of a problem. I would like to mention one insightful comment from reddit, however. There, poster rahul turned this problem back at the browser itself. A good point: why don’t the web browsers implement their own autosave for form data? They are already part of the way there by remembering form input, and it is not much of a leap to get from there to autosaving for text areas. The obvious benefit of this approach is that once it is implemented in the browser, there is no longer a need for every web app to reimplement the same functionality.

  5. Wiki discussions

    People, please just stop using flat wiki pages for discussions! It hurts us, precious! Some wikis have a more enlightened approach, and allow threaded comments on each page. This is a huge improvement, no doubt. Still, will they ever match dedicated forum solutions? Maybe wiki implementors should be looking again to adopt existing technologies. I wonder how feasible it would be to integrate an existing forum solution into the wiki. Perhaps a looser coupling would even be beneficial, allowing more diverse tools to be linked to a wiki page. The counterbalancing force is of course the benefit of tight integration.

So there you have it: some good can come of a rant after all. At the very least, I have picked up a few new ideas myself!

——-

10 Ways To Improve Wikis; 1

Friday, April 21st, 2006

Wow, I’m quite surprised by the coverage received by my previous post: 10 Things I Hate About Wikis. I was just blowing off a bit of steam, but somehow I managed to spark some discussion and learn a thing or two. So, I thought: why not take this discussion and turn it into something constructive? Like I say in my previous post, my beef with Wikis is not with the idea, but with execution. Of course, execution can (and hopefully will) improve over time. So I’ll go through my 10 points with some clarifications, ideas and contributions from others. To keep things sane, I’ll split it over a couple of posts. Here goes:

  1. Wikis are the easiest way to create awful documentation.

    This is mostly a wiki maintainer issue. As many comments point out, it’s all about having discipline when creating and maintaining the wiki content. I won’t try to address all of the cultural issues involved in successful collaboration. My concern is what the wiki implementor can do to help. It is too easy for the implementor to throw their hands up and say it is out of their control, but this is only partly true. A wiki can encourage quality by making the easy way to do things also the right way. Given the constantly-evolving nature of a wiki, the implementation should also encourage frequent refactoring by making it easy to reorganise content. Features like moving/renaming pages are quite common, but not always convenient, especially when you have a lot of content to shuffle around. How about features like splitting/joining pages? Global search and replace? I’m sure people have loads of ideas.

  2. WikiWords

    No software should restrict its users when it is so unnecessary. Wiki implementors: get rid of the restrictions, it’s really not that hard. Then I won’t have to hear about why the title of the page I am reading is inaccurate due to technical restrictions.

  3. Every wiki has its own syntax.

    The only answer is to standardise the syntax somehow. Will this happen? I think it is likely a standard will appear. It is much harder to tell how widely it will be implemented. There are sure to be compatibility problems with the masses of content that already exists. I would like to think, for the sake of their users, that wiki implementors will consider it. If wikis are here to stay, and continue to spread, this will only become more important.

  4. Wikis mark the return of the content management dark ages.

    This point was perhaps too brief to convey what I was really getting at. The facilities offered by wikis to create headings, lists and so on are fine. The markup is indeed semantic, as has been pointed out. What I miss (possibly through ignorance, wiki implementors please correct me) is an easy way to create my own styles. Not just custom HTML fragments, but the ability to create font and paragraph styles using CSS. I’d love to see existing solutions, I just haven’t come across one yet. For example, I can imagine a wiki allowing me to define a class in CSS syntax, and then apply that class to any part of my content by wrapping in the appropriate syntax. There’s no reason why the built in syntax can’t just be shorthand for the application of pre-defined classes that I can also modify if I wish.

  5. Inexplicably poor navigation.

    I know wikis aren’t a traditional hierarchy, and nor should they be. The possibilities with a wiki are much greater than that, as linking is such a fundamental part of the system. The point about searching made in one comment is a good one, a powerful search is a must and a great way to find content. However, search only works when you know what you are looking for. Convenient navigation, on the other hand, allows you to discover related content. My problem is when useful navigation functionality is non-existent or hidden away. Of course, this varies from wiki to wiki. I think at the very least, I should be easily able to navigate to:

    • All ancestors of the current page
    • All children of the current page
    • All siblings of the current page
    • All pages that link to the current page

    When I say easily, I don’t mean navigating to another page which shows me the links. I want access to the links on the current page, although allowing them to be shown/hidden may be necessary to avoid clutter. In addition to this page-to-page discovery, wikis also need a convenient way to browse around the entire content. In my experience this functionality exists, although the UI could often be a lot more dynamic.

Stay tuned for part 2.

——-

10 Things I Hate About Wikis

Wednesday, April 19th, 2006

OK, so wikis are a great idea. I have “embraced the wiki” as a great communication tool, and there are many, many benefits. But still they manage to get to me. It’s almost always down to one thing: execution. Execution by the implementors of the wiki, and execution by the people creating the content. So here it is:

10 Things I Hate About Wikis

  1. Wikis are the easiest way to create awful documentation. Lowering the barrier to entry is good, but if I see another open source project throw up a wiki and think they now have documentation, I’m going to scream! Perhaps we should call it lowering the barrier to stupidity (arrogance mine ;) ).
  2. WikiWords. Not all wikis are affected by this blight, thankfully. But don’t you love those that are, especially the knots you tie yourself in to manufacture a WikiWord when you just want to use a single word!
  3. Every wiki has its own syntax. Sure, HTML is too verbose to be convenient for editing wikis – it ruins the whole idea. Unfortunately, however, this has led to a proliferation of custom wiki syntaxes, each with their own quirks. Hence, working with multiple wikis is a pain, and every wiki has its own learning curve.
  4. Wikis mark the return of the content management dark ages. Once upon a time, we formatted documents in our word processor with font sizes, bold text etc. Eventually we realised this was a Bad Idea, and styles were born. (OK, introduced. Back in your box now, TeX groupies. And make sure the troff monster stays in there.) These days, nobody in their right mind creates a significantly-sized document without using styles. Then there was the internet. Remember the early days? The <b> tags? Eventually we realised this was a Bad Idea, and stylesheets were born (easy there, troff monster). Now we are back where we started again. I hope there are no LISP programmers in the room, because they’re bound to mention they “told us so”…
  5. Inexplicably poor navigation. Come on, wiki implementors, this should be one of your strongest points! Too often I find myself deep in the bowels of a wiki without a sensible way to navigate around. Sure, some of this is down to the wiki author, but there are so many opportunities for convenient navigation that are missed, by either not allowing the navigation or by hiding it away somewhere.
  6. Could anything be harder to read than a table written in typical wiki syntax? This is where the simplicity of the syntax falls flat on its face. The syntax works for basic, inline elements, but start to create structured data and you become lost in a sea of ASCII art (and not the good kind).
  7. Editing in a browser text area sucks! Possibly the only thing that sucks more is the half-assed rich text editing facilities wikis sometimes offer. Unfortunately, we’re pretty stuck with this one. Maybe advances in web UIs will help…
  8. Poor support for versioning. One thing I always hated about creating documentation in word processors is the inability to track changes and merge documents (yes, I know Word has an “implementation” of this feature – if only it actually worked). On the face of it, wikis have both the opportunity (text-based format) and the motivation (strong chance of concurrent editing) to have strong versioning and merging support. However, most of them don’t. Wiki implementors: this is a (largely) solved problem! No excuses! :)
  9. Losing 30 minutes of typing because my browser crashed, or I closed the tab, or some other minor tragedy occurred. Thank god wikis are starting to implement autosave! Dragging themselves just a bit out of the dark ages ;) .
  10. Wiki discussions, e.g. those found in the original wiki. OK, so wikis were a cool new idea. That doesn’t override the fact that forums and newsgroups already existed as a much better medium for online discussion!

Phew, that felt good. Now wiki lovers everywhere, I’m ready for you to tell me how I “just don’t get it”.

——-

Ajax vs Caching

Wednesday, April 12th, 2006

Recently I’ve thrown some simple Ajax (well, Aj at least) into our app, to refresh content that changes frequently without refreshing the whole page. With a library like prototype, it really couldn’t be simpler, just throw in an Ajax.PeriodicalUpdater that pulls in a new HTML fragment and you’re set.

Almost.

Working with web UIs, we’ve all run into the standard browser caching problems. A meta tag or two:

<meta http-equiv="Pragma" content="no-cache"/>
<meta http-equiv="Expires" content="0"/>

solves most problems. When you start using Javascript to periodically modify the page, however, things get trickier. The first problem I ran into was with IE. Symptom: the periodical updates just didn’t work. Because these were just fragments of a page, they didn’t contain the meta tags above. IE was caching the HTML fragment rather than hitting the server again as desired. Solution: use the no-cache and expiration headers on the HTTP response itself. I implemented this in WebWork with a simple interceptor, keeping this detail out of my action classes and allowing it to be reused as necessary:

public class AjaxInterceptor extends AroundInterceptor
{
    protected void after(ActionInvocation actionInvocation, String string)
        throws Exception
    {
    }
    protected void before(ActionInvocation actionInvocation)
        throws Exception
    {
        HttpServletResponse response = ServletActionContext.getResponse();
        response.setHeader("Pragma", "no-cache");
        response.setDateHeader("Expires", 0);
    }
}

So we’re done right? Right? Wrong. This fixed the original IE problem, but after a while I noticed another major annoyance. I had committed a mortal web UI sin: the back button was broken.

If an updating page was left open long enough for a content refresh, navigating away from that page and then returning via the back button displayed the originally-loaded content, not the refreshed content. From a user’s perspective this is not only annoying, but downright confusing. The problem in this case was again caching: not caching of the HTML fragment, but caching of the whole page. “But wait!”, you ask, “That original page has the meta tags, right?”. Right! But the back button doesn’t care. In fact, the HTTP spec requires the back button not to care. When you hit that back button you get the originally-loaded content, from the browser’s cache, expiration rules be damned!

So what can we do? The answer is to stop the page being put in the cache in the first place. This can be done with a liberal sprinkling of Cache-Control headers, to cater for various browsers:

...
    protected void before(ActionInvocation actionInvocation)
        throws Exception
    {
        HttpServletResponse response = ServletActionContext.getResponse();
        response.setHeader("Pragma", "no-cache");
        // setHeader replaces any earlier value for the same name, so the
        // directives are sent together in a single Cache-Control header.
        response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate");
        response.setDateHeader("Expires", 0);
    }
...

Firefox in particular is fussy, requiring “no-store”.
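
One detail worth keeping in mind when sending several Cache-Control directives: in the servlet API, setHeader overwrites any value already set for that header name, while addHeader appends another value. The sketch below uses a plain map as a stand-in for the response to show the difference. HeaderSketch is a hypothetical class written for illustration, not part of WebWork or the servlet API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the header handling of a servlet response.
final class HeaderSketch {
    private final Map<String, List<String>> headers = new LinkedHashMap<>();

    // Replace semantics, like HttpServletResponse.setHeader.
    void setHeader(String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        headers.put(name, values);
    }

    // Append semantics, like HttpServletResponse.addHeader.
    void addHeader(String name, String value) {
        headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // All values for a header joined as a comma-separated list, or null if unset.
    String get(String name) {
        List<String> values = headers.get(name);
        return values == null ? null : String.join(", ", values);
    }

    public static void main(String[] args) {
        HeaderSketch replaced = new HeaderSketch();
        replaced.setHeader("Cache-Control", "must-revalidate");
        replaced.setHeader("Cache-Control", "no-cache");
        replaced.setHeader("Cache-Control", "no-store");
        System.out.println(replaced.get("Cache-Control")); // only the last value

        HeaderSketch appended = new HeaderSketch();
        appended.addHeader("Cache-Control", "must-revalidate");
        appended.addHeader("Cache-Control", "no-cache");
        appended.addHeader("Cache-Control", "no-store");
        System.out.println(appended.get("Cache-Control")); // all three values
    }
}
```

Either calling addHeader for each directive or passing one comma-separated value to setHeader gets all of the directives to the browser.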

For more information on back button and caching behaviour in Firefox and IE, here’s a head start:

——-