a little madness

A man needs a little madness, or else he never dares cut the rope and be free -Nikos Kazantzakis

Zutubi

Archive for May, 2006

Wallet friendly RSS feeds

I have been subscribing to RSS feeds for some time now, but it was not until I had to implement one that I realised that there was more to it than just a structured XML response.

Below is a quick dissection of what I would consider the absolute minimum implementation for an RSS feed. I’m calling it “wallet friendly” not because it makes you money, but because it will save your users bandwidth by not spitting out a full feed for every request. Whilst the example is in Java, using Rome and WebWork, the details apply equally well to other frameworks and languages.

This example is an extension of the WebWorkResultSupport class.

    
protected void doExecute(String format, ActionInvocation invocation)
        throws Exception
{
    HttpServletResponse resp = ServletActionContext.getResponse();
    HttpServletRequest req = ServletActionContext.getRequest();
    OgnlValueStack stack = invocation.getStack();


Firstly, what do we do if there is no feed to display? The answer will depend on what that means in your system. If it’s an error, then it’s time for a 500 (Internal Server Error) response. If, on the other hand, it means that the feed request is invalid, it’s best to return a 410 (GONE) so that whoever is making the request knows that they should stop.

         
 
    SyndFeed feed = (SyndFeed) stack.findValue("feed");
    if (feed == null)
    {
        resp.sendError(HttpServletResponse.SC_GONE);
        return;
    }


Now set the content type and disposition. This will help the browser to deal with the response appropriately, and give it a friendly name if the user decides to ‘save as’.



    resp.setContentType("application/rss+xml; charset=UTF-8");
    resp.setHeader("Content-Disposition", "inline; filename=rss.xml");


When handling an RSS request, you should not return the feed unless it has changed since it was last requested. You can find a very good discussion on this by Charles in his fishbowl, and by Randy at the kbcafe. (For those in a hurry, the relevant details are that the “Last-Modified” and “ETag” headers are returned in the following request as “If-Modified-Since” and “If-None-Match” respectively)

There are two steps to this. The first is to always set the "ETag" and "Last-Modified" response headers. The "Last-Modified" details can be taken from the feed like so:



    // A happy default here is the If-Modified-Since header: if we
    // don't have any feed entries, the client's own timestamp is
    // echoed back and a 304 Not Modified results. Note that
    // getDateHeader() returns -1 when the header is absent.
    long ifModifiedSince = req.getDateHeader("If-Modified-Since");
    Date lastModified = new Date(Math.max(0, ifModifiedSince));
    List entries = feed.getEntries();
    if (entries.size() > 0)
    {
        // Get the latest feed entry - assuming the latest is 
        // at the top and that you set a published/updated 
        // date on the feed entries.
        SyndEntry entry = (SyndEntry) entries.get(0);
        Date published = entry.getPublishedDate();
        if (published != null)
        {
            lastModified = published;
        }
        Date updated = entry.getUpdatedDate();
        if (updated != null && lastModified.compareTo(updated) < 0)
        {
            lastModified = updated;
        }
    }


The ETag should uniquely identify this feed (read: the latest item in this feed). Unless you expect feed entries to be created at exactly the same time (or your database does not store timestamps with much precision), it is sufficient to use the last modified timestamp for the ETag. If this is not unique enough in your case, you will need to create a unique hash from the content.



    String etag = Long.toString(lastModified.getTime());
    resp.setHeader("ETag", etag);

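The content-hash fallback just mentioned can be sketched as follows. This is my own illustration, not part of Rome: the class name `FeedETag` and the choice of SHA-1 over the rendered feed content are assumptions, and any stable digest of the content would serve equally well.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FeedETag
{
    // Derive an ETag by hashing the rendered feed content rather than
    // relying on timestamps. Identical content always yields the same
    // tag; any change to an entry changes it.
    public static String contentETag(String feedContent)
    {
        try
        {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(
                    feedContent.getBytes(StandardCharsets.UTF_8));
            StringBuilder etag = new StringBuilder(hash.length * 2);
            for (byte b : hash)
            {
                // Hex-encode each byte so the tag is header-safe.
                etag.append(String.format("%02x", b));
            }
            return etag.toString();
        }
        catch (NoSuchAlgorithmException e)
        {
            // SHA-1 is available on all standard JVMs, so this
            // should never happen in practice.
            throw new RuntimeException(e);
        }
    }
}
```

You would then set the header with something like `resp.setHeader("ETag", FeedETag.contentETag(renderedFeed))`, at the cost of rendering the feed before you can decide whether to send it.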

Before you can use the last modified date for the header, you may need to drop the milliseconds since they are not part of the date format used by HTTP.



    Calendar cal = Calendar.getInstance();
    cal.setTime(lastModified);
    cal.set(Calendar.MILLISECOND, 0);
    lastModified = cal.getTime();


Now we can set the "Last-Modified" header.



    // always set
    resp.setDateHeader("Last-Modified", lastModified.getTime());


That completes the first step (setting the "Last-Modified" and "ETag" headers on every response). The second step is to check the "If-None-Match" and "If-Modified-Since" headers on the request (remembering that they should contain what you sent out in the previous response). If they match the "ETag" and "Last-Modified" values we just set on the response, then we do not need to return the feed: a 304 Not Modified will suffice.



    // Check the headers to determine whether or not a response
    // body is required.
    if (TextUtils.stringSet(req.getHeader("If-None-Match")) ||
        TextUtils.stringSet(req.getHeader("If-Modified-Since")))
    {
        if (etag.equals(req.getHeader("If-None-Match")) &&
            lastModified.getTime() ==
                         req.getDateHeader("If-Modified-Since"))
        {
            // Nothing has changed. A 304 must not carry a body,
            // so use setStatus rather than sendError.
            resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
            return;
        }
    }


Now, let's generate the feed data and send it on its way.



    // Render the feed in the requested format.
    WireFeed outFeed = feed.createWireFeed(format);
    outFeed.setEncoding("UTF-8");
    new WireFeedOutput().output(outFeed, resp.getWriter());
    resp.flushBuffer();


Oh, and one last thing. If you want to return error details when things go wrong, do the RSS reader a favour and format the error response as valid RSS. But be sure to set appropriate Last-Modified and ETag headers. For example, set an error token in the ETag header that you can check next time round: if the token in the request's If-None-Match matches your current error token, respond with a 304.
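That last idea might be sketched as below. Everything here is illustrative rather than taken from Rome or WebWork: the `ERROR_ETAG` token, the class name, the link URL and the feed wording are all my own, and a real implementation should XML-escape the message before embedding it.

```java
public class ErrorFeed
{
    // Hypothetical error token: placed in the ETag header whenever we
    // serve the error feed, so it can be recognised on the next request.
    public static final String ERROR_ETAG = "\"feed-error\"";

    // If the client echoes the error token back in If-None-Match, the
    // error is unchanged and a 304 is enough.
    public static boolean errorUnchanged(String ifNoneMatch)
    {
        return ERROR_ETAG.equals(ifNoneMatch);
    }

    // A minimal but valid RSS 2.0 document carrying the error message,
    // so feed readers display something sensible instead of choking on
    // an HTML error page. The message must already be XML-escaped.
    public static String errorFeed(String message)
    {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
               "<rss version=\"2.0\"><channel>\n" +
               "<title>Feed unavailable</title>\n" +
               "<link>http://example.com/feed</link>\n" +
               "<description>" + message + "</description>\n" +
               "</channel></rss>\n";
    }
}
```

On an error you would write `errorFeed(...)` to the response with `ERROR_ETAG` as the ETag header; on the next request, `errorUnchanged(req.getHeader("If-None-Match"))` decides whether a 304 will do.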

--------
Into continuous integration? Want to be? Try pulse.

10 Ways To Improve Wikis; 2

Sorry readers, I know you have all been waiting with bated breath for the final installment in the wiki series :). Well, here goes:

  1. Unreadable syntax for tables

    The problem is actually more generic: the simple wiki syntax works well for inline elements, but poorly for structured elements. Why? Mainly because whitespace is significant. This is convenient most of the time, but gives no way to format a structured entity such that it can be easily read. A possible solution is to allow insignificant whitespace so that structured elements can be formatted with indentation. Naturally, for regular paragraphs, significant whitespace would have to stay. I think the semantics for whitespace can still be clear if the insignificant whitespace is only allowed “outside” of the actual content, in the space between whatever syntax elements are used to delimit the table (or whatever it happens to be). Or you could just use a WYSIWYG editor, but you won’t catch me doing it :).

  2. Editing in a browser text area sucks

    A couple of readers actually pointed out some neat solutions for this one, such as the mozex extension for Firefox, which lets you use an external program for editing instead of the browser text area. Nice idea, but still a little awkward, as you can’t avoid interacting with the web UI for other actions (this is a wiki after all!). I think the real answer is the slow, painful progression towards browsers as a richer application platform. Many have tried to create the ubiquitous thin client technology for the web, but none can compete with the inertia of the trusty old web browser. Unfortunately for us users, this means we’ll be waiting a long time for browser technology to evolve to a state where web UIs are actually enjoyable to use. Just look at how excited we all get when AJAX lets us emulate two-decade-old desktop technology in our web apps.

  3. Poor support for versioning

    Let’s start simple: a wiki that doesn’t have at least RCS-level versioning support (simple revisions, diffs, history) should not even be allowed to exist. A wiki that doesn’t have some form of merging support (with conflict resolution) isn’t worth using. Personally, my requirements are even more strict. I maintain documentation for software products, and these products have multiple release streams. I manage these streams in my code base using branches, and ideally would do the same with the documentation. I don’t know of a wiki that supports this. This seems a shame, because the version control technology has been around for a long time. Rather than reinventing a weaker versioning scheme, I don’t understand why more wikis don’t use a full-blown version control system (like Subversion) in the back end. There is a wealth of existing tools and technologies that just aren’t being leveraged here.

  4. Lost edits due to browser crashes

    First off, for those wikis with autosave, this is much less of a problem. I would like to mention one insightful comment from reddit, however. There, poster rahul turned this problem back at the browser itself. A good point: why don’t the web browsers implement their own autosave for form data? They are already part of the way there by remembering form input, it is not much of a leap to get from there to autosaving for text areas. The obvious benefit of this approach is that once it is implemented in the browser, there is no longer a need for every web app to reimplement the same functionality.

  5. Wiki discussions

    People, please just stop using flat wiki pages for discussions! It hurts us, precious! Some wikis have a more enlightened approach, and allow threaded comments on each page. This is a huge improvement, no doubt. Still, will they ever match dedicated forum solutions? Maybe wiki implementors should be looking again to adopt existing technologies. I wonder how feasible it would be to integrate an existing forum solution into the wiki. Perhaps a looser coupling would even be beneficial, allowing more diverse tools to be linked to a wiki page. The counterbalancing force is of course the benefit of tight integration.

So there you have it: some good can come of a rant after all. At the very least, I have picked up a few new ideas myself!


Pulse CI Server 1.0.1 Released!

Things have been a bit quiet around this blog the past week as we launched both our new website zutubi.com and the first public beta of pulse. Pulse is an automated build (or continuous integration) server built on the principles that are important to us:

  • Adaptability to existing environments: everything within pulse is built on a generic core engine that can run any command line build and extract useful information from the results. When we add specific tool support, we build it upon this core, ensuring the core is flexible enough to deal with many tools.
  • Developer control: pulse gives every developer their own account. Each developer has a configurable dashboard that shows the information they want to see. Developers can also control how and when they are notified of build events.
  • Simple configuration: pulse has a full web interface that allows you to configure the server in minutes. You can build a project without editing a single text file. Alternatively, you can choose to configure your project using a simple XML file and version it in your SCM with your code.

That’s enough marketing here :). If you are interested, head over to the website for full details. Give pulse a free trial for 30 days and let us know what you think. Sign up for the beta program and get discounts on commercial licenses!