Archive for the ‘Java’ Category

Your Own Little Language

Friday, June 2nd, 2006

One of the most studied and best understood areas of computer science is lexical analysis and parsing. Fantastic tools (such as ANTLR) are also available to automatically generate parsers based on declarative input. Despite this, few developers take the opportunity to design their own “little” languages. No, I’m not suggesting we should all start creating full-blown programming languages (although that might be fun…). I’m not talking about Domain Specific Languages (DSLs) either [1], at least not in the strictest sense. When I say “little” I mean a language custom-built to take care of a small task.

For example, in pulse you can filter build notifications using arbitrary boolean expressions. To allow this, we created a custom boolean expression language. This “little” language has the usual boolean operators (and, or, not) and primitives like “success” which evaluates to true if the build was successful. It was implemented in an afternoon, thanks to ANTLR writing most of the code for us. The result is a simple yet powerful way to specify filters, much more usable than a GUI of equal flexibility. To keep the simple cases simple we provide a GUI with pre-canned expressions for the most common scenarios. It’s the best of both worlds.
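
To make the idea concrete, here is a minimal sketch of such a language in plain Java. It is a hand-written recursive descent evaluator, not ANTLR output and not the pulse implementation; the class name BuildFilter and the primitives other than “success” (for example “failure” and “error”) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative evaluator for a tiny boolean language: and, or, not, parentheses. */
final class BuildFilter {
    private final List<String> tokens;
    private final Set<String> trueFacts; // primitives that hold for this build
    private int pos = 0;

    private BuildFilter(List<String> tokens, Set<String> trueFacts) {
        this.tokens = tokens;
        this.trueFacts = trueFacts;
    }

    /** Evaluates an expression such as "success and not (failure or error)". */
    static boolean matches(String expression, Set<String> trueFacts) {
        BuildFilter filter = new BuildFilter(tokenize(expression), trueFacts);
        boolean result = filter.parseOr();
        if (filter.pos != filter.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + filter.tokens.get(filter.pos));
        }
        return result;
    }

    // Pad parentheses with spaces so a plain whitespace split yields the tokens.
    private static List<String> tokenize(String text) {
        String padded = text.replace("(", " ( ").replace(")", " ) ").trim();
        List<String> out = new ArrayList<>();
        if (!padded.isEmpty()) {
            for (String token : padded.split("\\s+")) {
                out.add(token);
            }
        }
        return out;
    }

    // or := and ("or" and)*
    private boolean parseOr() {
        boolean value = parseAnd();
        while (accept("or")) {
            boolean rhs = parseAnd(); // always parse the right side to consume its tokens
            value = value || rhs;
        }
        return value;
    }

    // and := not ("and" not)*
    private boolean parseAnd() {
        boolean value = parseNot();
        while (accept("and")) {
            boolean rhs = parseNot();
            value = value && rhs;
        }
        return value;
    }

    // not := "not" not | primary
    private boolean parseNot() {
        return accept("not") ? !parseNot() : parsePrimary();
    }

    // primary := "(" or ")" | NAME
    private boolean parsePrimary() {
        if (accept("(")) {
            boolean value = parseOr();
            if (!accept(")")) {
                throw new IllegalArgumentException("Expected ')'");
            }
            return value;
        }
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of expression");
        }
        String name = tokens.get(pos++);
        if (name.equals(")") || name.equals("and") || name.equals("or")) {
            throw new IllegalArgumentException("Expected a primitive but found: " + name);
        }
        return trueFacts.contains(name); // a primitive is true if it is in the fact set
    }

    private boolean accept(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }
}
```

Calling BuildFilter.matches with the expression "success and not (failure or error)" and a fact set containing only "success" evaluates to true. A generated parser earns its keep once the grammar grows beyond what is comfortable to maintain by hand.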

So keep your eyes open: there are opportunities for little languages everywhere. With a tool like ANTLR in your arsenal to do all the heavy lifting, there is no reason not to give little languages a go! To give you a head start, I’ll be presenting an “ANTLR By Example” series of posts that will show exactly how we used ANTLR to create the language described above. Stay tuned!


[1] On second thought, that would make this post trendier…

Wallet friendly RSS feeds

Thursday, May 4th, 2006

I have been subscribing to RSS feeds for some time now, but it was not until I had to implement one that I realised that there was more to it than just a structured XML response.

Below is a quick dissection of what I would consider the absolute minimum implementation for an RSS feed. I’m calling it “wallet friendly” not because it makes you money, but because it will save your users bandwidth costs by not spitting out a full feed for every request. Whilst the example is in Java, using Rome and WebWork, the details apply equally well to other frameworks and languages.

This example is an extension of the WebWorkResultSupport class.


protected void doExecute(String format, ActionInvocation action....
{
    HttpServletResponse resp = ServletActionContext.getResponse();
    HttpServletRequest req = ServletActionContext.getRequest();
    OgnlValueStack stack = actionInvocation.getStack();  


Firstly, what do we do if there is no feed to display? The answer will depend on what that means in your system. If it’s an error, then it’s time for a 500 (Internal Server Error) response. If, on the other hand, it means that the feed request is invalid, it’s best to return a 410 (Gone) so that whoever is making the request knows that they should stop.

         

    SyndFeed feed = (SyndFeed) stack.findValue("feed");
    if (feed == null)
    {
        resp.sendError(HttpServletResponse.SC_GONE);
        return;
    }


Now set the content type and disposition. This will help the browser to deal with the response appropriately, and give it a friendly name if the user decides to ’save as’.



    resp.setContentType("application/rss+xml; charset=UTF-8");
    resp.setHeader("Content-Disposition", "inline; filename=rss.xml");


When handling an RSS request, you should not return the feed unless it has changed since it was last requested. You can find a very good discussion on this by Charles in his fishbowl, and by Randy at the kbcafe. (For those in a hurry, the relevant detail is that the client sends the “Last-Modified” and “ETag” values from your response back in its next request, as the “If-Modified-Since” and “If-None-Match” headers respectively.)

There are two steps to this. The first is to always set the “ETag” and “Last-Modified” response headers. The “Last-Modified” details can be taken from the feed like so:



    // A happy default here is the If-Modified-Since header.
    // If we don't have any feed entries, then this will result
    // in a 304 Not modified
    Date lastModified =
            new Date(req.getDateHeader("If-Modified-Since"));
    List entries = feed.getEntries();
    if (entries.size() > 0)
    {
        // Get the latest feed entry - assuming the latest is
        // at the top and that you set a published/updated
        // date on the feed entries.
        SyndEntry entry = (SyndEntry) entries.get(0);
        lastModified = entry.getPublishedDate();
        Date updated = entry.getUpdatedDate();
        if (updated != null && lastModified.compareTo(updated) < 0)
        {
            lastModified = updated;
        }
    }


The ETag should uniquely identify this feed (read: the latest item in this feed). Unless you expect feed entries to be created at exactly the same time (or your database does not provide a high degree of accuracy in the time field) it’s sufficient to use the last modified timestamp for the ETag. If this is not unique enough in your case, you will need to create a unique hash from the content.



    // Entity tags are quoted strings in HTTP.
    String etag = "\"" + lastModified.getTime() + "\"";
    resp.setHeader("ETag", etag);
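
If the timestamp is not unique enough, a content hash can stand in for it. The following is a minimal sketch, assuming the rendered feed is available as a String; the class name ContentEtag and the choice of SHA-256 are illustrative assumptions, not part of the original example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative helper that derives an ETag value from the feed content. */
final class ContentEtag {
    /** Returns a quoted ETag made from the hex-encoded SHA-256 hash of the content. */
    static String of(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return "\"" + hex + "\""; // entity tags are quoted strings in HTTP
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}
```

The trade-off is that the feed has to be rendered before the hash can be computed, so this saves bandwidth but not the work of generating the feed.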


Before you can use the last modified date for the header, you may need to drop the milliseconds since they are not part of the date format used by HTTP.



    Calendar cal = Calendar.getInstance();
    cal.setTime(lastModified);
    cal.set(Calendar.MILLISECOND, 0);
    lastModified = cal.getTime();


Now we can set the “Last-Modified” header.



    // always set
    resp.setDateHeader("Last-Modified", lastModified.getTime());


That completes the first step (setting the “Last-Modified” and “ETag” headers on every response). The second step is to check the “If-None-Match” and “If-Modified-Since” headers on the request (remembering that they ’should’ contain what you sent out in the previous response). If they match the “ETag” and “Last-Modified” values we just set on the response then we do not need to return the feed. A 304 Not Modified will suffice.



    // Check the headers to determine whether or not a response
    // is required.
    if (TextUtils.stringSet(req.getHeader("If-None-Match")) ||
        TextUtils.stringSet(req.getHeader("If-Modified-Since")))
    {
        if (etag.equals(req.getHeader("If-None-Match")) &&
            lastModified.getTime() ==
                         req.getDateHeader("If-Modified-Since"))
        {
            // If response is not required, send 304 Not Modified.
            // setStatus is used rather than sendError because a 304
            // must not carry a response body.
            resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
            return;
        }
    }
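
The same decision can be pulled out into a small helper that is easy to test without a servlet container. This is only a sketch: the class and method names are hypothetical, and it assumes the caller passes the raw “If-None-Match” value (or null) and the “If-Modified-Since” value as milliseconds, with -1 meaning the header was absent, which is how HttpServletRequest.getDateHeader reports a missing header.

```java
/** Illustrative, servlet-free version of the conditional GET check. */
final class ConditionalGet {
    /** Drops the milliseconds, since HTTP dates only have one-second resolution. */
    static long truncateToSeconds(long millis) {
        return (millis / 1000L) * 1000L;
    }

    /**
     * Returns true when a 304 Not Modified is enough.
     *
     * @param etag               the ETag value set on the current response
     * @param lastModifiedMillis the feed's last modified time in milliseconds
     * @param ifNoneMatch        raw If-None-Match request header, or null if absent
     * @param ifModifiedSince    If-Modified-Since in milliseconds, or -1 if absent
     */
    static boolean notModified(String etag, long lastModifiedMillis,
                               String ifNoneMatch, long ifModifiedSince) {
        if (ifNoneMatch == null || ifModifiedSince < 0) {
            return false; // both validators are required, as in the check above
        }
        return etag.equals(ifNoneMatch)
                && truncateToSeconds(lastModifiedMillis) == ifModifiedSince;
    }
}
```

Requiring both validators to match is the conservative choice: if either one is missing or different, the full feed is sent.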


Now, let’s generate the feed data and send it on its way.



    // Render the feed in the requested format.
    WireFeed outFeed = feed.createWireFeed(format);
    outFeed.setEncoding("UTF-8");
    new WireFeedOutput().output(outFeed, resp.getWriter());
    resp.flushBuffer();


Oh, and one last thing. If you want to return error details when things go wrong, do the RSS reader a favour and format the error response in valid RSS format. But be sure to set appropriate “Last-Modified” and “ETag” headers. For example, set an error token in the ETag header that you can check next time round. If your current error token matches the ETag in the request, respond with the 304.

——–
Into continuous integration? Want to be? Try pulse.

Ajax vs Caching

Wednesday, April 12th, 2006

Recently I’ve thrown some simple Ajax (well, Aj at least) into our app, to refresh content that changes frequently without refreshing the whole page. With a library like prototype, it really couldn’t be simpler, just throw in an Ajax.PeriodicalUpdater that pulls in a new HTML fragment and you’re set.

Almost.

Working with web UIs, we’ve all run into the standard browser caching problems. A meta tag or two:

<meta http-equiv="Pragma" content="no-cache"/>
<meta http-equiv="Expires" content="0"/>

solves most problems. When you start using JavaScript to periodically modify the page, however, things get trickier. The first problem I ran into was with IE. Symptom: the periodical updates just didn’t work. Because these were just fragments of a page, they didn’t contain the meta tags above. IE was caching the HTML fragment rather than hitting the server again as desired. Solution: use the no-cache and expiration headers on the HTTP response itself. I implemented this in WebWork with a simple interceptor, keeping this detail out of my action classes and allowing it to be reused as necessary:

public class AjaxInterceptor extends AroundInterceptor
{
    protected void after(ActionInvocation actionInvocation, String string)
        throws Exception
    {
    }

    protected void before(ActionInvocation actionInvocation)
        throws Exception
    {
        HttpServletResponse response = ServletActionContext.getResponse();

        response.setHeader("Pragma", "no-cache");
        response.setDateHeader("Expires", 0);
    }
}

So we’re done right? Right? Wrong. This fixed the original IE problem, but after a while I noticed another major annoyance. I had committed a mortal web UI sin: the back button was broken.

If an updating page was left open long enough for a content refresh, navigating away from that page and then returning via the back button displayed the originally-loaded content, not the refreshed content. From a user’s perspective this is not only annoying, but downright confusing. The problem in this case was again caching: not caching of the HTML fragment, but caching of the whole page. “But wait!”, you ask, “That original page has the meta tags, right?”. Right! But the back button doesn’t care. In fact, the HTTP spec requires the back button not to care. When you hit that back button you get the originally-loaded content, from the browser’s cache, expiration rules be damned!

So what can we do? The answer is to stop the page being put in the cache in the first place. This can be done with a liberal sprinkling of Cache-Control headers, to cater for various browsers:

...
    protected void before(ActionInvocation actionInvocation)
        throws Exception
    {
        HttpServletResponse response = ServletActionContext.getResponse();

        response.setHeader("Pragma", "no-cache");
        // setHeader replaces any earlier value for the same header name,
        // so all Cache-Control directives go into a single header value.
        response.setHeader("Cache-Control",
            "no-cache, no-store, must-revalidate");
        response.setDateHeader("Expires", 0);
    }
...

Firefox in particular is fussy, requiring “no-store”.

For more information on back button and caching behaviour in Firefox and IE, here’s a head start:


Your IDE is too smart for your own good

Wednesday, April 12th, 2006

What do you do when your app explodes with a StackOverflowError after oh, say, a week of operation? Of course, you just follow the stack trace to get to the line that caused the explosion, and work from there. Nice theory, but what about when the end of your stack trace looks like:

at java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1007)
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1006)
at java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1007)
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1006)
at java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1007)
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1006)

? Last time I checked, Collections weren’t runnable… so we’ve only got part of the trace, and the more worthless part at that. Well, at least it gives us some idea. With a little bit of mental gymnastics, and a bit of luck, you might guess that the temporal nature of the problem has something to do with that enterprise scheduler you’re using. Nice guess! Bump up the frequency of the schedule, throw a breakpoint into Collections$UnmodifiableCollection.iterator and voila, there’s your culprit!

But hang on a second, inspecting the state of your scheduler in your new-fangled-IDE’s (IDEA) debugger shows nothing out of the ordinary. The collection looks perfectly normal, it seems impossible that creating an iterator would explode the stack! That’s when (if you’re lucky) you realise that the IDE is helpfully interpreting the data structure for you. It sees an unmodifiable collection and says to itself “Aha! I know what that is! It’s just a regular collection wrapped in a bit of useless noise. Let me get that noise out of the way for you, sir.” Good in theory, but in this case that noise was just the problem I was trying to debug! IDEA had the same “Aha!” moment 4000-odd times and completely unwrapped a mess of nested unmodifiable collections, the kind of mess that gets you into trouble with the stack police.
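
The mechanism is easy to reproduce in isolation. The sketch below uses a hypothetical ReadOnlyView wrapper rather than Collections.unmodifiableCollection itself, on the assumption that some newer JDKs may avoid re-wrapping a collection that is already unmodifiable; every name in it is made up for illustration.

```java
import java.util.Iterator;

/** Illustrative reproduction of a stack overflow caused by deeply nested wrappers. */
final class NestedWrappers {
    /** Hypothetical read-only view that delegates to whatever it wraps. */
    static final class ReadOnlyView<T> implements Iterable<T> {
        private final Iterable<T> delegate;

        ReadOnlyView(Iterable<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public Iterator<T> iterator() {
            // Asking the delegate for its iterator adds a stack frame per wrapper.
            final Iterator<T> inner = delegate.iterator();
            return new Iterator<T>() {
                @Override public boolean hasNext() { return inner.hasNext(); }
                @Override public T next() { return inner.next(); }
                @Override public void remove() { throw new UnsupportedOperationException(); }
            };
        }
    }

    /** Wraps the base iterable the given number of times, one view inside another. */
    static <T> Iterable<T> wrap(Iterable<T> base, int times) {
        Iterable<T> current = base;
        for (int i = 0; i < times; i++) {
            current = new ReadOnlyView<>(current);
        }
        return current;
    }

    /** Reports whether merely creating an iterator exhausts the stack. */
    static boolean overflowsOnIterate(Iterable<?> iterable) {
        try {
            iterable.iterator();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }
}
```

A handful of wrappers iterate normally, while something like a million nested wrappers should exhaust a default-sized thread stack as soon as iterator() is called. A debugger view that flattens the wrappers hides exactly this nesting depth.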

Moral of the story: IDE magic is nice, but seeing the truth is nicer.

By the way, if you happen to be using Quartz, beware:

http://jira.opensymphony.com/browse/QUARTZ-399
