Archive for the ‘Technology’ Category

Article: The Road To Build Enlightenment

Wednesday, August 23rd, 2006

Each day, every developer on a software team interacts with the build system several times. Despite this, many teams underplay the importance of an effective build system. Is your build system working for you, or do you merely tolerate its existence? In this article, we will journey through build system requirements starting from the most basic through to those that make the whole development process more efficient.

Read The Road To Build Enlightenment at zutubi.com.

Windows Scripts Are All Malicious

Tuesday, August 15th, 2006

Well, if they try to do anything useful anyway. Last week I was working on improvements to the Windows service installation for Pulse. The existing service installation was handled by a pretty simple batch file wrapping the service executable. However, this method had some major limitations, mostly related to location of a compatible JVM. I needed to expand the logic in the batch script considerably as part of the solution to these limitations.

That’s when I realised: logic and batch scripts don’t really go together. In fact, scripting and batch scripts don’t really go together. Even something as simple as reading the output of an executed command is bizarre in batch (there is an obscure for loop syntax to do it). Fed up with the archaic nature and limitations of batch files, I went looking for an alternative. I had heard of PowerShell (aka Monad), but of course it is not yet widely installed. So I turned to Windows Script Host, which has been around since Windows 98. I hadn’t used it before, but I discovered you could write scripts in JScript (JavaScript-like) and easily access the filesystem and registry, so it seemed like a good fit.

In fact, apart from the pain of navigating MSDN documentation online, my foray into WSH started quite promisingly. Then, just as I was really starting to get somewhere, Norton Antivirus chimed in. Apparently, this script that I was running was trying to do something “malicious”. Whenever I tried to do something moderately useful in the script, like access file or registry information, Norton would warn me in no uncertain terms. Brilliant. No matter that I was running the script directly, just as I could run any executable that could do equally “malicious” things. I suppose Symantec doesn’t care about false positives; that might take effort. Instead, WSH has been rendered useless to software distributors, except for those willing to have their customers warned that their software is “malicious” by a virus scanner.

In the end I implemented the solution as a standalone C++ program, because at least that way I knew I could get it to work. It’s a sad state of affairs when I am forced to write a few hundred lines of C++ when a 50 line script should do. That’s what happens when a platform is completely devoid of useful scripting options.

Free Small Team Licenses for Pulse!

Tuesday, August 15th, 2006

We are pleased to announce the immediate availability of free Small Team licenses for the Pulse continuous integration server. Small Team licenses are fully-featured licenses for up to two users and two projects on a single server.

We decided to make these licenses available for a few reasons:

  • During our careers we’ve worked for small teams without the budget for the best tools. Although Pulse is inexpensive, we don’t want it to be out of the reach of these teams while they are just starting out.
  • We have had interest from current users regarding a cheaper license for home use. Is free cheap enough for you? ;)
  • We drive the development of Pulse via user feedback. More users means more feedback and a better product in the long term. Adding a new class of users to the mix will help all of our customers.

So, get your Small Team license today and enjoy using Pulse!

If Java Could Have Just One C++ Feature…

Thursday, August 3rd, 2006

I have been immersed in Java for a while now, but having worked in C++ for years before, there is one big thing I miss: destructors. Especially in a language with exceptions, destructors are a massive time and error saver for resource management.

Having garbage collection is nice and all, but the fact is that we deal with a multitude of resources and need to release them all. How do we do this in Java? The Hard Way: we need to know that streams, database connections and so on need to be closed, and we need to explicitly close them:

FileInputStream f = new FileInputStream("somefile");
// Do some stuff.
f.close();

Of course, with exceptions it gets worse. We need to guarantee that the stream is closed even if an exception is thrown, leading to the oft-seen pattern:

FileInputStream f = null;
try
{
    f = new FileInputStream("somefile");
    // Do some stuff
}
finally
{
    if(f != null)
    {
        try
        {
            f.close();
        }
        catch(IOException e)
        {
            // Frankly, my dear…
        }
    }
}

The noise is just incredible. A common way to reduce it is to move the null check and close into a utility function, but noise still remains. Repeating the same try/finally pattern everywhere is also mind-numbing, and it is easily forgotten, leading to incorrect code.
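Such a utility function might look like this (a sketch only; the IOUtils name and closeQuietly method are illustrative, not part of the JDK):

```java
import java.io.Closeable;
import java.io.IOException;

// Illustrative helper: the null check and the swallowed close failure
// live in one place instead of every finally block.
class IOUtils
{
    public static void closeQuietly(Closeable c)
    {
        if (c != null)
        {
            try
            {
                c.close();
            }
            catch (IOException e)
            {
                // Frankly, my dear...
            }
        }
    }
}
```

With this in hand, the finally block collapses to a single IOUtils.closeQuietly(f) call, though the try/finally skeleton itself still remains at every call site.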

In C++, this problem is solved elegantly using the Resource Acquisition Is Initialisation (RAII) pattern. This pattern dictates that resources should be acquired in a constructor and disposed of in the corresponding destructor. Combined with the deterministic destruction semantics for objects placed on the stack, this pattern removes the need for manual cleanup and with it the possibility of mistakes:

{
    std::ifstream f("somefile");
    // Do some stuff
}

Where has all the cleanup gone? It is where it should be: in the destructor for std::ifstream. The destructor is called automatically when the object goes out of scope (even if the block is exited due to an uncaught exception). The ability to create value types and place them on the stack is a more general advantage of C++, but Java can close the gap with smarter compilers1.

Interestingly, C# comes in half way between Java and C++ on this matter. In C#, you can employ a using statement to ensure cleanup occurs:

using (TextReader r = File.OpenText("log.txt")) {
    // Do some stuff
}

In this case the resource type must implement System.IDisposable, and Dispose is guaranteed to be called on the object at the end of the using statement. The using statement in C# is pure syntactic sugar for the try/finally pattern we bash out in Java every day.

What’s the answer for Java?2 Well, something similar to using would be a good start, but I do feel like we should be able to do better. If we’re going to add sugar why not let us define our own with a full-blown macro system? Difficult yes, but perhaps easier than always playing catch up? An alternative is to try and retrofit destructors into the language3. It is possible to mix both garbage collection and destructors, as shown in C++/CLI4. However, I don’t see an elegant way to do so that improves upon what using brings. If you do, then let us all know!
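In the meantime, one workaround available today is the “execute around” idiom: a helper method owns the open/close lifecycle, and the caller supplies only the interesting work as a callback. A minimal sketch (the StreamUser interface and withFile method are invented names):

```java
import java.io.*;

// Invented callback interface: the caller's "do some stuff" goes here.
interface StreamUser
{
    void use(InputStream in) throws IOException;
}

class Streams
{
    // The helper owns the full open/use/close lifecycle, so the
    // try/finally noise is written exactly once.
    public static void withFile(String name, StreamUser user) throws IOException
    {
        InputStream in = new FileInputStream(name);
        try
        {
            user.use(in);
        }
        finally
        {
            try
            {
                in.close();
            }
            catch (IOException e)
            {
                // Ignore failures on close.
            }
        }
    }
}
```

Of course, without closures the anonymous inner class at each call site brings noise of its own, which is why this idiom only goes so far.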


1 it appears that Mustang already has some of the smarts such as escape analysis.
2 if you’re the one down the back who shouted “finalizers”: you can leave anytime you want as long as it’s now!
3 I said NOW!
4 see also Herb Sutter’s excellent post on the topic, Destructors vs. GC? Destructors + GC!.

Do the Simplest Thing That Will Knock Their Socks Off

Wednesday, August 2nd, 2006

It appears that I am not the only one wondering whether the pragmatic, keep-it-real approach to software development will result in programs that are missing that something extra: those features you cannot do without the moment you find them.

Kathy writes:

Our users will tell us where the pain is. Our users will drive incremental improvements. But the user community can’t do the revolutionary innovation for us. That’s up to us.

Dave takes a more direct approach when he writes:

It does seem, though, that agile teams will be less likely to either prioritize or implement some of the more subtle touches of the kind that the Wired article discusses. When forced to choose between the features “Send email”, and “Implement graceful date column resizing”, guess which one is likely to get short shrift.

An excellent example of this type of feature, a feature that you didn’t know you needed until you saw it, is iTunes smart shuffle. It “allows you to control how likely you are to hear multiple songs in a row by the same artist or from the same album”.

I don’t know how I managed without it.

I cannot help but wonder whether this feature would have been added under the agile, keep-it-real approach. It is not an essential feature, and I would not have noticed if it were not there. But the fact that it is makes me a very happy user.

Pulse Continuous Integration Server 1.1 Beta!

Tuesday, July 25th, 2006

Well, it’s finally here! The first public beta of Pulse 1.1. This is a huge milestone for us; just take a look at what’s new to see what I mean. The biggest single feature is powerful distributed building, but as you’ll see, the list is long!

Thanks so much to our private pre-release testers, your feedback has been invaluable. Now for the rest of you: bring it on!

Execute the Future

Tuesday, July 18th, 2006

No matter how much multithreaded code you write, it will never be simple. The added complexity of races between threads makes it much harder to predict the possible combinations of events in your program. Even worse, testing the combinations is difficult, requiring thread-aware test cases with explicit synchronisation points. Thus it is a pleasure to see a package like java.util.concurrent offer not just synchronisation primitives but also higher-level abstractions1.

One of the simplest but most useful abstractions in the package is the Future interface. As stated in the JavaDoc, this interface:

abstracts the result of an asynchronous computation

By encapsulating an operation in a Future, you can execute it asynchronously (typically using an Executor). The Future may be queried, cancelled, and most importantly can return the result of the computation. When asking for the result (via one of the get methods), you can also specify a timeout indicating how long you are willing to wait for the result to become available.

To take a concrete example, recently I had a problem with some networking code. The purpose of the code was to ping a bunch of machines to see which of them were online and responding. Unfortunately, when a machine is unavailable it can take some time for the ping to fail, making the whole process take several minutes. In my case, such a long ping time indicates that the machine is essentially unusable anyhow, so I would prefer to fail the ping after a reasonable delay, say 10 seconds. I achieved this timeout without a single line of synchronisation code by using the FutureTask implementation of the Future interface:

public class DefaultAgentManager
{
    private ExecutorService pingService =
        Executors.newCachedThreadPool();

    private class Pinger implements
        Callable<SlaveStatus>
    {
        private SlaveAgent agent;

        public Pinger(SlaveAgent agent)
        {
            this.agent = agent;
        }

        public SlaveStatus call() throws Exception
        {
            SlaveStatus status;

            try
            {
                status = agent.getSlaveService().ping();
            }
            catch (Exception e)
            {
                status = new SlaveStatus(Status.OFFLINE,
                    e.getMessage());
            }

            return status;
        }
    }

    void pingSlave(SlaveAgent agent)
    {
        FutureTask<SlaveStatus> future =
            new FutureTask<SlaveStatus>(new Pinger(agent));
       
        pingService.execute(future);

        SlaveStatus status;

        try
        {
            status = future.get(10, TimeUnit.SECONDS);
        }
        catch (TimeoutException e)
        {
            status = new SlaveStatus(Status.OFFLINE,
                "Agent ping timed out");
        }
        catch(Exception e)
        {
            status = new SlaveStatus(Status.OFFLINE,
                e.getMessage());
        }

        agent.updateStatus(status);
    }
}

Although I have simplified the code for clarity2, the most error-prone parts of the implementation all live in java.util.concurrent, which is tried and tested. There was no need for me to carefully synchronise and test for races; I was able to just get on with the job.

This is just one small part of the functionality offered by Futures. Combining them with CompletionServices, for example, allows an even higher level of abstraction for dealing with multiple tasks. If you write multithreaded code, and haven’t looked at the java.util.concurrent package, then do yourself a favour and dive in. The JavaDoc comes complete with usage suggestions and example code snippets, making the package very approachable. Package designer Doug Lea has also authored multiple books on the topic.
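As a taste of that, here is a minimal CompletionService sketch (the host names and trivial task bodies are stand-ins for real pings): submitted tasks are handed back in completion order, so the fastest responders can be processed first.

```java
import java.util.concurrent.*;

class CompletionExample
{
    public static void main(String[] args) throws Exception
    {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CompletionService<String> completed =
            new ExecutorCompletionService<String>(pool);

        // Stand-in tasks; a real version would ping each agent.
        String[] hosts = { "alpha", "beta", "gamma" };
        for (final String host : hosts)
        {
            completed.submit(new Callable<String>()
            {
                public String call()
                {
                    return host + ": online";
                }
            });
        }

        // take() blocks until the next task finishes, whichever that is:
        // results arrive in completion order, not submission order.
        for (int i = 0; i < hosts.length; i++)
        {
            System.out.println(completed.take().get());
        }

        pool.shutdown();
    }
}
```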


1 moving out of the imperative world we get into even more interesting abstractions to help exploit concurrency, but that’s a topic all of its own.
2 the most significant omission is how to handle the tasks that did not complete before the timeout; in some applications they may be a resource burden, in others it may be safe to let them complete in their own time

ANTLR By Example: Part 5: Extra Credit

Tuesday, July 11th, 2006

Introduction

Over the past four parts, I have illustrated how to parse and evaluate boolean expressions using ANTLR. The grammar presented in those parts is based on real code in pulse. Although it works as presented, there are a couple of items to polish up: one of which I have solved, and the other of which I have not yet been able to solve.

Error Reporting

As pulse allows users to enter their own boolean expressions (to configure when they receive build notifications), decent error reporting is paramount. The first step is to turn off ANTLR’s default error handling, so that the errors can be handled by pulse. This is done by setting the defaultErrorHandler option to false:

class NotifyConditionParser extends Parser;
options {
        buildAST=true;
        defaultErrorHandler=false;
}

With that done, the ANTLR-generated code will throw exceptions on errors. Let’s take a look at the sorts of errors that are generated by the grammar as it stands.

Case 1: Unrecognised word:

$ java NotifyConditionParserTest "changed or tuer"
Caught error: unexpected token: tuer

Case 2: Unrecognised character:

$ java NotifyConditionParserTest "6 and false"
Caught error: unexpected char: '6'

Case 3: Illegal expression structure

$ java NotifyConditionParserTest "state.change or or success"
Caught error: unexpected token: or

Case 4: Unbalanced parentheses

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: expecting RIGHT_PAREN, found 'null'

Most of these messages are not too bad; at least they are on the right track. Case 4 is certainly the worst of the lot: although the information is accurate, it is not exactly user friendly. We’ll get back to that later. One big thing missing in all cases is location information. I figured that ANTLR must have a way to retrieve the information, and a little digging uncovered it. All of the above messages are generated using the getMessage method of the exceptions thrown by ANTLR. To get the line and column number information (which is indeed stored in the exception), you can use the toString method instead:

try
{
    …
}
catch (Exception e)
{
    System.err.println("Caught error: " + e.toString());
}

Trying case 1 again:

$ java NotifyConditionParserTest "changed or tuer"
Caught error: line 1:12: unexpected token: tuer

Much better! Now the user knows where the error occurred. That leaves us with case 4, which is still a little on the cryptic side:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: expecting RIGHT_PAREN, found 'null'

It would be nice to avoid exposing the raw token names (e.g. RIGHT_PAREN) and to explicitly say that we hit the end of the input (instead of “found ‘null’”). To fix the former problem, we can add paraphrase options to our lexer tokens. This allows us to specify a phrase describing the token which will be used in error messages instead of the token name. The options are applied in the grammar file as part of the lexer rules, for example:

RIGHT_PAREN
options {
    paraphrase = "a closing parenthesis ')'";
}
    : ')';

Applying the paraphrases improves the error message considerably:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: line 1:32: expecting a closing parenthesis ')', found 'null'

Unfortunately, we still have the pesky “found ‘null’” to deal with. In this case, I haven’t yet found a simple way to customise the error message. Instead, it is handled as a special case. I found that in this case the exception being thrown was a MismatchedTokenException, with the text of the found token set to null. This allowed the specific case to be handled with a custom message:

catch(MismatchedTokenException mte)
{
    if(mte.token.getText() == null)
    {
        System.err.println("Caught error: line " +
            mte.getLine() + ":" +
            mte.getColumn() +
            ": end of input when expecting " +
            NotifyConditionParser._tokenNames[mte.expecting]);
    }
    else
    {
        System.err.println("Caught error: " + mte.toString());
    }
}

This is far from an ideal solution, and I am still looking for a better alternative. However, the user experience is king, and this hack improves it:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: line 1:32: end of input when expecting a closing parenthesis ')'

DRY Violation

Those paying close attention would have noticed a wrinkle in the final ANTLR grammar: a violation of the DRY (Don’t Repeat Yourself) principle. Specifically, both the parser and tree parser share a common rule, which is repeated verbatim in the grammar file:

condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

Despite scouring the ANTLR documentation, I have yet to find a way around this. I even took a look at some of the example grammars on the ANTLR website, and noticed that they suffer from a similar problem. If anyone knows a way to reuse a rule, let me know! I would love to remove the duplication.

Wrap Up

Well, that just about does it. I hope this series of posts has piqued your interest in ANTLR and parsing, and maybe even helped you to solve some of your own problems. Now go forth and parse!

ANTLR By Example: Part 4: Tree Parsing

Thursday, July 6th, 2006

Introduction

By the end of Part 3: Parsing, we had a working parser that could understand our expression language. The parser turns an input string into an Abstract Syntax Tree (AST) representing the expression. The final piece of the puzzle is evaluating the expression for a given build result (remember, these expressions are used to determine when to send build notifications). To do this we will make use of an ANTLR tree parser.

NotifyConditions

Before we get into the details of tree parsers, let’s first define the end goal. Currently we have a way to generate an AST for an expression. What we need, however, is a way to evaluate the expression for a given build result. Approaching from this angle, we can use a self-evaluating data structure that takes a build result and returns true if a notification should be sent for this result. The NotifyCondition interface serves as a basis for this data structure:

public interface NotifyCondition
{
    public boolean satisfied(BuildResult result, User user);
}

This interface defines just one method, satisfied, which returns true iff a notification should be sent for the given build result. The additional user argument is required to evaluate the “changed.by.me” condition. Implementations of this interface mirror the primitive conditions in our expression language:

  • ChangedByMeNotifyCondition
  • ChangedNotifyCondition
  • ErrorNotifyCondition
  • FailureNotifyCondition
  • FalseNotifyCondition
  • StateChangeNotifyCondition
  • SuccessNotifyCondition
  • TrueNotifyCondition

In addition, compound expressions (i.e. those involving operators) are represented using two further implementations:

  • CompoundNotifyCondition: combines a list of child NotifyConditions either conjunctively (and) or disjunctively (or).
  • NotNotifyCondition: inverts a single child NotifyCondition

Using these implementations of the NotifyCondition interface, we can create the self-evaluating data structure we need to test build results. The missing link is getting from the AST to the right combination of NotifyConditions. To help create NotifyConditions based on expressions, a simple factory is used:

class NotifyConditionFactory
{
    …

    public NotifyCondition createCondition(String condition)
    {
        …
    }
}

The factory takes a primitive condition string (e.g. “state.change”) and returns a new instance of the corresponding condition (e.g. StateChangeNotifyCondition).
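One plausible implementation of such a factory (a sketch only: the real factory in Pulse may differ, and the satisfied signature is simplified here to drop the build result and user arguments) keeps a map from condition strings to condition classes:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the real interface and implementations.
interface NotifyCondition
{
    boolean satisfied();
}

class TrueNotifyCondition implements NotifyCondition
{
    public boolean satisfied() { return true; }
}

class FalseNotifyCondition implements NotifyCondition
{
    public boolean satisfied() { return false; }
}

class NotifyConditionFactory
{
    private final Map<String, Class<? extends NotifyCondition>> types =
        new HashMap<String, Class<? extends NotifyCondition>>();

    public NotifyConditionFactory()
    {
        // The full table would cover every primitive condition.
        types.put("true", TrueNotifyCondition.class);
        types.put("false", FalseNotifyCondition.class);
    }

    public NotifyCondition createCondition(String condition)
    {
        Class<? extends NotifyCondition> clazz = types.get(condition);
        if (clazz == null)
        {
            throw new IllegalArgumentException("Unknown condition: " + condition);
        }

        try
        {
            // Each call returns a fresh instance of the mapped class.
            return clazz.newInstance();
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }
}
```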

Tree Parsers

A tree parser is a special kind of ANTLR parser that operates on an AST, rather than a token stream. This seems strange at first, as trees are a two-dimensional structure, not a one-dimensional stream. However, the principles are much the same: tree parsers just have a special rule syntax for matching input in two dimensions. In our case, we use the tree parser as a simple, declarative way to transform our AST into our own data structure representing the expression.

Defining the Tree Parser

Tree parsers are defined in grammar files, in our case in the same file: NotifyCondition.g. From the ANTLR manual:

{ optional class code preamble }
"class" YourTreeParserClass "extends TreeParser";
options
tokens
{ optional action for instance vars/methods }
tree parser rules...

The tree parser rule definitions should look very familiar:

rulename
    ":" alternative_1
    "|" alternative_2
    ...
    "|" alternative_n
    ";"

The difference is in how the alternatives are specified. ANTLR uses a special syntax that allows matching against trees:

"#(" root-token child1 child2 ... childn ")"

This will match against a tree (including subtrees) that has the given token as its root node, and has children matching the “childx” subrules. In a simple case, the child subrules are just token references, for example:

#( "and" "changed" "success" )

would match the tree:

and
    changed
    success

More generally, however, we can define the children recursively in terms of the tree parser rules themselves. This allows us to match against the ASTs generated by our parser:

class NotifyConditionTreeParser extends TreeParser;
cond
    : #("and" cond cond)
    | #("or" cond cond)
    | #("not" cond)
    | condition
    ;
condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

Notice how the “cond” rule is defined recursively: children of an “and” expression can themselves be arbitrary conditions. In fact, this is more permissive than our parser, but that keeps the tree rules simpler.

Evaluating the Tree

The tree parser we defined does not yet do anything useful. To transform the AST into the corresponding NotifyCondition structure we need to embed actions into the tree rules. These actions insert code into the tree parser methods that convert the matched subtrees into corresponding NotifyConditions. Let’s dive right into the full tree parser definition:

class NotifyConditionTreeParser extends TreeParser;
{
    private NotifyConditionFactory factory;
    public void setNotifyConditionFactory(NotifyConditionFactory factory)
    {
        this.factory = factory;
    }
}
cond returns [NotifyCondition r]
{
    NotifyCondition a, b;
    r = null;
}
    : #("and" a=cond b=cond) {
        r = new CompoundNotifyCondition(a, b, false);
    }
    | #("or" a=cond b=cond) {
        r = new CompoundNotifyCondition(a, b, true);
    }
    | #("not" a=cond) {
        r = new NotNotifyCondition(a);
    }
    | c:condition {
        r = factory.createCondition(c.getText());
    }
    ;
condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

The first thing you’ll notice is a new function setNotifyConditionFactory that allows us to inject a factory into the parser. Next, you can see that the “cond” rule has been updated to return a NotifyCondition, from a variable named “r”. The syntax looks odd at first, but it is just a way to add Java code to the method that processes the rule. Likewise, we declare NotifyCondition variables “a” and “b”. Using the “a=cond” syntax, we can assign the results of the recursive invocations of the “cond” rule to these variables. Finally, we define the NotifyCondition to return for each alternative by embedding Java code to assign the value of “r”. I have skimmed over the details somewhat here, but you can always take a look at the NotifyConditionTreeParser code generated by ANTLR to help make the connection.

Generating the Tree Parser

OK, time to fire up ANTLR again:

$ java -classpath antlr-2.7.6.jar antlr.Tool NotifyCondition.g

We get a full 10 files:

NotifyConditionLexer.java
NotifyConditionLexer.smap
NotifyConditionParser.java
NotifyConditionParser.smap
NotifyConditionParserTokenTypes.java
NotifyConditionParserTokenTypes.txt
NotifyConditionTreeParser.java
NotifyConditionTreeParser.smap
NotifyConditionTreeParserTokenTypes.java
NotifyConditionTreeParserTokenTypes.txt

The tree parser itself is, of course, defined in NotifyConditionTreeParser.java. We can put the lexer, parser and tree parser together to transform an input string into a self-evaluating NotifyCondition:

public NotifyCondition getNotifyCondition(String expression)
{
    NotifyCondition notifyCondition = null;
    StringReader reader = null;

    try
    {
        reader = new StringReader(expression);
       
        NotifyConditionLexer lexer;
        NotifyConditionParser parser;
        NotifyConditionTreeParser tree;

        lexer = new NotifyConditionLexer(reader);
        parser = new NotifyConditionParser(lexer);
        tree = new NotifyConditionTreeParser();
        tree.setNotifyConditionFactory(notifyFactory);

        parser.orexpression();
        AST t = parser.getAST();       
        notifyCondition = tree.cond(t);
    }
    catch (Exception e)
    {
        System.err.println(e.getMessage());
    }
    finally
    {
        if(reader != null)
        {
            reader.close();
        }
    }

    return notifyCondition;
}

That’s it! We have reached the end goal. Now when a build completes, we can pass the build result to the NotifyCondition object to determine if a notification should be sent.

Wrap Up

Well, it took a few steps, but we have finally transformed our input into our desired output. Hopefully this has shown some of the power of ANTLR, and will encourage you to take a closer look. In the final part, we will look into minor improvements to add some polish to the implementation.

Pulse Continuous Integration Server 1.0.6

Wednesday, June 28th, 2006

Pulse version 1.0.6 has been released. This release fixes some minor issues in 1.0.5, see the release notes for details.