a little madness

A man needs a little madness, or else he never dares cut the rope and be free -Nikos Kazantzakis

Zutubi

Archive for July, 2006

Pulse Continuous Integration Server 1.1 Beta!

Well, it’s finally here! The first public beta of pulse 1.1. This is a huge milestone for us, just take a look at what’s new to see what I mean. The biggest single feature is powerful distributed building, but as you’ll see the list is long!

Thanks so much to our private pre-release testers, your feedback has been invaluable. Now for the rest of you: bring it on!

Execute the Future

No matter how much multithreaded code you write, it will never be simple. The added complexity of races between threads makes it much harder to predict the possible combinations of events in your program. Even worse, testing the combinations is difficult, requiring thread-aware test cases with explicit synchronisation points. Thus it is a pleasure to see a package like java.util.concurrent offering more than just synchronisation primitives, but also higher level abstractions1.

One of the simplest but most useful abstractions in the package is the Future interface. As stated in the JavaDoc, this interface:

abstracts the result of an asynchronous computation

By encapsulating an operation in a Future, you can execute it asychronously (typically using an Executor). The Future may be queried, cancelled, and most importantly can return the result of the computation. When asking for the result (via one of the get methods), you can also specify a timeout indicating how long you are willing to wait for the result to become available.

To take a concrete example, recently I had a problem with some networking code. The purpose of the code was to ping a bunch of machines to see which of them were online and responding. Unfortunately, when a machine is unavailable it can take some time for the ping to fail, making the whole process take several minutes. In my case, such a long ping time indicates that the machine is essentially unusable anyhow, so I would prefer to fail the ping after a reasonable delay, say 10 seconds. I achieved this timeout without a single line of sychronisation code by using the FutureTask implementation of the Future interface:


public class DefaultAgentManager
{
private ExecutorService pingService =
Executors.newCachedThreadPool();

private class Pinger implements
Callable
{
private SlaveAgent agent;

public Pinger(SlaveAgent agent)
{
this.agent = agent;
}

public SlaveStatus call() throws Exception
{
SlaveStatus status;

try
{
agent.getSlaveService().ping();
}
catch (Exception e)
{
status = new SlaveStatus(Status.OFFLINE,
e.getMessage());
}

return status;
}
}

void pingSlave(SlaveAgent agent)
{
FutureTask future =
new FutureTask(new Pinger(agent));

pingService.execute(future);

SlaveStatus status;

try
{
status = future.get(10, TimeUnit.SECONDS);
}
catch (TimeoutException e)
{
status = new SlaveStatus(Status.OFFLINE,
"Agent ping timed out");
}
catch(Exception e)
{
status = new SlaveStatus(Status.OFFLINE,
e.getMessage());
}

agent.updateStatus(status);
}
}

Although I have simplified the code for clarity2, the most error-prone part of the implementation all lives in java.util.concurrent, which is tried and tested. There was no need for me to carefully synchronise and test for races, I was able to just get on with the job.

This is just one small part of the functionality offered by Futures. Combining them with CompletionServices, for example, allows an even higher level of abstraction for dealing with multiple tasks. If you write multithreaded code, and haven’t looked at the java.util.concurrent package, then do yourself a favour and dive in. The JavaDoc comes complete with usage suggestions and example code snippets, making the package very approachable. Package designer Doug Lea has also authored multiple books on the topic.


1 moving out of the imperative world we get into even more interesting abstractions to help exploit concurrency, but that’s a topic all of its own.
2 the most significant omission is how to handle the tasks that did not complete before the timeout; in some applications they may be a resource burden, in others it may be safe to let them complete in their own time

ANTLR By Example: Part 5: Extra Credit

Introduction

Over the past four parts, I have illustrated how to parse and evaluate boolean expressions using ANTLR. The grammar presented is in those parts is based on real code in pulse. Although it works as presented, there are a couple of items to polish up, one of which I have solved, and the other of which I have not yet been able to solve.

Error Reporting

As pulse allows users to enter their own boolean expressions (to configure when they receive build notifications), decent error reporting is paramount. The first step is to turn off ANTLR’s default error handling, so that the errors can be handled by pulse. This is done by setting the defaultErrorHandler option to false:

class NotifyConditionParser extends Parser;

options {
        buildAST=true;
        defaultErrorHandler=false;
}

With that done, the ANTLR-generated code will throw exceptions on errors. Let’s take a look at the sorts of errors that are generated by the grammar as it stands.

Case 1: Unrecognised word:

$ java NotifyConditionParserTest "changed or tuer"
Caught error: unexpected token: tuer

Case 2: Unrecognised character:

$ java NotifyConditionParserTest "6 and false"
Caught error: unexpected char: '6'

Case 3: Illegal expression structure

$ java NotifyConditionParserTest "state.change or or success"
Caught error: unexpected token: or

Case 4: Unbalanced parentheses

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: expecting RIGHT_PAREN, found 'null'

Most of these messages are not too bad, at least they are on the right track. Case 4 is certainly the worst of the lot, although the information is accurate it is not exactly user friendly. We’ll get back to that later. One big thing missing in all cases is location information. I figured that ANTLR must have a way to retrieve the information, and a little digging uncovered it. All of the above messages are generated using the getMessage method of the exceptions thrown by ANTLR. To get the line and column number information (which is indeed stored in the exception), you can use the toString method instead:


try
{
...
}
catch (Exception e)
{
System.err.println("Caught error: " + e.toString());
}

Trying case 1 again:

$ java NotifyConditionParserTest "changed or tuer"
Caught error: line 1:12: unexpected token: tuer

Much better! Now the user knows where the error occured. That leaves us with case 4, which is still a little on the cryptic side:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: expecting RIGHT_PAREN, found 'null'

It would be nice if we could not expose the raw token names (e.g. RIGHT_PAREN) and also explicitly say we hit the end of the input (instead of “found ‘null'”). To fix the former problem, we can add paraphrase options to our lexer tokens. This allows us to specify a phrase describing the token which will be used in error messages instead of the token name. The options are applied in the grammar file as part of the lexer rules, for example:

RIGHT_PAREN
options {
    paraphrase = "a closing parenthesis ')'";
}
    : ')';

Applying the paraphrases improves the error message considerably:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: line 1:32: expecting a closing parenthesis ')', found 'null'

Unfortunately, we still have the pesky “found ‘null'” to deal with. In this case, I haven’t yet found a simple way to customise the error message. Instead, it is handled as a special case. I found that in this case the exception being thrown was a MismatchedTokenException, with the text of the found token set to null. This allowed the specific case to be handled with a custom message:


catch(MismatchedTokenException mte)
{
if(mte.token.getText() == null)
{
System.err.println("Caught error: line " +
mte.getLine() + ":" +
mte.getColumn() +
": end of input when expecting " +
NotifyConditionParser._tokenNames[mte.expecting]);
}
else
{
System.err.println("Caught error: " + mte.toString());
}
}

This is far from an ideal solution, and I am still looking for a better alternative. However, the user experience is king, and this hack improves it:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: line 1:32: end of input when expecting a closing parenthesis ')'

DRY Violation

Those paying close attention would have noticed a wrinkle in the final ANTLR grammar: a violation of the DRY (Don’t Repeat Yourself) principle. Specifically, both the parser and tree parser share a common rule, which is repeated verbatim in the grammar file:

condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

Despite scouring the ANTLR documentation, I am yet to find a way around this. I even took a look at some of the example grammars on the ANTLR website, and noticed that they suffer from a similar problem. If anyone knows a way to reuse a rule, let me know! I would love to remove the duplication.

Wrap Up

Well, that just about does it. I hope this series of posts has piqued your interest in ANTLR and parsing, and maybe even helped you to solve some of your own problems. Now go forth and parse!

ANTLR By Example: Part 4: Tree Parsing

Introduction

By the end of Part 3: Parsing, we had a working parser that could understand our expression language. The parser turns an input string into an Abstract Syntax Tree (AST) representing the expression. The final piece of the puzzle is evaluating the expression for a given build result (remember, these expressions are used to determine when to send build notifications). To do this we will make use of an ANTLR tree parser.

NotifyCondition’s

Before we get into the details of tree parsers, let’s first define the end goal. Currently we have a way to generate an AST for an expression. What we need, however, is a way to evaluate the expression for a given build result. Approaching from this angle, we can use a self-evaluating data structure that takes a build result and returns true if a notification should be sent for this result. The NotifyCondition interface serves as a basis for this data structure:


public interface NotifyCondition
{
public boolean satisfied(BuildResult result, User user);
}

This interface defines just one method, satisfied, which returns true iff a notification should be sent for the given build result. The additional user argument is required to evaluate the “changed.by.me” condition. Implementations of this interface mirror the primitive conditions in our expression language:

  • ChangedByMeNotifyCondition
  • ChangedNotifyCondition
  • ErrorNotifyCondition
  • FailureNotifyCondition
  • FalseNotifyCondition
  • StateChangeNotifyCondition
  • SuccessNotifyCondition
  • TrueNotifyCondition

In addition, compound expressions (i.e. those involving operators) are represented using two further implementations:

  • CompoundNotifyCondition: combines a list of child NotifyConditions either conjunctively (and) or disjunctively (or).
  • NotNotifyCondition: inverts a single child NotifyCondition

Using these implementations of the NotifyCondition interface, we can create the self-evaluating data structure we need to test build results. The missing link is getting from the AST to the right combination of NotifyConditions. To help create NotifyConditions based on expressions, a simple factory is used:


class NotifyConditionFactory
{
...

public NotifyCondition createCondition(String condition)
{
...
}
}

The factory takes a primitive condition string (e.g. “state.change”) and returns a new instance of the corresponding condition (e.g. StateChangeNotifyCondition).

Tree Parsers

A tree parser is a special kind of ANTLR parser that operates on an AST, rather than a token stream. This seems strange at first, as trees are a two-dimensional structure, not a single-dimansional stream. However, the principles are much the same: tree parsers just have a special rule syntax for matching input in two dimensions. In our case, we use the tree parser as a simple, declarative way to transform our AST into our own data structure representing the expression.

Defining the Tree Parser

Tree parsers are defined in grammar files, in our case in the same file: NotifyCondition.g. From the ANTLR manual:

{ optional class code preamble }

"class" YourTreeParserClass "extends TreeParser";

options

tokens

{ optional action for instance vars/methods }

tree parser rules...

The tree parser rule definitions should look very familiar:

rulename
    ":" alternative_1
    "|" alternative_2
    ...
    "|" alternative_n
    ";"

The difference is in how the alternatives are specified. ANTLR uses a special syntax that allows matching against trees:

"#(" root-token child1 child2 ... childn ")"

This will match against a tree (including subtrees) that has the given token as its root node, and has children matching the “childx” subrules. In a simple case, the child subrules are just token references, for example:

#( "and" "changed" "success" )

would match the tree:

    changed
and
    success

More generally, however, we can define the children recursively in terms of the tree parser rules themselves. This allows us to match against the ASTs generated by our parser:

class NotifyConditionTreeParser extends TreeParser;

cond
    : #("and" cond cond)
    | #("or" cond cond)
    | #("not" cond)
    | condition
    ;

condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

Notice how the “cond” rule is defined recursively: children of an “and” expression can themselves be arbitrary conditions. In fact, this is more permissive than our parser, but that keeps the tree rules simpler.

Evaluating the Tree

The tree parser we defined does not yet do anything useful. To transform the AST into the corresponding NotifyCondition structure we need to embed actions into the tree rules. These actions can be used to insert code into the tree parser methods that conver the matched subtrees into corresponding NotifyCondition’s. Let’s dive right into the full tree parser definition:

class NotifyConditionTreeParser extends TreeParser;

{
    private NotifyConditionFactory factory;

    public void setNotifyConditionFactory(NotifyConditionFactory factory)
    {
        this.factory = factory;
    }
}

cond returns [NotifyCondition r]
{
    NotifyCondition a, b;
    r = null;
}
    : #("and" a=cond b=cond) {
        r = new CompoundNotifyCondition(a, b, false);
    }
    | #("or" a=cond b=cond) {
        r = new CompoundNotifyCondition(a, b, true);
    }
    | #("not" a=cond) {
        r = new NotNotifyCondition(a);
    }
    | c:condition {
        r = factory.createCondition(c.getText());
    }
    ;

condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

The first thing you’ll notice is a new function setNotifyConditionFactory that allows us to inject a factory into the parser. Next, you can see that the “cond” rule has been updated to return a NotifyCondition, from a variable named “r”. The syntax looks odd at first, but it is just a way to add Java code to the method that processes the rule. Likewise, we declare NotifyCondition variables “a” and “b”. Using the “cond=a” syntax, we can assign the result of the recursive invocations of the “cond” rule to these variables. Finally, we define the NotifyCondition to return for each alternative by embedding Java code to assign the value of “r”. I have skimmed over the details somewhat here, but you can always take a look at the NotifyConditionTreeParser code generated by ANTLR to help make the connection.

Generating the Tree Parser

OK, time to fire up ANTLR again:


$ java -classpath antlr-2.7.6.jar antlr.Tool NotifyCondition.g

We get a full 10 files:


NotifyConditionLexer.java
NotifyConditionLexer.smap
NotifyConditionParser.java
NotifyConditionParser.smap
NotifyConditionParserTokenTypes.java
NotifyConditionParserTokenTypes.txt
NotifyConditionTreeParser.java
NotifyConditionTreeParser.smap
NotifyConditionTreeParserTokenTypes.java
NotifyConditionTreeParserTokenTypes.txt

The tree parser itself is, of course, defined in NotifyConditionTreeParser.java. We can put the lexer, parser and tree parser together to transform an input string into a self-evaluating NotifyCondition:


public NotifyCondition getNotifyCondition(String expression)
{
NotifyCondition notifyCondition = null;
StringReader reader = null;

try
{
reader = new StringReader(expression);

NotifyConditionLexer lexer;
NotifyConditionParser parser;
NotifyConditionTreeParser tree;

lexer = new NotifyConditionLexer(reader);
parser = new NotifyConditionParser(lexer);
tree = new NotifyConditionTreeParser();
tree.setNotifyConditionFactory(notifyFactory);

parser.orexpression();
AST t = parser.getAST();
notifyCondition = tree.cond(t);
}
catch (Exception e)
{
System.stderr.println(e.getMessage());
}
finally
{
if(reader != null)
{
reader.close();
}
}

return notifyCondition;
}

That’s it! We have reached the end goal. Now when a build completes, we can pass the build result to the NotifyCondition object to determine if a notification should be sent.

Wrap Up

Well, it took a few steps, but we have finally transformed our input into our desired output. Hopefully this has shown some of the power of ANTLR, and will encourage you to take a closer look. In the final part, we will look into minor improvements to add some polish to the implementation.