ANTLR By Example: Part 4: Tree Parsing
Thursday, July 6th, 2006Introduction
By the end of Part 3: Parsing, we had a working parser that could understand our expression language. The parser turns an input string into an Abstract Syntax Tree (AST) representing the expression. The final piece of the puzzle is evaluating the expression for a given build result (remember, these expressions are used to determine when to send build notifications). To do this we will make use of an ANTLR tree parser.
NotifyCondition’s
Before we get into the details of tree parsers, let’s first define the end goal. Currently we have a way to generate an AST for an expression. What we need, however, is a way to evaluate the expression for a given build result. Approaching from this angle, we can use a self-evaluating data structure that takes a build result and returns true if a notification should be sent for this result. The NotifyCondition interface serves as a basis for this data structure:
{
public boolean satisfied(BuildResult result, User user);
}
This interface defines just one method, satisfied, which returns true iff a notification should be sent for the given build result. The additional user argument is required to evaluate the “changed.by.me” condition. Implementations of this interface mirror the primitive conditions in our expression language:
- ChangedByMeNotifyCondition
- ChangedNotifyCondition
- ErrorNotifyCondition
- FailureNotifyCondition
- FalseNotifyCondition
- StateChangeNotifyCondition
- SuccessNotifyCondition
- TrueNotifyCondition
In addition, compound expressions (i.e. those involving operators) are represented using two further implementations:
- CompoundNotifyCondition: combines a list of child NotifyConditions either conjunctively (and) or disjunctively (or).
- NotNotifyCondition: inverts a single child NotifyCondition
Using these implementations of the NotifyCondition interface, we can create the self-evaluating data structure we need to test build results. The missing link is getting from the AST to the right combination of NotifyConditions. To help create NotifyConditions based on expressions, a simple factory is used:
The factory takes a primitive condition string (e.g. “state.change”) and returns a new instance of the corresponding condition (e.g. StateChangeNotifyCondition).
Tree Parsers
A tree parser is a special kind of ANTLR parser that operates on an AST, rather than a token stream. This seems strange at first, as trees are a two-dimensional structure, not a single-dimansional stream. However, the principles are much the same: tree parsers just have a special rule syntax for matching input in two dimensions. In our case, we use the tree parser as a simple, declarative way to transform our AST into our own data structure representing the expression.
Defining the Tree Parser
Tree parsers are defined in grammar files, in our case in the same file: NotifyCondition.g. From the ANTLR manual:
{ optional class code preamble }
"class" YourTreeParserClass "extends TreeParser";
options
tokens
{ optional action for instance vars/methods }
tree parser rules...
The tree parser rule definitions should look very familiar:
rulename
":" alternative_1
"|" alternative_2
...
"|" alternative_n
";"
The difference is in how the alternatives are specified. ANTLR uses a special syntax that allows matching against trees:
"#(" root-token child1 child2 ... childn ")"
This will match against a tree (including subtrees) that has the given token as its root node, and has children matching the “childx” subrules. In a simple case, the child subrules are just token references, for example:
#( "and" "changed" "success" )
would match the tree:
changed
and
success
More generally, however, we can define the children recursively in terms of the tree parser rules themselves. This allows us to match against the ASTs generated by our parser:
class NotifyConditionTreeParser extends TreeParser;
cond
: #("and" cond cond)
| #("or" cond cond)
| #("not" cond)
| condition
;
condition
: "true"
| "false"
| "success"
| "failure"
| "error"
| "changed"
| "changed.by.me"
| "state.change"
;
Notice how the “cond” rule is defined recursively: children of an “and” expression can themselves be arbitrary conditions. In fact, this is more permissive than our parser, but that keeps the tree rules simpler.
Evaluating the Tree
The tree parser we defined does not yet do anything useful. To transform the AST into the corresponding NotifyCondition structure we need to embed actions into the tree rules. These actions can be used to insert code into the tree parser methods that conver the matched subtrees into corresponding NotifyCondition’s. Let’s dive right into the full tree parser definition:
class NotifyConditionTreeParser extends TreeParser;
{
private NotifyConditionFactory factory;
public void setNotifyConditionFactory(NotifyConditionFactory factory)
{
this.factory = factory;
}
}
cond returns [NotifyCondition r]
{
NotifyCondition a, b;
r = null;
}
: #("and" a=cond b=cond) {
r = new CompoundNotifyCondition(a, b, false);
}
| #("or" a=cond b=cond) {
r = new CompoundNotifyCondition(a, b, true);
}
| #("not" a=cond) {
r = new NotNotifyCondition(a);
}
| c:condition {
r = factory.createCondition(c.getText());
}
;
condition
: "true"
| "false"
| "success"
| "failure"
| "error"
| "changed"
| "changed.by.me"
| "state.change"
;
The first thing you’ll notice is a new function setNotifyConditionFactory that allows us to inject a factory into the parser. Next, you can see that the “cond” rule has been updated to return a NotifyCondition, from a variable named “r”. The syntax looks odd at first, but it is just a way to add Java code to the method that processes the rule. Likewise, we declare NotifyCondition variables “a” and “b”. Using the “cond=a” syntax, we can assign the result of the recursive invocations of the “cond” rule to these variables. Finally, we define the NotifyCondition to return for each alternative by embedding Java code to assign the value of “r”. I have skimmed over the details somewhat here, but you can always take a look at the NotifyConditionTreeParser code generated by ANTLR to help make the connection.
Generating the Tree Parser
OK, time to fire up ANTLR again:
We get a full 10 files:
NotifyConditionLexer.smap
NotifyConditionParser.java
NotifyConditionParser.smap
NotifyConditionParserTokenTypes.java
NotifyConditionParserTokenTypes.txt
NotifyConditionTreeParser.java
NotifyConditionTreeParser.smap
NotifyConditionTreeParserTokenTypes.java
NotifyConditionTreeParserTokenTypes.txt
The tree parser itself is, of course, defined in NotifyConditionTreeParser.java. We can put the lexer, parser and tree parser together to transform an input string into a self-evaluating NotifyCondition:
{
NotifyCondition notifyCondition = null;
StringReader reader = null;
try
{
reader = new StringReader(expression);
NotifyConditionLexer lexer;
NotifyConditionParser parser;
NotifyConditionTreeParser tree;
lexer = new NotifyConditionLexer(reader);
parser = new NotifyConditionParser(lexer);
tree = new NotifyConditionTreeParser();
tree.setNotifyConditionFactory(notifyFactory);
parser.orexpression();
AST t = parser.getAST();
notifyCondition = tree.cond(t);
}
catch (Exception e)
{
System.stderr.println(e.getMessage());
}
finally
{
if(reader != null)
{
reader.close();
}
}
return notifyCondition;
}
That’s it! We have reached the end goal. Now when a build completes, we can pass the build result to the NotifyCondition object to determine if a notification should be sent.
Wrap Up
Well, it took a few steps, but we have finally transformed our input into our desired output. Hopefully this has shown some of the power of ANTLR, and will encourage you to take a closer look. In the final part, we will look into minor improvements to add some polish to the implementation.