a little madness

A man needs a little madness, or else he never dares cut the rope and be free -Nikos Kazantzakis

Zutubi

ANTLR By Example: Part 4: Tree Parsing

Introduction

By the end of Part 3: Parsing, we had a working parser that could understand our expression language. The parser turns an input string into an Abstract Syntax Tree (AST) representing the expression. The final piece of the puzzle is evaluating the expression for a given build result (remember, these expressions are used to determine when to send build notifications). To do this we will make use of an ANTLR tree parser.

NotifyCondition’s

Before we get into the details of tree parsers, let’s first define the end goal. Currently we have a way to generate an AST for an expression. What we need, however, is a way to evaluate the expression for a given build result. Approaching from this angle, we can use a self-evaluating data structure that takes a build result and returns true if a notification should be sent for this result. The NotifyCondition interface serves as a basis for this data structure:

public interface NotifyCondition
{
    public boolean satisfied(BuildResult result, User user);
}

This interface defines just one method, satisfied, which returns true iff a notification should be sent for the given build result. The additional user argument is required to evaluate the “changed.by.me” condition. Implementations of this interface mirror the primitive conditions in our expression language:

  • ChangedByMeNotifyCondition
  • ChangedNotifyCondition
  • ErrorNotifyCondition
  • FailureNotifyCondition
  • FalseNotifyCondition
  • StateChangeNotifyCondition
  • SuccessNotifyCondition
  • TrueNotifyCondition

In addition, compound expressions (i.e. those involving operators) are represented using two further implementations:

  • CompoundNotifyCondition: combines a list of child NotifyConditions either conjunctively (and) or disjunctively (or).
  • NotNotifyCondition: inverts a single child NotifyCondition

Using these implementations of the NotifyCondition interface, we can create the self-evaluating data structure we need to test build results. The missing link is getting from the AST to the right combination of NotifyConditions. To help create NotifyConditions based on expressions, a simple factory is used:

class NotifyConditionFactory
{
    …

    public NotifyCondition createCondition(String condition)
    {
        …
    }
}

The factory takes a primitive condition string (e.g. “state.change”) and returns a new instance of the corresponding condition (e.g. StateChangeNotifyCondition).

Tree Parsers

A tree parser is a special kind of ANTLR parser that operates on an AST, rather than a token stream. This seems strange at first, as trees are a two-dimensional structure, not a single-dimansional stream. However, the principles are much the same: tree parsers just have a special rule syntax for matching input in two dimensions. In our case, we use the tree parser as a simple, declarative way to transform our AST into our own data structure representing the expression.

Defining the Tree Parser

Tree parsers are defined in grammar files, in our case in the same file: NotifyCondition.g. From the ANTLR manual:

{ optional class code preamble }
"class" YourTreeParserClass "extends TreeParser";
options
tokens
{ optional action for instance vars/methods }
tree parser rules...

The tree parser rule definitions should look very familiar:

rulename
    ":" alternative_1
    "|" alternative_2
    ...
    "|" alternative_n
    ";"

The difference is in how the alternatives are specified. ANTLR uses a special syntax that allows matching against trees:

"#(" root-token child1 child2 ... childn ")"

This will match against a tree (including subtrees) that has the given token as its root node, and has children matching the “childx” subrules. In a simple case, the child subrules are just token references, for example:

#( "and" "changed" "success" )

would match the tree:

    changed
and
    success

More generally, however, we can define the children recursively in terms of the tree parser rules themselves. This allows us to match against the ASTs generated by our parser:

class NotifyConditionTreeParser extends TreeParser;
cond
    : #("and" cond cond)
    | #("or" cond cond)
    | #("not" cond)
    | condition
    ;
condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

Notice how the “cond” rule is defined recursively: children of an “and” expression can themselves be arbitrary conditions. In fact, this is more permissive than our parser, but that keeps the tree rules simpler.

Evaluating the Tree

The tree parser we defined does not yet do anything useful. To transform the AST into the corresponding NotifyCondition structure we need to embed actions into the tree rules. These actions can be used to insert code into the tree parser methods that conver the matched subtrees into corresponding NotifyCondition’s. Let’s dive right into the full tree parser definition:

class NotifyConditionTreeParser extends TreeParser;
{
    private NotifyConditionFactory factory;
    public void setNotifyConditionFactory(NotifyConditionFactory factory)
    {
        this.factory = factory;
    }
}
cond returns [NotifyCondition r]
{
    NotifyCondition a, b;
    r = null;
}
    : #("and" a=cond b=cond) {
        r = new CompoundNotifyCondition(a, b, false);
    }
    | #("or" a=cond b=cond) {
        r = new CompoundNotifyCondition(a, b, true);
    }
    | #("not" a=cond) {
        r = new NotNotifyCondition(a);
    }
    | c:condition {
        r = factory.createCondition(c.getText());
    }
    ;
condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

The first thing you’ll notice is a new function setNotifyConditionFactory that allows us to inject a factory into the parser. Next, you can see that the “cond” rule has been updated to return a NotifyCondition, from a variable named “r”. The syntax looks odd at first, but it is just a way to add Java code to the method that processes the rule. Likewise, we declare NotifyCondition variables “a” and “b”. Using the “cond=a” syntax, we can assign the result of the recursive invocations of the “cond” rule to these variables. Finally, we define the NotifyCondition to return for each alternative by embedding Java code to assign the value of “r”. I have skimmed over the details somewhat here, but you can always take a look at the NotifyConditionTreeParser code generated by ANTLR to help make the connection.

Generating the Tree Parser

OK, time to fire up ANTLR again:

$ java -classpath antlr-2.7.6.jar antlr.Tool NotifyCondition.g

We get a full 10 files:

NotifyConditionLexer.java
NotifyConditionLexer.smap
NotifyConditionParser.java
NotifyConditionParser.smap
NotifyConditionParserTokenTypes.java
NotifyConditionParserTokenTypes.txt
NotifyConditionTreeParser.java
NotifyConditionTreeParser.smap
NotifyConditionTreeParserTokenTypes.java
NotifyConditionTreeParserTokenTypes.txt

The tree parser itself is, of course, defined in NotifyConditionTreeParser.java. We can put the lexer, parser and tree parser together to transform an input string into a self-evaluating NotifyCondition:

public NotifyCondition getNotifyCondition(String expression)
{
    NotifyCondition notifyCondition = null;
    StringReader reader = null;

    try
    {
        reader = new StringReader(expression);
       
        NotifyConditionLexer lexer;
        NotifyConditionParser parser;
        NotifyConditionTreeParser tree;

        lexer = new NotifyConditionLexer(reader);
        parser = new NotifyConditionParser(lexer);
        tree = new NotifyConditionTreeParser();
        tree.setNotifyConditionFactory(notifyFactory);

        parser.orexpression();
        AST t = parser.getAST();        
        notifyCondition = tree.cond(t);
    }
    catch (Exception e)
    {
        System.stderr.println(e.getMessage());
    }
    finally
    {
        if(reader != null)
        {
            reader.close();
        }
    }

    return notifyCondition;
}

That’s it! We have reached the end goal. Now when a build completes, we can pass the build result to the NotifyCondition object to determine if a notification should be sent.

Wrap Up

Well, it took a few steps, but we have finally transformed our input into our desired output. Hopefully this has shown some of the power of ANTLR, and will encourage you to take a closer look. In the final part, we will look into minor improvements to add some polish to the implementation.

Liked this post? Share it!

5 Responses to “ANTLR By Example: Part 4: Tree Parsing”

  1. August 7th, 2006 at 7:53 am

    Designker says:

    Following the instructions fails to create 10 files. Instead 8 are created and there are errors due to unresolvable type “NotifyConditionFactory”. Where in the grammar file should the details be for creating the NotifyConditionFactory class?

    NotifyConditionTreeParserTokenTypes.java
    NotifyConditionTreeParserTokenTypes.txt are the files that it fails to create. The entire G file from the ttorial is

    header {

    }

    class NotifyConditionLexer extends Lexer;

    // Words, which include our operators
    WORD: (‘a’..’z’ | ‘A’..’Z’ | ‘.’)+ ;

    // Grouping
    LEFT_PAREN: ‘(‘;
    RIGHT_PAREN: ‘)’;

    WHITESPACE
    : (‘ ‘) { $setType(Token.SKIP); }
    ;

    class NotifyConditionParser extends Parser;

    options {
    buildAST=true;
    }

    orexpression
    : andexpression (“or”^ andexpression)*
    ;

    andexpression
    : notexpression (“and”^ notexpression)*
    ;

    notexpression
    : (“not”^)? atom
    ;

    atom
    : condition
    | LEFT_PAREN! orexpression RIGHT_PAREN!
    ;

    condition
    : “true”
    | “false”
    | “success”
    | “failure”
    | “error”
    | “changed”
    | “changed.by.me”
    | “state.change”
    ;

    class NotifyConditionTreeParser extends TreeParser;
    {
    private NotifyConditionFactory factory;

    public void setNotifyConditionFactory(NotifyConditionFactory factory)
    {
    this.factory = factory;
    }
    }

    cond
    : #(“and” cond cond)
    | #(“or” cond cond)
    | #(“not” cond)
    | condition
    ;

    condition
    : “true”
    | “false”
    | “success”
    | “failure”
    | “error”
    | “changed”
    | “changed.by.me”
    | “state.change”
    ;

  2. August 8th, 2006 at 10:30 pm

    Jason says:

    Designker,

    The factory is not defined in the grammar files: it is a manually created class in our application. The factory has a method:

    NotifyCondition createCondition(String condition)

    which creates a notify condition object based on the condition string (e.g. “success” creates a SuccessNotifyCondition object). These *NotifyCondition classes are also a manually-created part of our application. You will need to adapt the example to suit you application!

  3. December 23rd, 2009 at 9:59 pm

    Andy McMullan says:

    Just wanted to say thanks for this great series – it’s by far the best tutorial for ANTLR that I’ve come across.

  4. August 24th, 2010 at 11:19 pm

    espinete says:

    Any Tree Parsing for C# ? any sample code ?? please

  5. January 24th, 2013 at 8:50 am

    manu says:

    Hello,

    Let’s suppose I have written a grammar for a calculator.

    How do I get the job done?

    Let’s suppose: 2+2*4/3

    The parser parses (hummm.) my code. OK What’s the result?

Leave a Reply