ANTLR By Example: Part 5: Extra Credit

Introduction

Over the past four parts, I have illustrated how to parse and evaluate boolean expressions using ANTLR. The grammar presented in those parts is based on real code in pulse. Although it works as presented, there are a couple of items to polish up: one I have solved, and one I have not yet been able to solve.

Error Reporting

As pulse allows users to enter their own boolean expressions (to configure when they receive build notifications), decent error reporting is paramount. The first step is to turn off ANTLR’s default error handling, so that the errors can be handled by pulse. This is done by setting the defaultErrorHandler option to false:

class NotifyConditionParser extends Parser;
options {
        buildAST=true;
        defaultErrorHandler=false;
}

With that done, the ANTLR-generated code will throw exceptions on errors. Let’s take a look at the sorts of errors that are generated by the grammar as it stands.

Case 1: Unrecognised word:

$ java NotifyConditionParserTest "changed or tuer"
Caught error: unexpected token: tuer

Case 2: Unrecognised character:

$ java NotifyConditionParserTest "6 and false"
Caught error: unexpected char: '6'

Case 3: Illegal expression structure:

$ java NotifyConditionParserTest "state.change or or success"
Caught error: unexpected token: or

Case 4: Unbalanced parentheses:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: expecting RIGHT_PAREN, found 'null'

Most of these messages are not too bad; at least they are on the right track. Case 4 is certainly the worst of the lot: although the information is accurate, it is not exactly user friendly. We’ll get back to that later. One big thing missing in all cases is location information. I figured that ANTLR must have a way to retrieve it, and a little digging uncovered it. All of the above messages are generated using the getMessage method of the exceptions thrown by ANTLR. To get the line and column number information (which is indeed stored in the exception), you can use the toString method instead:

try
{
    …
}
catch (Exception e)
{
    System.err.println("Caught error: " + e.toString());
}

Trying case 1 again:

$ java NotifyConditionParserTest "changed or tuer"
Caught error: line 1:12: unexpected token: tuer

Much better! Now the user knows where the error occurred. That leaves us with case 4, which is still a little on the cryptic side:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: expecting RIGHT_PAREN, found 'null'

It would be nice if we could not expose the raw token names (e.g. RIGHT_PAREN) and also explicitly say we hit the end of the input (instead of “found ‘null’”). To fix the former problem, we can add paraphrase options to our lexer tokens. This allows us to specify a phrase describing the token which will be used in error messages instead of the token name. The options are applied in the grammar file as part of the lexer rules, for example:

RIGHT_PAREN
options {
    paraphrase = "a closing parenthesis ')'";
}
    : ')';

Applying the paraphrases improves the error message considerably:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: line 1:32: expecting a closing parenthesis ')', found 'null'

Unfortunately, we still have the pesky “found ‘null’” to deal with. In this case, I haven’t yet found a simple way to customise the error message. Instead, it is handled as a special case. I found that in this case the exception being thrown was a MismatchedTokenException, with the text of the found token set to null. This allowed the specific case to be handled with a custom message:

catch(MismatchedTokenException mte)
{
    // In the end-of-input case the found token has null text.
    if(mte.token.getText() == null)
    {
        System.err.println("Caught error: line " +
            mte.getLine() + ":" +
            mte.getColumn() +
            ": end of input when expecting " +
            NotifyConditionParser._tokenNames[mte.expecting]);
    }
    else
    {
        System.err.println("Caught error: " + mte.toString());
    }
}

This is far from an ideal solution, and I am still looking for a better alternative. However, the user experience is king, and this hack improves it:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: line 1:32: end of input when expecting a closing parenthesis ')'
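
Since the line and column are now available, the same information could also be used to point at the offending position in the expression. The class below is only a rough sketch of that idea, not code from pulse: it uses no ANTLR types so that it compiles on its own, the names ErrorReport, describe and pointAt are made up for illustration, and the line, column and token text are passed in by hand where real code would read them from the caught exception.

```java
// Stand-in sketch: no ANTLR types are used, so this compiles on its own.
// ErrorReport, describe and pointAt are hypothetical names.
final class ErrorReport {

    /**
     * Builds a message in the style shown above. foundText is null when the
     * parser ran out of input, mirroring the null token text described above.
     */
    static String describe(int line, int column, String foundText, String expected) {
        String prefix = "line " + line + ":" + column + ": ";
        if (foundText == null) {
            return prefix + "end of input when expecting " + expected;
        }
        return prefix + "expecting " + expected + ", found '" + foundText + "'";
    }

    /** Returns the input followed by a second line with a caret under a 1-based column. */
    static String pointAt(String input, int column) {
        // Clamp so that an end-of-input column just past the text still works.
        int offset = Math.max(0, Math.min(column - 1, input.length()));
        StringBuilder caretLine = new StringBuilder();
        for (int i = 0; i < offset; i++) {
            caretLine.append(' ');
        }
        caretLine.append('^');
        return input + System.lineSeparator() + caretLine;
    }

    public static void main(String[] args) {
        String input = "failure or (changed and success";
        System.err.println("Caught error: "
            + describe(1, 32, null, "a closing parenthesis ')'"));
        System.err.println(pointAt(input, 32));
    }
}
```

Run against case 4, the caret lands just past the final word, which is where the closing parenthesis was expected. Keeping the formatting in plain static methods also keeps the special case in one place if a better way to customise the message turns up later.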

DRY Violation

Those paying close attention will have noticed a wrinkle in the final ANTLR grammar: a violation of the DRY (Don’t Repeat Yourself) principle. Specifically, both the parser and tree parser share a common rule, which is repeated verbatim in the grammar file:

condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

Despite scouring the ANTLR documentation, I have yet to find a way around this. I even took a look at some of the example grammars on the ANTLR website, and noticed that they suffer from a similar problem. If anyone knows a way to reuse a rule, let me know! I would love to remove the duplication.

Wrap Up

Well, that just about does it. I hope this series of posts has piqued your interest in ANTLR and parsing, and maybe even helped you to solve some of your own problems. Now go forth and parse!

4 Responses to “ANTLR By Example: Part 5: Extra Credit”

  1. Ibrahim Says:

    Hi!

    considering the DRY violation, have you tried

    class MyParser extends Parser;
    options {
    importVocab=V;
    }

    ANTLR will now look for VTokenTypes.txt in the current directory and preload the token manager for MyParser with the enclosed information.

    HTH,
    Ibrahim

  2. Kunnummal Says:

    Hi,
    Thanks for wonderful tutorial, I have really enjoyed this and got something out from it.

    I have a concern about the validation, looks like if we try

    java NotifyConditionParserTest "changed changed changed"

    It is picking the first one ‘changed’ and ignoring it, I would expect it raises an syntax exception.

    Thanks
    Kunnummal

  3. Demetrios Kyriakis Says:

    Thank you very much for this great tutorial.
    So far, this is the BEST ANTLR tutorial I’ve ever read, and even if ANTLR is pretty heavy, most users (I know) who read your tutorial understood it much better based on your “real and simple example” approach (it’s even simpler than the “simple calculator” example).

    Because this tutorial is so great, IMHO you could/should:
    – review it, and extend a little,
    – generate a PDF (all in one)
    – publish it on the ANTLR lists, so that everybody knows about it.
    – submit it to TheServerside and JavaWorld.com cause it’s very easy to understand and you would get a much bigger audience.

    Thank you,

    D.

  4. Jason Says:

    Demetrios,

    Thanks for the kind feedback. I would like to review the tutorial at some point. At the moment it (and unfortunately some questions from readers) has been neglected for a little while. Of course, it comes down to finding the time to review it properly, but you have provided some motivation :).
