ANTLR By Example: Part 5: Extra Credit
Tuesday, July 11th, 2006
Introduction
Over the past four parts, I have illustrated how to parse and evaluate boolean expressions using ANTLR. The grammar presented in those parts is based on real code in pulse. Although it works as presented, there are a couple of items to polish up: one I have solved, and one I have not yet been able to solve.
Error Reporting
As pulse allows users to enter their own boolean expressions (to configure when they receive build notifications), decent error reporting is paramount. The first step is to turn off ANTLR’s default error handling, so that the errors can be handled by pulse. This is done by setting the defaultErrorHandler option to false:
class NotifyConditionParser extends Parser;
options {
    buildAST=true;
    defaultErrorHandler=false;
}
With that done, the ANTLR-generated code will throw exceptions on errors. Let’s take a look at the sorts of errors that are generated by the grammar as it stands.
Case 1: Unrecognised word:
$ java NotifyConditionParserTest "changed or tuer"
Caught error: unexpected token: tuer
Case 2: Unrecognised character:
$ java NotifyConditionParserTest "6 and false"
Caught error: unexpected char: '6'
Case 3: Illegal expression structure:
$ java NotifyConditionParserTest "state.change or or success"
Caught error: unexpected token: or
Case 4: Unbalanced parentheses:
$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: expecting RIGHT_PAREN, found 'null'
Most of these messages are not too bad; at least they are on the right track. Case 4 is certainly the worst of the lot: although the information is accurate, it is not exactly user friendly. We’ll get back to that later. One big thing missing in all cases is location information. I figured that ANTLR must have a way to retrieve it, and a little digging uncovered it. All of the above messages are generated using the getMessage method of the exceptions thrown by ANTLR. To get the line and column number information (which is indeed stored in the exception), you can use the toString method instead.
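To make the difference between the two calls concrete, here is a minimal, self-contained sketch. It does not use ANTLR itself: LocatedException is a hypothetical stand-in that only mimics the behaviour described above (getMessage returns the bare message, toString prefixes it with the line and column), and ErrorLocationSketch and its method names are likewise made up for illustration.

```java
// Hypothetical stand-in for an exception thrown by the generated parser.
// It is NOT the ANTLR class; it only mimics the behaviour described in the post.
class LocatedException extends Exception {
    private final int line;
    private final int column;

    LocatedException(String message, int line, int column) {
        super(message);
        this.line = line;
        this.column = column;
    }

    // Prefix the bare message with "line L:C: ", as seen in the located output.
    @Override
    public String toString() {
        return "line " + line + ":" + column + ": " + getMessage();
    }
}

class ErrorLocationSketch {
    // Report without location information.
    static String shortReport(Exception e) {
        return "Caught error: " + e.getMessage();
    }

    // Report with location information.
    static String locatedReport(Exception e) {
        return "Caught error: " + e.toString();
    }

    public static void main(String[] args) {
        Exception e = new LocatedException("unexpected token: tuer", 1, 12);
        System.out.println(shortReport(e));
        System.out.println(locatedReport(e));
    }
}
```

In the real test program the only change needed is in the catch block: print the result of toString rather than getMessage.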
Trying case 1 again:
$ java NotifyConditionParserTest "changed or tuer"
Caught error: line 1:12: unexpected token: tuer
Much better! Now the user knows where the error occurred. That leaves us with case 4, which is still a little on the cryptic side:
$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: expecting RIGHT_PAREN, found 'null'
It would be nicer to avoid exposing the raw token names (e.g. RIGHT_PAREN) and to say explicitly that we hit the end of the input (instead of “found ‘null’”). To fix the former problem, we can add paraphrase options to our lexer tokens. A paraphrase is a phrase describing the token, which is used in error messages instead of the token name. The options are applied in the grammar file as part of the lexer rules, for example:
RIGHT_PAREN
options {
    paraphrase = "a closing parenthesis ')'";
}
    : ')';
Applying the paraphrases improves the error message considerably:
$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: line 1:32: expecting a closing parenthesis ')', found 'null'
Unfortunately, we still have the pesky “found ‘null’” to deal with. I haven’t yet found a simple way to customise this error message, so it is handled as a special case. The exception thrown here is a MismatchedTokenException with the text of the found token set to null, which allows the specific case to be handled with a custom message:
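catch (MismatchedTokenException mte)
{
    // A null token text means the parser ran off the end of the input
    if (mte.token.getText() == null)
    {
        System.err.println("Caught error: line " +
                           mte.getLine() + ":" +
                           mte.getColumn() +
                           ": end of input when expecting " +
                           NotifyConditionParser._tokenNames[mte.expecting]);
    }
    else
    {
        // Any other mismatch: the default located message is good enough
        System.err.println("Caught error: " + mte.toString());
    }
}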
This is far from an ideal solution, and I am still looking for a better alternative. However, the user experience is king, and this hack improves it:
$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: line 1:32: end of input when expecting a closing parenthesis ')'
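One way to keep this special case easy to test is to pull the message formatting out into a helper that takes plain values rather than the ANTLR exception. The following is only a sketch under that assumption: ParseErrorFormatter and its parameters are hypothetical names, and the caller is assumed to pass in the found token text, the line and column, the paraphrased name of the expected token, and the exception's default toString text.

```java
// Hypothetical helper: formats a mismatched-token error from plain values,
// so the end-of-input special case can be exercised without running a parser.
class ParseErrorFormatter {
    static String format(String foundText, int line, int column,
                         String expected, String defaultMessage) {
        if (foundText == null) {
            // No token text: the input ended before the expected token appeared.
            return "Caught error: line " + line + ":" + column
                    + ": end of input when expecting " + expected;
        }
        // Otherwise fall back to the message the exception already provides.
        return "Caught error: " + defaultMessage;
    }

    public static void main(String[] args) {
        System.out.println(format(null, 1, 32,
                "a closing parenthesis ')'",
                "line 1:32: expecting a closing parenthesis ')', found 'null'"));
    }
}
```

The catch block would then reduce to a single call that passes the relevant fields of the exception to the helper.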
DRY Violation
Those paying close attention will have noticed a wrinkle in the final ANTLR grammar: a violation of the DRY (Don’t Repeat Yourself) principle. Specifically, the parser and the tree parser share a common rule, which is repeated verbatim in the grammar file:
condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;
Despite scouring the ANTLR documentation, I have yet to find a way around this. I even took a look at some of the example grammars on the ANTLR website, and noticed that they suffer from a similar problem. If anyone knows a way to reuse a rule, let me know! I would love to remove the duplication.
Wrap Up
Well, that just about does it. I hope this series of posts has piqued your interest in ANTLR and parsing, and maybe even helped you to solve some of your own problems. Now go forth and parse!