ANTLR By Example: Part 1: The Language
As promised, this is the first installment in a simple ANTLR tutorial “ANTLR By Example”. In this post, I’ll introduce the language to be implemented, and give an overview of the full tutorial.
In case you missed the background, recently we added a feature to pulse that allows build notifications to be filtered using arbitrary boolean expressions. Rather than reinventing the wheel and writing a parser for the boolean expression by hand, we used ANTLR, a fantastic parser generator with support for Java (amongst several other languages). The purpose of this tutorial is to show you how easy ANTLR makes it to create your own little languages.
To make the tutorial manageable, it will be split over multiple parts:
- Part 1: The Language. An overview (you’re reading it!) and simple specification of the language to be parsed.
- Part 2: Lexical Analysis. Turning character streams into streams of tokens.
- Part 3: Parsing. Turning token streams into an Abstract Syntax Tree (AST).
- Part 4: Tree Parsers. Using ANTLR tree parsers to evaluate the AST.
- Part 5: Extra Credit. Adding the finishing touches.
Before we race into generating lexical anaylsers and parsers, it’s best that we nail down exactly what we need to parse. The language we need to process is a simple boolean expression language. The language consists of two types of “atoms” (indivisible elements): primitives and operators. These atoms are combined to form expressions.
Primitives are simple elements that evaluate to true or false. To make the language readable, each primitive is represented as a descriptive string such as “true” or “changed.by.me”. Since we are filtering build notifications, most primitives in our language represent tests on the build result:
- true: always evaluates to true
- false: always evaluates to false
- success: evaluates to true if the build succeeded
- failure: evaluates to true if the build failed
- error: evaluates to true if the build encountered an error
- changed: evaluates to true if the build included a new change
- changed.by.me: evaluates to true if the build included a new change by the owner of the filter
- state.change: evaluates to true if the result of the build differs from the result of the previous build
The meanings of the primitives are not terribly important to the parsing, but this is a real world example after all :).
Our language supports standard boolean operators for disjunction (or), conjuction (and) and inversion (not). Again for readability, we use words to represent these operators. Additionally, a grouping operator is defined, allowing expressions to be treated as atoms (to override operator precedence). The operators are summarised below:
- or: a binary operator, a “or” b evaluates to true if either a or b evaluates to true
- and: a binary operator, a “and” b evaluates to true only if both a and b evaluate to true
- not: a unary operator, “not” a evaluates to true only if a evaluates to false
- grouping: “(” a “)” groups the expression a into an atom, overriding normal precedence rules
The operators are listed from lowest to highest precendence, i.e. later operators bind more tightly.
Expressions are formed by composing strings of atoms. Not all atom strings are valid expressions. In particular, operators must have correct arguments (left and right for binary, right for unary, internal for grouping). The following are all examples of valid expressions:
- changed.by.me and failed
- not (success or state.change) and changed
Operators with higher precedence bind more tightly. Thus:
a “and” b “or” c
is equivalent to:
“(” a “and” b “)” “or” c
as “and” has higher precedence than “or”.
So there it is, a simple language for defining boolean expressions. In the next part, we’ll fire up ANTLR and start turning character streams into token streams.
This entry was posted on Monday, June 5th, 2006 at 9:08 pm and is filed under ANTLR, Java, Programming Languages, Technology. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.