ANTLR By Example: Part 1: The Language
Introduction
As promised, this is the first installment in a simple ANTLR tutorial “ANTLR By Example”. In this post, I’ll introduce the language to be implemented, and give an overview of the full tutorial.
Background
In case you missed the background, recently we added a feature to pulse that allows build notifications to be filtered using arbitrary boolean expressions. Rather than reinventing the wheel and writing a parser for the boolean expression by hand, we used ANTLR, a fantastic parser generator with support for Java (amongst several other languages). The purpose of this tutorial is to show you how easy ANTLR makes it to create your own little languages.
Overview
To make the tutorial manageable, it will be split over multiple parts:
- Part 1: The Language. An overview (you’re reading it!) and simple specification of the language to be parsed.
- Part 2: Lexical Analysis. Turning character streams into streams of tokens.
- Part 3: Parsing. Turning token streams into an Abstract Syntax Tree (AST).
- Part 4: Tree Parsers. Using ANTLR tree parsers to evaluate the AST.
- Part 5: Extra Credit. Adding the finishing touches.
The Language
Before we race into generating lexical anaylsers and parsers, it’s best that we nail down exactly what we need to parse. The language we need to process is a simple boolean expression language. The language consists of two types of “atoms” (indivisible elements): primitives and operators. These atoms are combined to form expressions.
Primitives
Primitives are simple elements that evaluate to true or false. To make the language readable, each primitive is represented as a descriptive string such as “true” or “changed.by.me”. Since we are filtering build notifications, most primitives in our language represent tests on the build result:
- true: always evaluates to true
- false: always evaluates to false
- success: evaluates to true if the build succeeded
- failure: evaluates to true if the build failed
- error: evaluates to true if the build encountered an error
- changed: evaluates to true if the build included a new change
- changed.by.me: evaluates to true if the build included a new change by the owner of the filter
- state.change: evaluates to true if the result of the build differs from the result of the previous build
The meanings of the primitives are not terribly important to the parsing, but this is a real world example after all :).
Operators
Our language supports standard boolean operators for disjunction (or), conjuction (and) and inversion (not). Again for readability, we use words to represent these operators. Additionally, a grouping operator is defined, allowing expressions to be treated as atoms (to override operator precedence). The operators are summarised below:
- or: a binary operator, a “or” b evaluates to true if either a or b evaluates to true
- and: a binary operator, a “and” b evaluates to true only if both a and b evaluate to true
- not: a unary operator, “not” a evaluates to true only if a evaluates to false
- grouping: “(” a “)” groups the expression a into an atom, overriding normal precedence rules
The operators are listed from lowest to highest precendence, i.e. later operators bind more tightly.
Expressions
Expressions are formed by composing strings of atoms. Not all atom strings are valid expressions. In particular, operators must have correct arguments (left and right for binary, right for unary, internal for grouping). The following are all examples of valid expressions:
- changed
- changed.by.me and failed
- not (success or state.change) and changed
Operators with higher precedence bind more tightly. Thus:
a “and” b “or” c
is equivalent to:
“(” a “and” b “)” “or” c
as “and” has higher precedence than “or”.
Wrap Up
So there it is, a simple language for defining boolean expressions. In the next part, we’ll fire up ANTLR and start turning character streams into token streams.
This entry was posted on Monday, June 5th, 2006 at 9:08 pm and is filed under ANTLR, Java, Programming Languages, Technology. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

October 30th, 2006 at 4:50 pm
seanstickle » Literate Ruby Source Code says:[…] In the course of this, I need to learn parsing and lexing programming. I bought the Lex and Yacc book, but I had also read things about ANTLR. So I decided to look that up as well. I found a good tutorial on ANTLR. […]
January 3rd, 2007 at 8:10 pm
Good to know about a generic parser
July 27th, 2007 at 4:11 pm
Hey, thanks very much for your article. I spent a long time banging my head against the desk trying to figure out how to parse boolean expressions using ANTLR.
Cheers
July 31st, 2007 at 12:47 am
Hi Eddie,
I’m glad to hear you got something out of it. Thanks for stopping by.
March 17th, 2010 at 10:26 pm
please help me I not understand how I work with ANTLR, please give me a support in frensh if that is possible.
think you
December 22nd, 2014 at 1:39 am
How to: antlr: is there a simple example? | SevenNet says:[…] http://www.alittlemadness.com/2006/06/05/antlr-by-example-part-1-the-language/ […]
June 9th, 2015 at 10:18 pm
Andy McMullan » Blog Archive » Using ANTLR to parse boolean queries says:[…] For a more comprehensive ANTLR tutorial, see this series of blog posts. […]