a little madness

A man needs a little madness, or else he never dares cut the rope and be free -Nikos Kazantzakis

Zutubi

ANTLR By Example: Part 1: The Language

Introduction

As promised, this is the first installment in a simple ANTLR tutorial “ANTLR By Example”. In this post, I’ll introduce the language to be implemented, and give an overview of the full tutorial.

Background

In case you missed the background, recently we added a feature to pulse that allows build notifications to be filtered using arbitrary boolean expressions. Rather than reinventing the wheel and writing a parser for the boolean expression by hand, we used ANTLR, a fantastic parser generator with support for Java (amongst several other languages). The purpose of this tutorial is to show you how easy ANTLR makes it to create your own little languages.

Overview

To make the tutorial manageable, it will be split over multiple parts:

The Language

Before we race into generating lexical anaylsers and parsers, it’s best that we nail down exactly what we need to parse. The language we need to process is a simple boolean expression language. The language consists of two types of “atoms” (indivisible elements): primitives and operators. These atoms are combined to form expressions.

Primitives

Primitives are simple elements that evaluate to true or false. To make the language readable, each primitive is represented as a descriptive string such as “true” or “changed.by.me”. Since we are filtering build notifications, most primitives in our language represent tests on the build result:

  • true: always evaluates to true
  • false: always evaluates to false
  • success: evaluates to true if the build succeeded
  • failure: evaluates to true if the build failed
  • error: evaluates to true if the build encountered an error
  • changed: evaluates to true if the build included a new change
  • changed.by.me: evaluates to true if the build included a new change by the owner of the filter
  • state.change: evaluates to true if the result of the build differs from the result of the previous build

The meanings of the primitives are not terribly important to the parsing, but this is a real world example after all :).

Operators

Our language supports standard boolean operators for disjunction (or), conjuction (and) and inversion (not). Again for readability, we use words to represent these operators. Additionally, a grouping operator is defined, allowing expressions to be treated as atoms (to override operator precedence). The operators are summarised below:

  • or: a binary operator, a “or” b evaluates to true if either a or b evaluates to true
  • and: a binary operator, a “and” b evaluates to true only if both a and b evaluate to true
  • not: a unary operator, “not” a evaluates to true only if a evaluates to false
  • grouping: “(” a “)” groups the expression a into an atom, overriding normal precedence rules

The operators are listed from lowest to highest precendence, i.e. later operators bind more tightly.

Expressions

Expressions are formed by composing strings of atoms. Not all atom strings are valid expressions. In particular, operators must have correct arguments (left and right for binary, right for unary, internal for grouping). The following are all examples of valid expressions:

  • changed
  • changed.by.me and failed
  • not (success or state.change) and changed

Operators with higher precedence bind more tightly. Thus:

a “and” b “or” c

is equivalent to:

“(” a “and” b “)” “or” c

as “and” has higher precedence than “or”.

Wrap Up

So there it is, a simple language for defining boolean expressions. In the next part, we’ll fire up ANTLR and start turning character streams into token streams.

Liked this post? Share it!

5 Responses to “ANTLR By Example: Part 1: The Language”

  1. October 30th, 2006 at 4:50 pm

    seanstickle » Literate Ruby Source Code says:

    […] In the course of this, I need to learn parsing and lexing programming. I bought the Lex and Yacc book, but I had also read things about ANTLR. So I decided to look that up as well. I found a good tutorial on ANTLR. […]

  2. January 3rd, 2007 at 8:10 pm

    sapna jain says:

    Good to know about a generic parser

  3. July 27th, 2007 at 4:11 pm

    Eddie Stanley says:

    Hey, thanks very much for your article. I spent a long time banging my head against the desk trying to figure out how to parse boolean expressions using ANTLR.

    Cheers

  4. July 31st, 2007 at 12:47 am

    Jason says:

    Hi Eddie,

    I’m glad to hear you got something out of it. Thanks for stopping by.

  5. March 17th, 2010 at 10:26 pm

    kawther says:

    please help me I not understand how I work with ANTLR, please give me a support in frensh if that is possible.
    think you

Leave a Reply