Archive for the ‘Programming Languages’ Category

Checked Exceptions: Failed Experiment?

Wednesday, March 12th, 2008

Checked exceptions is one of those Java features that tends to work up a lively discussion. So forgive me for rehashing an old debate, but I am dismayed to see continued rejection of them as a failed feature, and even more so to see proposals to remove them from the language (not that I ever think this will happen).

In the Beginning…

Most would agree that mistakes were made in the way checked exceptions were used in the early Java years. Some libraries even go so far as doing something about it. However, an argument against the feature itself this does not make. Like anything new, people just needed time in the trenches to learn how to get the most out of checked exceptions. A key point here is that checked exceptions are available but not compulsory. This leads to a reasonable conclusion, as made in Effective Java:

Use checked exceptions for recoverable conditions and runtime exceptions for programming errors

The key point here is recoverable. If you can and should handle the exception, and usually close to the source, then a checked exception is exactly what you need. This will encourage proper handling of an error case - and make it impossible to not realise the case exists. For unrecoverable problems, handling is unlikely to be close to the source, and handling (if any) is likely to be more generic. In this case an unchecked exception is used, to avoid maintenance problems and the leakage of implementation details.

Continued Criticism

None of the above is new, and to me it is not contraversial. But the continued attacks on checked exceptions indicate that some still disagree. Let’s take a look at some common arguments against checked exceptions that still get airtime:

Exception Swallowing/Rethrowing

Some argue that the proliferation of code that either swallows checked exceptions (catches them without handling them) or just rethrows them all (”throws Exception”) is evidence that the feature is flawed. To me this argument carries no weight. If we look at why exception handling is abused in this way, it boils down to two possibilities:

  1. The programmer is too lazy to consider error handling properly, so just swallows or rethrows everything. A bad programmer doesn’t imply a bad feature.
  2. The programmer is frustrated by a poorly-designed API that throws checked exceptions when it shouldn’t. In this case the feature is an innocent tool in the hands of a bad API designer. Granted it took some time to discover when checked exceptions were appropriate, so many older APIs got it wrong. However, this is no reason to throw out the feature now.

You need well-designed APIs and good programmers to get quality software, whether you are using checked exceptions or not.

Maintenance Nightmare

A common complaint is how changes to the exception specification for a method bubbles outwards, creating a maintenance nightmare. There are two responses to this:

  1. The exceptions thrown by your method are part of the public API. Just because this is inconvenient (and when is error handling ever convenient?) doesn’t make it false! If you want to change an API, then there will be maintenance issues - full stop.
  2. If checked exceptions are kept to cases that are recoverable, and usually close to the point when they are raised, the specification will not bubble far.

Switching every last exception to unchecked won’t make maintenance problems go away, it will just make it easier to ignore them - to your own detriment.

Try/Catch Clutter

Some argue that all the try/catch blocks that checked exceptions force on us clutter up the code. Is the alternative, however, to elide the error handling? If you need to handle an error, the code needs to go somewhere. In an exception-based language, that is a try-catch block, whether your exception is checked or not. At least the exception mechanism lets you take the error handling code out of line from your actual functionality. You could argue that the Java’s exception handling syntax and abstraction capabilities make error handling more repetitive than it should be - and I would agree with you. This is orthogonal to the checked vs unchecked issue, though.

Testing Is Better

Some would argue that instead of the complier enforcing exception handling in a rigid way, it would be better to rely on tests catching problems at runtime. This is analogous to the debate re: static vs dynamic typing, and is admittedly not a clear cut issue. My response to this is that earlier and more reliable feedback (i.e. from the compiler) is clearly beneficial. Measuring the relative cost of maintaining exception specifications versus maintaining tests is more difficult. In cases where an exception almost certainly needs to be recovered from (i.e. the only case where a checked exception should be used), however, I would argue that the testing would be at least as expensive, and less reliable.

Conclusion

I am yet to hear an argument against checked exceptions that I find convincing. To me, they are a valuable language feature that, when used correctly, makes programs more robust. As such, it is nonsense to suggest throwing them out. Instead, the focus should be on encouraging their appropriate use.

Q: What Sucks More Than Java Generics?

Wednesday, February 27th, 2008

A: Java without generics. Yes, there are many problems with the Java generics implementation. And I seriously hope that the implementation is improved in Java 7. However, it occurs to me that the problems can generally be worked around by either:

  1. Not using generics where the limitations make it too difficult; or
  2. Using unsafe casts.

In both cases, you are no worse off than you were pre-generics, where you had no choice! Despite their limitations, generics have done a lot to reduce painfully repetitive casting and mysterious interfaces (List of what now?). They also enable abstraction of many extremely common functions without losing type information. In these ways I benefit from generics every day, enough to outweigh the frustrations of erasure.

Re: Groovy Or JRuby

Friday, November 30th, 2007

Martin Fowler makes an interesting claim in GroovyOrJRuby:

A strong reason to prefer Ruby is the fact that it lives in multiple implementations.

Fowler is alluding to the fact that Groovy runs on the JVM only, whereas “Ruby can run directly on mainstream operating systems with a C runtime, and is starting to run on .NET’s CLR”. I don’t see this as a strong advantage for Ruby at all, for two main reasons:

  1. If you need portability across mainstream operating systems, you already have it with the JVM (to a similar enough extent as with Ruby).
  2. If you need tight integration with a platform, I don’t think either the JVM or CRuby implementation has a clear advantage.

When you start a project, you need to decide what tradeoff to make between portability and depth of platform integration. If you favour portability, you can just take the JVM then choose whichever language you please. If you favour depth of integration, your platform will often dictate the language. If you want both, implement the portable part on the JVM and go native for the rest.

The Key Thing In Python’s Favour

Wednesday, September 5th, 2007

I recently ran across this post advocating Python. This made me think about why I prefer Python over the rest of the scripting crowd. And I realised that there is just one key reason:

Python is easier to read

That’s it. For all the joys of programming in dynamically-typed languages, I think there is one major problem: in many ways these languages favour writers over readers. I covered one example of this problem back in my post on Duck Typing vs Readability. This is why it is so important that a dynamic language is designed with readability in mind.

In my opinion, Python really shines readability-wise. And this is no accident — if you follow design discussions about Python you will see that Guido is always concerned with code clarity. A feature is not worth adding just because it makes code more compact; it must make the code more concise. Features have even been removed from Python because they were seen to encourage compact but difficult to comprehend code (reduce being a classic example). Even controversial design choices like significant whitespace and the explicit use of “self” are actually good for readability.

The philosophy behind Python also extends beyond the language design. Clever tricks and one-liners are not encouraged by the community. Rather, the community aims for Pythonic code:

To be Pythonic is to use the Python constructs and datastructures with clean, readable idioms.

This is surely refreshing compared to the misguided boasting about how much can be crammed in to a few lines of [insert other scripting language]. Certainly, if I ever have to pick up and maintain another person’s dynamically-typed code, I hope it is Python. And since I know I will have to maintain my own code, when I go dynamic I always pick Python.

It’s Not About the Typing, Or Even the Typing

Tuesday, July 10th, 2007

There has been plenty of buzz lately about so-called “dynamic” languages. I guess we have the hype around Rails and similar frameworks to thank (blame?) for this most recent manifestation of an ages-old debate. Unfortunately, the excitement around the productivity increase (real or perceived) afforded by these languages has led to some common fallacies.

Fallacy 1: Misclassification of Type Systems

I’ve rolled a few related fallacies into one here. The most common mistakes I see are classifying static as explicit typing, and dynamic as weak typing. I won’t go into too much detail, as this one is well covered by the fallacies in What To Know Before Debating Type Systems. In any case, you can usually tell what people are getting at, and concentrate on more interesting arguments.

Fallacy 2: Less “Finger” Typing == Drastically More Productivity

This one really gets to me. When was the last time you told your customer “Sorry, I would have made that deadline: I just couldn’t type all the code fast enough!”? The number of characters I need to type has very little to do with how fast I code. This is for a few reasons:

  1. For anything but short bursts, I type a lot faster than I think. Most programming time goes into thinking how to solve the non-trivial problems that come up every day. This reason alone is enough to dispel the myth.
  2. Every second thing I type is spelled the same: Control-Space. Honestly, good IDEs do most of the typing for you, especially when they have plenty of type information to work off.
  3. Sometimes more verbosity is better anyway, otherwise we would be naming our variables a, b, c…

The only ways in which I believe less finger typing give a significant increase in productivity are when:

  • The chance to make a trivial error is removed; or
  • The reduced noise in the code makes it more readable.

Personally, I do not find static typing introduces noise, even when full explicit type declarations are required. On the contrary, I find code with type information is usually easier to read without the need to refer to external documentation (see my earlier post on Duck Typing vs Readability)1.

Fallacy 3: Dynamic Typing Is The Big Factor In Increasing Productivity

This is the end conclusion many people draw to the whole issue. I think this derives from the earlier fallacies, and misses the important points. It is true that people are finding many significant productivity improvements in dynamic languages. However, I think very few of these have anything to do with the static vs dynamic typing debate. So, where are the productivity increases? Here are a few:

  1. Frameworks: Rails is creating the latest hype around dynamic languages, because a framework of its style is a big productivity boost for a certain class of applications. However, Rails-like frameworks are possible in other languages, as demonstrated by the many clones. Not everything will be cloneable in some languages, but the important things can be and those that can’t are unlikely going to be because of static typing.
  2. Syntactic sugar: there is a correlation between languages that are dynamically typed and languages that have convenient syntax for common data structures and other operations (e.g. regular expressions). Note that I do not believe the value of syntactic sugar is in the reduced finger typing (see fallacy 2). Rather, this syntax can reduce the chance of silly errors (e.g. foreach constructs vs a classic for-loop) and can make for significantly more readable code (compare the verbosity of simple list operations in Java vs Python). There is nothing that prevents something like a literal syntax for lists from being supported in a statically-typed language, it is just not common in popular static languages.
  3. Higher-level programming constructs: this is the big one. The biggest productivity gains given by a programming language come from the abstractions that it offers. Similar to the previous point, there is a correlation between languages that use dynamic typing and those with higher-level constructs like closures, meta-programming, (proper) macros etc. However, statically-typed languages can and in some cases do support these constructs.

Let’s also not forget that dynamic typing can harm productivity in certain ways: such as less powerful tooling.

Conclusion

My point is simple: get your argument straight before you write off static typing. Even though dynamic typing is the most obviously “different” thing when you first try a language like Python or Ruby, that does not mean it is the reason things seem much more productive. In the end, allowing higher levels of abstraction and maintaining readability are far more important factors, and these are largely independent of the static vs dynamic religious war.


1 Actually, the ideal readability-wise would be to have explicit type information where the type is not immediately obvious. For example, API parameters need type information (when it is not in the code it ends up in the docs). On the other hand, most local variables do not, for example loop counters.

Incremental schema upgrades using Hibernate

Monday, August 28th, 2006

I have been inspired by recent discussions on upgrade frameworks to show how hibernate can be used to provide simple incremental database schema maintenance. Database schema maintenance is one of the more difficult aspects of upgrading applications, particularly when the application supports multiple databases, so I am very happy that hibernate helps out during upgrades.

SchemaUpdate

Hibernate provides a class called SchemaUpdate that is able to synchronise a set of hibernate mappings with a database schema. The following code snippet shows how easy it is:

// manually setup the hibernate configuration
Configuration config = new Configuration();

Properties props = new Properties();
props.put(“hibernate.dialect”, “org.hibernate.dialect.HSQLDialect”);
props.put(“hibernate.connection.provider_class”,
 “com.zutubi.pulse.upgrade.tasks.UpgradeTaskConnectionProvider”);

// slight hack to provide hibernate with access to
// the configured datasource via a static variable
// on our ConnectionProvider implementation.
UpgradeTaskConnectionProvider.dataSource = dataSource;

// use spring to help load the classpath resources.
for (String mapping : mappings)
{
  ClassPathResource resource =
                new ClassPathResource(mapping);
  config.addInputStream(resource.getInputStream());
}

// run the schema update.
new SchemaUpdate(config, props).execute(true, true);

This example uses the spring ClassPathResource to load the mappings file from the classpath, and the UpgradeTaskConnectionProvider to inject a datasource into the process.

.hbm.xml fragments

This by itself is not overly interesting. What people usually do not realise is that the mappings files do not need to hold your entire schema. When making incremental changes to your schema, all you need in the mappings are those incremental changes. This comes in very handy when you have lots of mappings to manage.

For example. You have the following mapping of a user:

<class name=“com.zutubi.pulse.model.User” table=“USER”>

    <id name=“id” type=“java.lang.Long” column=“ID”>
        <generator class=“hilo”/>
    </id>

    <property name=“login” column=“LOGIN” type=“string”/>
    <property name=“name” column=“NAME” type=“string”/>

</class>

Some time later, you want to store a password field with this user. By passing the following mapping to the SchemaUpdate, it will add that column to your existing table, leaving the existing schema as it is.

<class name=“com.zutubi.pulse.model.User” table=“USER”>

  <id name=“id” type=“java.lang.Long” column=“ID”>
    <generator class=“hilo”/>
  </id>
       
  <property name=“pass” column=“PASS” type=“string”/>

</class>

You still need to ensure that the mapping file is valid, hence the inclusion of the ID field in the second mapping.

Versioning

So, to support incremental schema upgrades within your application, you will need to keep two sets of hibernate mapping files. The first will be the latest version of your mappings. This is what is used for new installations. The second will be a set of versioned mapping fragments as described above.

You will need to version them so that you can track which fragments you need to apply and in which order, based on the version of the schema you are upgrading from. I use directory names like build_010101 to store my schema fragments and a properties file to store the current schema version. Other people use a special table in the database to hold the current schema version. Use which ever is most appropriate to your situation.

Generating upgrade SQL

For those of you that do not want or can not allow Hibernate to run the schema update, you can use the following code to generate the SQL that Hibernate would otherwise execute:

Dialect dialect = Dialect.getDialect(props);
Connection connection = dataSource.getConnection();
DatabaseMetadata meta =
    new DatabaseMetadata(connection, dialect);
String[] createSQL =
    config.generateSchemaUpdateScript(dialect, meta);

This code would replace the last line in the first example.

Things to remember about SchemaUpdate

Okay, so just a couple of final things to be aware of with hibernates schema update.

The hibernate schema update will:

  • create a new table
  • add a new column

The hibernate schema update will not:

  • drop a table
  • drop a column
  • change a constraint on a column
  • add a column with a not-null constraint to an existing table

Final tip

Oh, and the class name that you provide in the update mapping can be anything you want. It is not checked, which is great, otherwise you would need to handle versioning of your class files as well.

Happy upgrading!

If Java Could Have Just One C++ Feature…

Thursday, August 3rd, 2006

I have been immersed in Java for a while now, but having worked in C++ for years before, there is one big thing I miss: destructors. Especially in a language with exceptions, destructors are a massive time and error saver for resource management.

Having garbage collection is nice and all, but the fact is that we deal a multitude of resources and need to collect them all. How do we do this in Java? The Hard Way: we need to know that streams, database connections etc need to be closed, and we need to explicitly close them:

FileInputStream f = new FileInputStream(“somefile”);
// Do some stuff.
f.close();

Of course, with exceptions it gets worse. We need to guarantee that the stream is closed even if an exception is thrown, leading to the oft-seen pattern:

FileInputStream f = null;
try
{
    // Do some stuff
}
finally
{
    if(f != null)
    {
        try
        {
            f.close();
        }
        catch(IOException e)
        {
            // Frankly, my dear…
        }
    }
}

The noise is just incredible. A common way to reduce the noise is to use a utility function to do the null check and close, but noise still remains. Repeating the same try/finally pattern everywhere is also mind-numbing, and it can be easily forgotten leading to incorrect code.

In C++, this problem is solved elegantly using the Resource Acquisition Is Initialisation (RAII) pattern. This pattern dictates that resources should be acquired in a constructor and disposed of in the corresponding destructor. Combined with the deterministic destruction semantics for objects placed on the stack, this pattern removes the need for manual cleanup and with it the possbility of mistakes:

{
    std::ifstream f(“somefile”);
    // Do some stuff
}

Where has all the cleanup gone? It is where it should be: in the destructor for std::ifstream. The destructor is called automatically when the object goes out of scope (even if the block is exited due to an uncaught exception). The ability to create value types and place them on the stack is a more general advantage of C++, but Java can close the gap with smarter compilers1.

Interestingly, C# comes in half way between Java and C++ on this matter. In C#, you can employ a using statement to ensure cleanup occurs:

using (TextReader r = File.OpenText(“log.txt”)) {
    // Do some stuff
}

In this case the resource type must implement System.IDisposable, and IDispose is guaranteed to be called on the object at the end of the using statement. The using statement in C# is pure syntactic sugar for the try/finally pattern we bash out in Java every day.

What’s the answer for Java?2 Well, something similar to using would be a good start, but I do feel like we should be able to do better. If we’re going to add sugar why not let us define our own with a full-blown macro system? Difficult yes, but perhaps easier than always playing catch up? An alternative is to try and retrofit destructors into the language3. It is possible to mix both garbage collection and destructors, as shown in C++/CLI4. However, I don’t see an elegant way to do so that improves upon what using brings. If you do, then let us all know!


1 it appears that Mustang already has some of the smarts such as escape analysis.
2 if you’re the one down the back who shouted “finalizers”: you can leave anytime you want as long as it’s now!
3 I said NOW!
4See also Herb Suttor’s excellent post on the topic Destructors vs. GC? Destructors + GC!.

ANTLR By Example: Part 5: Extra Credit

Tuesday, July 11th, 2006

Introduction

Over the past four parts, I have illustrated how to parse and evaluate boolean expressions using ANTLR. The grammar presented is in those parts is based on real code in pulse. Although it works as presented, there are a couple of items to polish up, one of which I have solved, and the other of which I have not yet been able to solve.

Error Reporting

As pulse allows users to enter their own boolean expressions (to configure when they receive build notifications), decent error reporting is paramount. The first step is to turn off ANTLR’s default error handling, so that the errors can be handled by pulse. This is done by setting the defaultErrorHandler option to false:

class NotifyConditionParser extends Parser;

options {
        buildAST=true;
        defaultErrorHandler=false;
}

With that done, the ANTLR-generated code will throw exceptions on errors. Let’s take a look at the sorts of errors that are generated by the grammar as it stands.

Case 1: Unrecognised word:

$ java NotifyConditionParserTest "changed or tuer"
Caught error: unexpected token: tuer

Case 2: Unrecognised character:

$ java NotifyConditionParserTest "6 and false"
Caught error: unexpected char: '6'

Case 3: Illegal expression structure

$ java NotifyConditionParserTest "state.change or or success"
Caught error: unexpected token: or

Case 4: Unbalanced parentheses

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: expecting RIGHT_PAREN, found 'null'

Most of these messages are not too bad, at least they are on the right track. Case 4 is certainly the worst of the lot, although the information is accurate it is not exactly user friendly. We’ll get back to that later. One big thing missing in all cases is location information. I figured that ANTLR must have a way to retrieve the information, and a little digging uncovered it. All of the above messages are generated using the getMessage method of the exceptions thrown by ANTLR. To get the line and column number information (which is indeed stored in the exception), you can use the toString method instead:

try
{
    …
}
catch (Exception e)
{
    System.err.println(“Caught error: “ + e.toString());
}

Trying case 1 again:

$ java NotifyConditionParserTest "changed or tuer"
Caught error: line 1:12: unexpected token: tuer

Much better! Now the user knows where the error occured. That leaves us with case 4, which is still a little on the cryptic side:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: expecting RIGHT_PAREN, found 'null'

It would be nice if we could not expose the raw token names (e.g. RIGHT_PAREN) and also explicitly say we hit the end of the input (instead of “found ‘null’”). To fix the former problem, we can add paraphrase options to our lexer tokens. This allows us to specify a phrase describing the token which will be used in error messages instead of the token name. The options are applied in the grammar file as part of the lexer rules, for example:

RIGHT_PAREN
options {
    paraphrase = "a closing parenthesis ')'";
}
    : ')';

Applying the paraphrases improves the error message considerably:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: line 1:32: expecting a closing parenthesis ')', found 'null'

Unfortunately, we still have the pesky “found ‘null’” to deal with. In this case, I haven’t yet found a simple way to customise the error message. Instead, it is handled as a special case. I found that in this case the exception being thrown was a MismatchedTokenException, with the text of the found token set to null. This allowed the specific case to be handled with a custom message:

catch(MismatchedTokenException mte)
{
    if(mte.token.getText() == null)
    {
        System.err.println(“Caught error: line “ +
            mte.getLine() + “:” +
            mte.getColumn() +
            “: end of input when expecting “ +
            NotifyConditionParser._tokenNames[mte.expecting]);
    }
    else
    {
        System.err.println(“Caught error: “ + mte.toString());
    }
}

This is far from an ideal solution, and I am still looking for a better alternative. However, the user experience is king, and this hack improves it:

$ java NotifyConditionParserTest "failure or (changed and success"
Caught error: line 1:32: end of input when expecting a closing parenthesis ')'

DRY Violation

Those paying close attention would have noticed a wrinkle in the final ANTLR grammar: a violation of the DRY (Don’t Repeat Yourself) principle. Specifically, both the parser and tree parser share a common rule, which is repeated verbatim in the grammar file:

condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

Despite scouring the ANTLR documentation, I am yet to find a way around this. I even took a look at some of the example grammars on the ANTLR website, and noticed that they suffer from a similar problem. If anyone knows a way to reuse a rule, let me know! I would love to remove the duplication.

Wrap Up

Well, that just about does it. I hope this series of posts has piqued your interest in ANTLR and parsing, and maybe even helped you to solve some of your own problems. Now go forth and parse!

ANTLR By Example: Part 4: Tree Parsing

Thursday, July 6th, 2006

Introduction

By the end of Part 3: Parsing, we had a working parser that could understand our expression language. The parser turns an input string into an Abstract Syntax Tree (AST) representing the expression. The final piece of the puzzle is evaluating the expression for a given build result (remember, these expressions are used to determine when to send build notifications). To do this we will make use of an ANTLR tree parser.

NotifyCondition’s

Before we get into the details of tree parsers, let’s first define the end goal. Currently we have a way to generate an AST for an expression. What we need, however, is a way to evaluate the expression for a given build result. Approaching from this angle, we can use a self-evaluating data structure that takes a build result and returns true if a notification should be sent for this result. The NotifyCondition interface serves as a basis for this data structure:

public interface NotifyCondition
{
    public boolean satisfied(BuildResult result, User user);
}

This interface defines just one method, satisfied, which returns true iff a notification should be sent for the given build result. The additional user argument is required to evaluate the “changed.by.me” condition. Implementations of this interface mirror the primitive conditions in our expression language:

  • ChangedByMeNotifyCondition
  • ChangedNotifyCondition
  • ErrorNotifyCondition
  • FailureNotifyCondition
  • FalseNotifyCondition
  • StateChangeNotifyCondition
  • SuccessNotifyCondition
  • TrueNotifyCondition

In addition, compound expressions (i.e. those involving operators) are represented using two further implementations:

  • CompoundNotifyCondition: combines a list of child NotifyConditions either conjunctively (and) or disjunctively (or).
  • NotNotifyCondition: inverts a single child NotifyCondition

Using these implementations of the NotifyCondition interface, we can create the self-evaluating data structure we need to test build results. The missing link is getting from the AST to the right combination of NotifyConditions. To help create NotifyConditions based on expressions, a simple factory is used:

class NotifyConditionFactory
{
    …

    public NotifyCondition createCondition(String condition)
    {
        …
    }
}

The factory takes a primitive condition string (e.g. “state.change”) and returns a new instance of the corresponding condition (e.g. StateChangeNotifyCondition).

Tree Parsers

A tree parser is a special kind of ANTLR parser that operates on an AST, rather than a token stream. This seems strange at first, as trees are a two-dimensional structure, not a single-dimansional stream. However, the principles are much the same: tree parsers just have a special rule syntax for matching input in two dimensions. In our case, we use the tree parser as a simple, declarative way to transform our AST into our own data structure representing the expression.

Defining the Tree Parser

Tree parsers are defined in grammar files, in our case in the same file: NotifyCondition.g. From the ANTLR manual:

{ optional class code preamble }

"class" YourTreeParserClass "extends TreeParser";

options

tokens

{ optional action for instance vars/methods }

tree parser rules...

The tree parser rule definitions should look very familiar:

rulename
    ":" alternative_1
    "|" alternative_2
    ...
    "|" alternative_n
    ";"

The difference is in how the alternatives are specified. ANTLR uses a special syntax that allows matching against trees:

"#(" root-token child1 child2 ... childn ")"

This will match against a tree (including subtrees) that has the given token as its root node, and has children matching the “childx” subrules. In a simple case, the child subrules are just token references, for example:

#( "and" "changed" "success" )

would match the tree:

    changed
and
    success

More generally, however, we can define the children recursively in terms of the tree parser rules themselves. This allows us to match against the ASTs generated by our parser:

class NotifyConditionTreeParser extends TreeParser;

cond
    : #("and" cond cond)
    | #("or" cond cond)
    | #("not" cond)
    | condition
    ;

condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

Notice how the “cond” rule is defined recursively: children of an “and” expression can themselves be arbitrary conditions. In fact, this is more permissive than our parser, but that keeps the tree rules simpler.

Evaluating the Tree

The tree parser we defined does not yet do anything useful. To transform the AST into the corresponding NotifyCondition structure we need to embed actions into the tree rules. These actions can be used to insert code into the tree parser methods that conver the matched subtrees into corresponding NotifyCondition’s. Let’s dive right into the full tree parser definition:

class NotifyConditionTreeParser extends TreeParser;

{
    private NotifyConditionFactory factory;

    public void setNotifyConditionFactory(NotifyConditionFactory factory)
    {
        this.factory = factory;
    }
}

cond returns [NotifyCondition r]
{
    NotifyCondition a, b;
    r = null;
}
    : #("and" a=cond b=cond) {
        r = new CompoundNotifyCondition(a, b, false);
    }
    | #("or" a=cond b=cond) {
        r = new CompoundNotifyCondition(a, b, true);
    }
    | #("not" a=cond) {
        r = new NotNotifyCondition(a);
    }
    | c:condition {
        r = factory.createCondition(c.getText());
    }
    ;

condition
    : "true"
    | "false"
    | "success"
    | "failure"
    | "error"
    | "changed"
    | "changed.by.me"
    | "state.change"
    ;

The first thing you’ll notice is a new function setNotifyConditionFactory that allows us to inject a factory into the parser. Next, you can see that the “cond” rule has been updated to return a NotifyCondition, from a variable named “r”. The syntax looks odd at first, but it is just a way to add Java code to the method that processes the rule. Likewise, we declare NotifyCondition variables “a” and “b”. Using the “cond=a” syntax, we can assign the result of the recursive invocations of the “cond” rule to these variables. Finally, we define the NotifyCondition to return for each alternative by embedding Java code to assign the value of “r”. I have skimmed over the details somewhat here, but you can always take a look at the NotifyConditionTreeParser code generated by ANTLR to help make the connection.

Generating the Tree Parser

OK, time to fire up ANTLR again:

$ java -classpath antlr-2.7.6.jar antlr.Tool NotifyCondition.g

We get a full 10 files:

NotifyConditionLexer.java
NotifyConditionLexer.smap
NotifyConditionParser.java
NotifyConditionParser.smap
NotifyConditionParserTokenTypes.java
NotifyConditionParserTokenTypes.txt
NotifyConditionTreeParser.java
NotifyConditionTreeParser.smap
NotifyConditionTreeParserTokenTypes.java
NotifyConditionTreeParserTokenTypes.txt

The tree parser itself is, of course, defined in NotifyConditionTreeParser.java. We can put the lexer, parser and tree parser together to transform an input string into a self-evaluating NotifyCondition:

public NotifyCondition getNotifyCondition(String expression)
{
    NotifyCondition notifyCondition = null;
    StringReader reader = null;

    try
    {
        reader = new StringReader(expression);
       
        NotifyConditionLexer lexer;
        NotifyConditionParser parser;
        NotifyConditionTreeParser tree;

        lexer = new NotifyConditionLexer(reader);
        parser = new NotifyConditionParser(lexer);
        tree = new NotifyConditionTreeParser();
        tree.setNotifyConditionFactory(notifyFactory);

        parser.orexpression();
        AST t = parser.getAST();       
        notifyCondition = tree.cond(t);
    }
    catch (Exception e)
    {
        System.stderr.println(e.getMessage());
    }
    finally
    {
        if(reader != null)
        {
            reader.close();
        }
    }

    return notifyCondition;
}

That’s it! We have reached the end goal. Now when a build completes, we can pass the build result to the NotifyCondition object to determine if a notification should be sent.

Wrap Up

Well, it took a few steps, but we have finally transformed our input into our desired output. Hopefully this has shown some of the power of ANTLR, and will encourage you to take a closer look. In the final part, we will look into minor improvements to add some polish to the implementation.

Duck Typing vs Readability

Wednesday, June 21st, 2006

Duck typing is getting a lot of attention lately, probably due to the hype buzzing around about Ruby/Rails. My first encounter with duck typing was in Python, which is very similar to Ruby in this respect.

The first thing that scares you about duck typing is the lack of the compiler safety net. You can write non-sensical code and you have no idea until you run it. However, as you learn to take advantage of this newfound flexibility, you start seeing the static typing safety net as a straight-jacket. Sure it stops you from poking your eye out, but a lot of the time it just gets in your way. With disciplined testing, you can restore a large proportion of the safety net, and you feel comfortable again. It’s a classic tradeoff: safety/early error detection versus flexibility, and there’ll always be arguments for both sides1.

The next thing that I grappled with was the lack of precise contracts between callers and callees. In a statically-typed language, the function prototype specifies the contract with a high level of precision. The caller knows the contract that each of the parameters must satisfy, and that the returned value(s) will fulfill. In a duck-typed language, the prototype is much less informative. Without type information in the prototype, the only way to know the contracts for sure is to analyse the body of the function. Clearly this is unacceptable, especially when dealing with a third-party API. The usual way to mitigate this loss of information in the prototype is to provide the details in the API documentation. This approach, however, suffers from two major problems:

  • Verbosity: a static type (say a Java interface) is a concise way to specify type requirements, a natural language description will tend to be less direct.
  • Inaccuracy: there is no way to ensure the documented requirements are correct. In particular, as the code evolves there is a real danger the documentation will be left behind.

This readability problem is my biggest issue with duck typing in practice today. A commonly-suggested solution to the problem is some form of optional static type checking2. However, this route tends to lead us back to something like interfaces, which as I say are a pretty concise way to specify a contract. This is giving away too many of the advantages of duck typing, in particular:

  • Granularity: a duck-typed function places the least possible requirements on the passed parameters. Interfaces, on the other hand, may carry extra requirements: methods that are not required for the function in question. Although you can break interfaces down into atoms and combine them, the resulting number of interfaces would be overwhelming.
  • Adaptability: related to the above, a duck-typed function can be adapted to contexts that the function author may never have considered with as little effort as possible on the part of the caller.
  • Sheer convenience: there is no extra baggage required of calling code, you can just get on with the job.

So how do we get the convenience and power of duck typing without this readbility problem? What we need is a concise way to communicate the requirements on function parameters, without requiring them to be manually specified. Is this really so hard? Imagine a tool that analysed the body of a function (and potentially the functions it calls) to see how the parameters were used. Such a tool could extract a lot of useful information, such as what methods are called on the parameter. On the surface, it is not even a difficult tool to write3. Having this information available as you write code would be a huge plus. On the caller side, you know a lot more about the contract you need to fulfill. On the callee side, you no longer need to maintain this “type” information in the function documentation.

The idea is simple enough that I’m sure it has been thought of before. I wonder then, does such a tool exist? If not, are there some killer implementation difficulties I have overlooked?


1 C++ templates, although not without their own problems, get close to a best-of-both worlds: flexible contracts that are statically checked.
2 I suspect these suggestions often come from those who are more comfortable in a statically-typed world.
3 Famous last words, I suspect.


Into continuous integration? Want to be? Try pulse.