Cell ➞ Java compiler version 0.4

Version 0.4 of the Cell to Java code generator is now available. It's still a beta version, but it's a major milestone. In terms of features and optimization it's almost complete: nearly everything that is planned for version 1.0 has now been implemented. The next release will finish the few minor features that are still missing, which for the most part are about improving the interface of the generated code and the usability of the compiler rather than the language itself. Apart from that, all that separates this release from version 1.0 is a lot of testing.

The main goal of this release was optimizing the performance of relational automata. There's still a lot of work to do there, but I'm pleased to report that the results are already pretty good. You can read more about it here.

There are also a number of features and changes that were originally planned for a later version, which is part of the reason this release is a bit late.

For starters, there's an initial implementation of array mutability. It's a work in progress and it comes with a lot of restrictions, but it's now possible, with a bit of extra effort, to efficiently code at least some algorithms that would previously have been too slow. The details are in the imperative code page.

In previous versions of the language, sets and maps were implemented as sorted arrays, which meant that the only way to add or remove elements or key-value pairs was to rebuild the entire data structure from scratch. The language now combines sorted arrays and balanced trees to provide efficient, O(log(N)) insertion and removal of elements or key-value pairs, using the following new builtin functions:

// Inserts an element into a set
[T] _insert_([T] set, T elt);

// Removes an element from a set
[T] _remove_([T] set, T elt);

// Inserts a key-value pair into a map
[K -> V] _put_([K -> V] map, K key, V value);

// Removes a key and its associated value from a map
[K -> V] _drop_([K -> V] map, K key);

Just as with all other data structures in Cell, there's a strict separation between the logical definition of a data structure and its physical implementation. There's no way to detect within the language whether a set or map is implemented as a sorted array, a balanced tree or a combination of the two: they are all indistinguishable, except for their performance characteristics.

It's now also possible to pass an automaton instance as an argument to another procedure:

Main(String* args) {
  my_auto : MyRelAuto;

  ...

  DoSomething(my_auto);

  ...
}

DoSomething(MyRelAuto my_auto) {
  ...

  ok = my_auto <- a_msg;

  ...
}

There have also been some major syntactic changes. The body of loops and other composite statements must now be enclosed in braces:

for x <- xs {
  y = f(x);
  ys = (ys | y);
}

If the body consists of a single statement, you can omit the braces entirely:

for x <- xs
  if p(x)
    ys = (ys | x);

The previous syntax was always a bit weird: it was meant as a temporary stopgap before a switch to a Python-like syntax, but in the end the plan to make the syntax indentation-aware was abandoned, so the language is now back to a more conventional notation. The elif keyword for if statements and expressions with multiple branches has been removed as well. For example, this is how a sign(..) function could have been written before:

## OLD SYNTAX
Int sign(Int n) {
  if n > 0:
    return 1;
  elif n < 0:
    return -1;
  else
    return 0;
  ;
}

## OLD SYNTAX
Int sign(Int n) =
  if   n > 0 then  1
  elif n < 0 then -1
             else  0;

And this is how you would write it now:

Int sign(Int n) {
  if n > 0
    return 1;
  else if n < 0
    return -1;
  else
    return 0;
}

Int sign(Int n) =
  if n > 0 then  1 else
  if n < 0 then -1 else
                 0;

There's also been a subtle change in the syntax of mutable relation variables and insert statements. Here's the old syntax:

## OLD SYNTAX
user(UserId):
  name : String;

## OLD SYNTAX
insert user(an_id):
  name = a_name;

and this is the new one:

user(UserId)
  name : String;

insert user(an_id)
  name = a_name;

If you squint a bit you'll see that the colon after the parentheses is now gone.

The last user-visible change is the removal of nested relational automata. In their current form, they were never very useful in the first place, and some improvements to wired automata coming in 0.5 would have made them completely redundant anyway. They may be resurrected in a different form at some point in the future, once other pieces of the language have fallen into place.

There have also been changes to the interface of the generated code of relational automata. The methods used for loading or saving the state of a relational automaton:

  Value readState();
  void setState(String);

have been replaced with:

  void load(Reader reader);
  void save(Writer writer);

which are faster and a lot less memory hungry, and can deal with large datasets.
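As a rough illustration of how stream-based methods of this shape might be called, here is a minimal sketch. The Persistent interface and the ToyAutomaton class are hypothetical stand-ins written for this example: they are not produced by the compiler, and the one-name-per-line text format is made up. Only the load(Reader) / save(Writer) shape comes from the signatures above, and whether the generated methods declare IOException is also an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

class StateRoundTrip {
  // Hypothetical stand-in for the persistence methods of a generated class
  interface Persistent {
    void load(Reader reader) throws IOException;
    void save(Writer writer) throws IOException;
  }

  // Toy implementation: the state is a list of names, stored one per line
  static class ToyAutomaton implements Persistent {
    final List<String> names = new ArrayList<>();

    public void load(Reader reader) throws IOException {
      names.clear();
      BufferedReader lines = new BufferedReader(reader);
      // The input is consumed line by line instead of being passed in as one String
      for (String line = lines.readLine(); line != null; line = lines.readLine())
        names.add(line);
    }

    public void save(Writer writer) throws IOException {
      // The output is written incrementally to whatever Writer the caller provides
      for (String name : names) {
        writer.write(name);
        writer.write('\n');
      }
      writer.flush();
    }
  }

  // Saves the state of 'source' and loads it into a fresh instance
  static ToyAutomaton copyOf(ToyAutomaton source) throws IOException {
    StringWriter buffer = new StringWriter(); // a FileWriter could be passed the same way
    source.save(buffer);
    ToyAutomaton target = new ToyAutomaton();
    target.load(new StringReader(buffer.toString()));
    return target;
  }

  public static void main(String[] args) throws IOException {
    ToyAutomaton original = new ToyAutomaton();
    original.names.add("alice");
    original.names.add("bob");
    System.out.println(copyOf(original).names); // prints [alice, bob]
  }
}
```

The in-memory StringWriter is used here only to keep the sketch self-contained. With a large dataset you would presumably pass a file-backed Reader or Writer, so that the state never has to exist as a single in-memory string.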

This release also brings a number of bug fixes.

As already mentioned, the next release (0.5) will implement those few features that are still missing for version 1.0. Expect it in October, or early November at the latest.