An introductory example

The purpose of this chapter is to show a toy example of what is one of the most obvious uses of Cell, namely as an alternative to an embedded SQL database. It shows some basic features of relational automata, but it's not meant to be a comprehensive introduction to them. It's also a work in progress, and feedback and suggestions on how to improve it are always appreciated.

Building a rather weird social network desktop application

Eleven-years-old Miki is a very smart boy, who has learned to program at a very young age, and is now proficient in several programming languages. Miki has a lot of friends, and he wants to build a desktop application to keep track of what his friends and all the other kids of his age in his small town are up to. It's going to be for now a single-user application, with Miki and his younger brother doing all the required data entry, and Miki has decided to build it with C# and .NET. For the data storage part, Miki would normally use a MS Access database, but this time he's going to try something different.

Like a growing number of software developers these days, Miki is very excited about functional programming and all the benefits it brings to software development. Unlike most developers though, Miki has also a very solid understanding of the relational model and fully appreciates what makes it a much better data model than what both imperative and functional programming languages provide, but he's very frustrated by all the limitations of SQL databases, like the inability to define custom data types; limited or no programmability; and of course the need to do object-relational mapping.

But Miki recently heard about a new programming language called Cell that combines functional programming with the relational model, and includes a number of features that are typically found in database management systems, like support for persistence and transactions, and has decided to give it a try.

The first thing Miki wants to keep track of in his social network is the list of all his friends and acquaintances, and for each of them he wants to store name, surname, nickname and optionally their date of birth. In SQL, he would create the following table:

CREATE TABLE PEOPLE(
  ID              INT           NOT NULL,
  NAME            VARCHAR(40)   NOT NULL,
  SURNAME         VARCHAR(40)   NOT NULL,
  NICKNAME        VARCHAR(40)   NOT NULL,
  DATE_OF_BIRTH   DATE,
  PRIMARY KEY (ID),
  UNIQUE (NICKNAME)
);

Note that the NICKNAME field is declared unique: no two people can have the same nickname (that's admittedly a bizarre requirement). All fields are mandatory, except for DATE_OF_BIRTH, which may be unknown at the time data is entered, and is therefore optional.

In order to create something equivalent in Cell, Miki needs to define a relational automaton. At first glance, a relational automaton looks a bit like a class in object-oriented languages: it combines a (mutable) data structure with the code that manipulates it. The similarities end there, though. One of the major differences is that you can make use of relations when defining the (type of the) state of a relational automaton. This is the definition of an automaton that is more or less equivalent, in terms of information content, to the above SQL table:

type Person = person(Int);

schema SocialNetwork {
  next_id : Int = 0;

  person(Person):
    name            : String,
    surname         : String,
    date_of_birth?  : Date;

  nickname(Person, String) [key: 0, key: 1];
}

The first difference is the presence of the Person type definition. In the PEOPLE table, each person is identified by an integer number (stored in the ID field), as usually happens when defining a schema in a relational database. Another common option is to use a string. In Cell, on the other hand, it's better to define a custom type like Person in order to identify each entity in your application domain. Doing so provides a modicum of type safety, makes the code more readable, and most importantly allows us to define functions and methods that are polymorphic on the entity type.

The schema declaration that follows is how you define the (type of the) state of a relational automaton. The above definition contains a lot of syntactic sugar. Without it, it would look like this:

schema SocialNetwork {
  next_id : Int = 0;

  person(Person);

  name(Person, String) [key: 0];
  surname(Person, String) [key: 0];
  date_of_birth(Person, Date) [key: 0];
  nickname(Person, String) [key: 0, key: 1];
}

Let's examine the "unsweetened" version first. next_id is a member variable, of type Int. Every time we need to generate a new Person identifier we just read its value and increment it. Nothing new there. The definitions that follow are more interesting. In the SQL version of the schema, we have a single table that stores all the attributes of a person. In Cell, every attribute is stored in a separate relation (The term 'relation' is just another name for what in SQL is called a table. For a more formal definition, check the Wikipedia entries for binary and n-ary relations). The first relation, person, is a unary relation, that is, a set. It simply contains the identifiers of all people whose data is stored in the database. The next four relations, name, surname, nickname and date_of_birth, are binary relations, each of which stores a single attribute of the person entity. That is, while in SQL all the information about a person is stored in a single record of the PEOPLE table like, for example, ID = 0, NAME = "Edward", SURNAME = "Harris", NICKNAME = "ed", DATE_OF_BIRTH = 30/8/2006, the same data in Cell is split among five different tuples/entries, person(0) in person, person(0), "Edward" in name, person(0), "Harris" in surname and so on. Note that while in SQL each column/argument of a relation is named, in Cell arguments are only identified by their position.

The reasons for splitting into atomic pieces the information that in an SQL database would be contained in a single record/row will be discussed elsewhere, but one very marginal benefit is that it makes accessing the attributes of a given entity syntactically more convenient. As an example, this is how you can define a method that generates the complete name of a person by concatenating their name and surname:

using SocialNetwork {
  String full_name(Person p) = name(p) & " " & surname(p);
}

The syntax name(p) is not that different from the p.name() or p.name syntax used in conventional languages, and it's certainly terser than the SELECT NAME FROM PEOPLE WHERE ID = AN_ID used in SQL.

Methods in Cell are just normal functions which are given access to the state of an automaton. They are declared inside a using block, as shown above. Such a block can of course contain any number of methods. Functions and methods in Cell have no side effects, just like in any other pure functional language. They do not modify the state of the program, and they cannot do any I/O: their only purpose is to compute and return a value. The state of an automaton can be mutated, as we'll see later, but any code that does so is kept separate from the rest.

The [key: 0] in the declaration of name, surname and date_of_birth just states that values in the first column of those binary relation have to be unique. In other words, you can only associate a single name, surname or date of birth to each person identifier. The nickname relation has two keys, one for each column, which means that not only each person can only have a single nickname, but also each nickname has to be unique, that is, it can be associated to only one person.

Being able to declaratively enforce the uniqueness of an attribute is one of the many advantages provided by the relational model. Another, more important one, is that you can efficiently search your dataset on any attribute without having to write any extra code. If you wanted, for instance, to write a methods that returns the identifiers of all people born on a given date, you could do it like this:

using SocialNetwork {
  [Person] people_born_on(Date d) = [p : p <- date_of_birth(?, d)];
}

The date_of_birth(?, d) expression iterates (loosely speaking) through all the entries in date_of_birth whose second argument is equal to d. The first result is produced in constant time, O(1), (it's basically a hashtable lookup) and retrieving each subsequent result only require an additional (small) constant time. Searches can be performed, with the same efficiency, on any attribute of the relation, or any combination of them in the case of ternary relations, which we'll examine later.

Let's now go back to the syntactically sugared version of the schema/automaton. The following declaration:

person(Person):
  name            : String,
  surname         : String,
  date_of_birth?  : Date;

is just a more convenient way of writing this:

person(Person);

name(Person, String) [key: 0];
surname(Person, String) [key: 0];
date_of_birth(Person, Date) [key: 0];

Note that the date_of_birth field is followed by a question mark, which qualifies it as optional. In SQL, an optional field is a field whose value can be NULL. In Cell, on the other hand, each attribute is stored in its own relation, so if, for example, you don't know the date of birth of a particular person all you need to do is to avoid inserting an entry for them in the date_of_birth relation. At the moment though there's no way to declaratively enforce the fact that a given attribute must be present, so in fact all attributes are optional, even those like name or surname that are supposed to be mandatory. This will be fixed in version 0.3 of the language, which will implement foreign keys and other missing integrity constraints.

You can also declare "multivalued" attributes using the syntactically sugared notation. In the following code, for example:

person(Person):
  name            : String,
  surname         : String,
  date_of_birth?  : Date,
  phone_number*   : String;

the phone_number attribute can have any number of values (including zero). That's rewritten by the compiler into a binary relation with no keys, which can have multiple entries for the same Person value:

phone_number(Person, String);

You can also tag an attribute with a + sign, which declares it as a mandatory multivalued attribute, that is, one that has at least one value:

person(Person):
  name            : String,
  surname         : String,
  date_of_birth?  : Date,
  phone_number+   : String;

but since as explained before there's no support yet for enforcing the presence of an attribute in the schema declaration, right now the + tag has exactly the same effect as *. Again, support for this feature will be implemented in version 0.3 of the compiler.

The nickname attribute is declared in both cases using the standard verbose syntax because there's no syntactic sugar yet for unique attributes, but that too is going to be implemented soon.

The kids in Miki's town have organized themselves into a number of gangs, and Miki wants to keep track of that in his application. Each gang has a cool name and a leader, and each kid is free to join any number of gangs. Every member of a gang has a rank and an official start date. This is what the new tables would look like in SQL:

CREATE TABLE GANGS(
  ID        INT           NOT NULL,
  NAME      VARCHAR(40)   NOT NULL,
  LEADER    INT           NOT NULL,
  PRIMARY KEY (ID),
  FOREIGN KEY (LEADER) REFERENCES PEOPLE(ID)
);

CREATE TABLE MEMBERS(
  PERSON_ID   INT     NOT NULL,
  GANG_ID     INT     NOT NULL,
  JOINED_ON   DATE    NOT NULL,
  RANK        INT     NOT NULL,
  PRIMARY KEY (PERSON_ID, GANG_ID)
);

and this is the Cell schema expanded with the new information:

type Person = person(Int);
type Gang   = gang(Int);

schema SocialNetwork {
  next_id : Int = 0;

  person(Person):
    name          : String,
    surname       : String,
    date_of_birth : Date;

  nickname(Person, String) [key: 0, key: 1];

  gang(Gang):
    name    : String,
    leader  : Person;

  member(Person, Gang):
    joined_on : Date,
    rank      : Int;
}

The Gang type, and the gang, name and leader relations are very similar to what we've seen before and we will not discuss them any further. The member binary relations, and its two attribute relations joined_on and rank on the other hand are sort of new. Without any syntactic sugar, they would look like this:

// Person #0 is a member of gang #1
member(Person, Gang);

// Person #0 joined gang #1 on #2
joined_on(Person, Gang, Date) [key: 0:1];

// Person #0 has rank #2 inside gang #1
rank(Person, Gang, Date) [key: 0:1];

The comment that precedes the declaration of each relation is the informal relation predicate, that is, the informal meaning of each entry/tuple in the relation, with #0, #1 and #2 being the placeholders for the first, second and third argument of the relation respectively.

Each tuple in the binary relation member encodes what in an E/R model would be called a relationship between two entities, namely the fact that a specific person belongs to a specific gang. It's a many-to-many relationship, with every person being able to join different gangs, and gangs having any number of members. Just like attribute relations like name or date_of_birth it can be searched efficiently on any attribute: if, for example, p is a variable of type Person and g of type Gang, the expression member(p, ?) will (loosely speaking) iterates through all the tuples whose first argument equals p, and member(?, g) through all the tuples whose second argument equals g.

The two ternary relations joined_on and rank store what we can regard as an attribute of the "membership" relation, as opposed to an attribute of an entity like a person or a gang. The first two arguments form a key for both relations, meaning that for each combination of person and gang there can be only one start date and one rank, but a person can still, for instance, have one rank in one gang and a different one in another. The fact that attributes like these cannot be "attached" to any single entity makes them awkward to model in conventional languages, which use records as their primary "composite" data structure, but can be represented very elegantly using relations. Accessing the value of one such attribute is very similar to accessing the attributes of an entity, like nickname or leader: the expression rank(p, g) returns the rank of person p inside gang g (and of course will fail if p never joined g). Similarly, ternary relations can be searched on any argument or combination thereof: iterating through joined_on(?, g, d), for instance, will produce a list of all people who joined gang g on day d. Lookups are performed in constant time (O(1)), and the same goes for retrieving the first result in any search (each additional result will of course require an extra, but small, constant time).

The last thing Miki wants to store in his dataset is a map of who knows whom among all the kids of his age in town, along with the day they met and the identity of the person who introduced them, if any. In SQL he would be creating the following table:

CREATE TABLE KNOW_EACH_OTHER(
  PERSON_ID_1   INT     NOT NULL,
  PERSON_ID_2   INT     NOT NULL,
  MET_ON        DATE    NOT NULL,
  INTRODUCED_BY INT,
  PRIMARY KEY (PERSON_ID_1, PERSON_ID_2),
  FOREIGN KEY (PERSON_ID_1)   REFERENCES PEOPLE(ID),
  FOREIGN KEY (PERSON_ID_2)   REFERENCES PEOPLE(ID),
  FOREIGN KEY (INTRODUCED_BY) REFERENCES PEOPLE(ID)
);

and this is the final version of the SocialNetwork schema in Cell:

type Person = person(Int);
type Gang   = gang(Int);

schema SocialNetwork {
  next_id : Int = 0;

  person(Person):
    name           : String,
    surname        : String,
    date_of_birth? : Date;

  nickname(Person, String) [key: 0, key: 1];

  know_each_other(Person, Person):
    met_on          : Date,
    introduced_by?  : Person;

  gang(Gang):
    name    : String,
    leader  : Person;

  member(Person, Gang):
    joined_on : Date,
    rank      : Int;
}

The new relations (know_each_other, met_on and introduced_by) look similar to the member, joined_on and rank ones, but one important difference is that in this case the relationship is intrinsically a symmetric one: if person A knows person B then person B also knows person A. Therefore the fact that A and B know each other could be stored inside know_each_other either as the (A, B) tuple, or as (B, A), or even both of them, and that's of course a problem, because it forces you to check both possibilities if you want to find out if A and B know each other, and also forces you to try to delete both if you want to remove that particular fact from your dataset. The problem is even worse for met_on and introduced_by, since they could end up containing two different values of the attribute for any given pair of people. A better support for symmetric relations is scheduled again for version 0.3 of the language.

Messages and message handlers

Now that Miki has defined the core data structures of his SocialNetwork automaton it's time for him to start writing some actual code. He's already written a couple of read-only methods:

using SocialNetwork {
  String full_name(Person p) = name(p) & " " & surname(p);

  [Person] people_born_on(Date d) = [p : p <- date_of_birth(?, d)];
}

but code that mutates the state of an automaton is quite different. The only way to mutate the state of a relational automaton is by sending it a message. A message is just a value, that is, a piece of data. For every type of message an automaton can accept there's a corresponding message handler that is invoked when such a message is received. The message handler is where the actual updating of the state of the automaton instance takes place.

The first message and message handler pair that Miki defines for his SocialNetwork automaton add a new person's data to the dataset:

type AddPerson = add_person(
  id:              PersonId,
  name:            String,
  surname:         String,
  nickname:        String,
  date_of_birth?:  Date
);

SocialNetwork.AddPerson {
  id = self.id;
  insert person(id), name(id, self.name), surname(id, self.surname), nickname(id, self.nickname);
  if self.date_of_birth?:
    insert date_of_birth(id, self.date_of_birth);
  ;
}

The first of the two above declarations defines the type of the message (which is just a tagged record containing all the information about a new person) and the second one defines the message handler. The message handler itself is pretty straightforward, just a bunch of insert statements, one of which is conditional since date_of_birth is optional. The only thing to note is the self keyword that is used to access the value of the message.

Miki has also defined similar methods to add new gangs, and record the facts that people have met or that they have joined some gang, but we will not examine them here since they're pretty similar to the one above. A more interesting message handler is the one that is used to delete a person from the database, since its logic is a bit more sophisticated. Here's the code:

SocialNetwork.delete_person(id: Person) {
  id = self.id;

  // Removing the person's data from person, member and
  // all their attribute relations
  delete person(id), name(id, *), surname(id, *), date_of_birth(id, *), nickname(id, *);
  delete member(id, *), joined_on(id, *, *), rank(id, *, *);

  // Removing the person's data from know_each_other and all its
  // attribute relations. Since know_each_other is a symmetric relation,
  // we've to go through both columns in order to delete everything
  delete know_each_other(id, *), met_on(id, *, *), introduced_by(id, *, *);
  delete know_each_other(*, id), met_on(*, id, *), introduced_by(*, id, *);

  // Gangs that are left without a leader
  leaderless_gangs = [g : g <- leader(?, id)];
  for g <- isort(leaderless_gangs):
    // Remaining members of the gang
    members = [p : p <- member(?, g), p != id];
    if members == []:
      // No member left, deleting the gang
      delete gang(g), name(g, *), leader(g, *);
    else
      // Ranks of all remaining members
      ranks = [m -> rank(m, g) : m <- members];
      // Highest-ranking members
      max_rank_members = max_by(members, ranks($, !!));
      // Randomly choosing one of them as new leader
      new_leader = an_elem(max_rank_members);
      // Updating the leader relation
      update leader(g, new_leader);
    ;
  ;
}

Here Miki has made use of a bit of syntactic sugar that has allowed him to avoid an explicit type definition like the type AddPerson = ... used previously. The following definition:

SocialNetwork.delete_person(id: Person) {
  ...
}

is the same as:

type DeletePerson = delete_person(id: Person);

SocialNetwork.DeletePerson {
  ...
}

save for the fact that type of the message is not explicitly named (how exactly this works will become clear after reading the chapter on types).

The message handler starts with a bunch of delete statements. When deleting a person from the dataset Miki has to remove all their data from all the relations that might contain any. But that code is of course unnecessarily complicated and error-prone, and starting with version 0.3 of the language, he will be able to replace the four delete statements above with just the following one:

delete person(id), know_each_other(id | *), member(id, *);

The remaining part of the code deals with the case of people who were leaders of one or more gangs. In that case, Miki has to choose a new leader. The new leader will be the highest-ranking member of the gang, or any of them if there's more than one person with the highest rank, unless the person being deleted is the only member left in the gang, in which case the gang will be deleted as well.

There's one message Miki defines for SocialNetwork that may look a bit strange to someone used to imperative languages:

SocialNetwork.reserve_ids(first_id: Int, count: Int) {
  if self.first_id == next_id:
    set next_id = next_id + self.count;
  else
    fail;
  ;
}

Before adding a new person or gang to his dataset Miki needs to generate new identifiers for them, and that's done by reading and incrementing the value of the next_id member variable. In an imperative language it would be done with a method like the following one:

int get_next_id() {
  return next_id++;
}

but in Cell message handlers cannot return a value, and read-only methods cannot change the state of the automaton instance, so things have to be done differently: first the client has to read the value of the next_id member variable, and then has to send a message to increment it. That would of course be problematic in a concurrent environment, because another thread could read the value of next_id after the first thread has read it, but before it has a chance to send the message that increments it. In order to avoid this problem the message handler has to check that the id being reserved is still available, and fail (that is, "throw an exception") if not.

A few more things about message handlers before we move on. The first one is that message handlers run inside a transaction: if for any reason an unrecoverable error occurs during their execution, the message is simply discarded and the state of the automaton is left untouched. Secondly, message handlers are deterministic: the resulting state of the automaton only depends on the previous state and the message being received, and nothing else. Lastly, a message handler cannot do any I/O, and cannot change the state of other automaton instances: all the side effects are limited in scope to the automaton that receives the message. These properties have a number of consequences, that will be discussed elsewhere in the documentation.

Interfacing with C#

As already mentioned at the beginning of the chapter, Miki is going to write his application in C#. The Cell compiler will generate a number of C# classes, one for each Cell automaton, that can be included in the larger application. Compiling the Cell code will produce two files, generated.cs and interfaces.txt. The former contains all the generated C# code, the second the interface of the generated classes in pseudo-C#. This is the generated interfaces.txt file for Miki's code:

class CellLang.Generated.SocialNetwork {
  SocialNetwork();

  Value ReadState();
  void SetState(String);
  void Execute(String);

  long NextId;

  bool Gang(long);
  long[] Gang();

  bool Person(long);
  long[] Person();

  bool Nickname(long, string);
  Tuple<long, string>[] Nickname();

  bool Member(long, long);
  Tuple<long, long>[] Member();

  bool KnowEachOther(long, long);
  Tuple<long, long>[] KnowEachOther();

  bool Rank(long, long, long);
  Tuple<long, long, long>[] Rank();

  bool MetOn(long, long, string);
  Tuple<long, long, Value>[] MetOn();

  bool JoinedOn(long, long, string);
  Tuple<long, long, Value>[] JoinedOn();

  bool IntroducedBy(long, long, long);
  Tuple<long, long, long>[] IntroducedBy();

  bool Name(string, string);
  string Name(string);
  Tuple<Value, string>[] Name();

  bool Surname(long, string);
  string Surname(long);
  Tuple<long, string>[] Surname();

  bool DateOfBirth(long, string);
  Value DateOfBirth(long);
  Tuple<long, Value>[] DateOfBirth();

  bool Leader(long, long);
  long Leader(long);
  Tuple<long, long>[] Leader();

  string FullName(long);
  long[] PeopleBornOn(string);
}

That's a lot to parse, so let's break it up into more digestible pieces, starting with the pair of complementary methods ReadState() and SetState(). The former takes a snapshot of the state of an automaton, and returns it as an object of type CellLang.Value, which is a class that can represent any Cell value. The returned states can be freely inspected and, for example, printed or stored to a file. SetState(..) erases the current state of the automaton and sets its new state to whatever value is provided as argument (as long as that's a valid state for the automaton in question, of course). The argument is provided as a string containing the standard textual Cell representation of the new state. Any state returned by ReadState() can be used, once it has been converted to its standard textual representation, as input for SetState(..), which makes it very easy to save the data stored inside an automaton to a file, and then read it back later.

The next method, Execute(..), is used to send a message to the automaton. Here too the message is provided as a string containing its standard textual representation. If an error occurs during the execution of the message handle an exception is thrown, but the state of the automaton does not change, as we've already mentioned before.

Here's an example of how to use the three methods just discussed. The following C# code create an instance of SocialNetwork, loads an initial state for it from a file, sends a message to the automaton to delete one specific person from the dataset and saves the resulting state/dataset to another file:

// Creating an instance of SocialNetwork
Generated.SocialNetwork socialNetwork = new Generated.SocialNetwork();

// Reading and setting the initial state for the automaton
socialNetwork.SetState(File.ReadAllText("initial-state.txt"));

// Sending a message
socialNetwork.Execute("delete_person(id: person(5))");

// Writing the new state to a file
using (StreamWriter outputWriter = new StreamWriter("final-state.txt")) {
  socialNetwork.ReadState().Print(outputWriter);
}

Here's an example of what a valid state for the SocialNetwork automaton looks like, once it's converted into its standard textual representation.

All the remaining methods are specific to each type of automaton. There are methods (or properties) to read the value of member variables, like long NextId; to invoke read-only methods of the automaton, like string FullName(long) and long PeopleBornOn(string); to read the entire content of a relation, like long[] Person(), Tuple<long, string>[] Nickname() and Tuple<long, long, long>[] Rank(); to check if a relation contains a given tuple (or element, for unary relations) like bool Person(long), bool Member(long, long) and bool MetOn(long, long, string); and finally, in the case of binary relations, to lookup the value of an attribute of a specific entity given its id, like string Surname(long) or long Leader(long). All these methods are described in detail elsewhere in the documentation, and we won't discuss them any further here. The only thing left to mention is the fact that the Cell compiler tries to map each Cell type to a corresponding native C# type: when it cannot find a good mapping, the corresponding piece of data is either returned as a CellLang.Value object, when data goes from Cell to C#, or passed in as a string containing the textual representation of a Cell value when data moves in the other direction. The details of the mapping too are described in another chapter.

Where to go from here

Miki is rewriting the data structures and logic of the Cell part of his application in C#, in order to see how the two approaches/paradigms compare. You can read about it here.

The relational automata and state updates chapters contain the complete documentation for relational automata, but you'll have to read about data and types first. The overview and quick overview will give you some additional information.

This chapter focused on the most important of the two types of automata, namely relational automata. For an overview of the other type, reactive automata, just head to the corresponding chapter. It should be readable even without any previous knowledge of the language.