With the functional subset of the language now out of the way, we can finally examine those constructs and features that are unique to Cell, and we'll start with the most important one, relational automata. Before we delve into all the gory details, make sure you've read both the introductory example and the overview: the former will give you a much gentler introduction and the latter will explain the basic principles that govern the workings of relational automata.
The first thing you need to provide in order to define a relational automaton is the type of its state. An automaton instance is a sort of glorified variable, which has, at any given point in time, a value which we'll usually refer to as its state. The state of an automaton is always a record, and its type is defined with a schema definition. Here's an example, similar to something we've seen in a previous chapter:
The above schema definition looks a lot like a type definition, and indeed a schema can be used, with some limitations, as a normal type in function signatures. When
Counter is used that way, it works exactly as if it had been defined like this:
Schemas are essentially types, but they are augmented by extra bits of information that apply only to automata. One such piece of information is on display in the schema definition above: a default value for the initial state of the automaton. Whenever an instance of
Counter is created its intial state will be
(value: 0, updates: 0), unless a different value is provided explicitly.
While an instance of an automaton like
Counter can be though of as a variable of a record type, whose value can be read just as if it were a normal variable, in some ways such an instance behaves more like a mutable data structure (like a
struct in C or a class in an object-oriented language) with two member variables
updates that can be modified directly inside a message handler. In the functional subset of the language we examined in the previous chapters there's a strict distinction between variables and values: variables can be updated, that is, the value they contain can change over time, but values themselves are immutable: if you have, say, a variable
v that contains a record value with two fields
y you cannot update those fields directly like you do in imperative languages: all you can do is create a mutated copy of the record in
v and store that new record inside
v. If the language allowed imperative updates of the form
v.x = new_x; and another variable
v' somewhere else in the program pointed to the same physical data structures as
v would have the side effect of updating
v' as well. That's what happens all the time in imperative languages of course, but preventing this kind of unrestriced side effects is quite possibly the most fundamental feature of functional programming. In the case of automata, though, the same end is achieved by different means: either by preventing (or restricting) aliasing in the first place (that's what happens with mutable relation variables, discussed below) or by making a physical copy of some of the data structures involved, which is what usually happens under the hood when you read the whole value of an automaton variable, as opposed to the value of one of its individual fields.
Mutable relation variables
As already mentioned in the overview, the most important feature of relational automata, and of the entire language in general, is the ability to use (mutable) relation variables to encode their state. Here we need a slightly more complex example: let's say you want to build an old-fashioned chat server, that will allow users to connect from a remote computer using a command line tool like
telnet. As soon as they connect they will be automatically assigned a numeric id by the server (which will be invisible to them) and they will be able to choose a (unique) username using a command of the form
login somecutename. Once they're logged in they will be able to send and receive messages to and from other individual users, and will also be able to join any number of chat groups with the command
join chatgroupname. Groups are managed dynamically: a group will be automatically created, if it doesn't exist already, as soon as a user tries to join it, and destroyed when the last member leaves. If you wanted to create an automaton that keeps track of usernames and the list of chat groups each user has joined, you could for instance start with something like this:
usernames field/variable is a map that associates each numeric user id with its corresponding username (which is a string, of course), and
memberships is a binary relation that keeps track of which chat groups each user has joined, with users identified by their numeric id and groups by their name. Here's a sample value for the state of (an instance of) the
Alternatively, the normal relation variables
memberships could be turned into mutable relation variables, as shown here:
What's the difference between
Logins2? With regards to the set of possible states they can assume, they're identical: every valid state for
Logins1 is also a valid state for
Logins2, and vice versa. The difference lies in the operations you're allowed to perform on the relation variables
Logins1 you can use them as ordinary variables, whose value can be read and passed around, and you can assign a new value to them inside a message handler, subject to a number of restriction we'll talk about later. But you cannot imperatively update the values they point to, as explained earlier. With
Logins2, on the other hand, those relations can be efficiently updated in place, although of course only inside a message handler and with all the restrictions that apply. You can insert new tuples and update or delete existing ones, just like you would do with SQL in a relational database. The price to pay for such a priviledge is that mutable relation variables cannot be aliased. You cannot copy them, pass them to other functions, return them from methods, or use them to build larger composite values. The following expressions or statements, for example, are all rejected by the compiler in the case of
Logins2, but are perfectly valid if applied to
All you can do with mutable relation variables, apart from inserting, updating and deleting tuples, is lookups, searches, linear scans and retrieving the size of the entire relation or a projection of it:
The result of all these expressions is an ordinary value, which cannot be updated in place but can otherwise be manipulated without restrictions. In particular, if you need to copy the content of a mutable relation variable elsewhere you need to make a physical copy of it, as show in the last of the above expressions.
You can also iterate through a mutable relation variable or a projection of it using a for loop:
Just like normal relations, mutable relation variables can be either unary, binary or ternary. Each type has its own set of use cases: unary ones are typically used to state the existence of a given entity, or to encode boolean attributes; binary ones to store entity attributes (both mandatory and optional, and single-valued or multi-valued) or relationships; and ternary ones for attributes of relationships (as opposed to of entities). We'll see plenty of example later.
[key: 0] annotation to the declaration of
Logins2 simply declares that the first column is a key for the relation, that is, that no two tuples in
usernames can have the same value for the left argument. This is the same as saying that the
usernames relation is actually a map, just like it is in
Login1. But in this case we can do better: not only numeric identifiers but also usernames should be unique. This can be enforced by declaring a second key for
usernames as shown here:
usernames is a bidirectional map: every numeric id is associated to a single username and vice versa. The ability to declare multiple keys is only available for mutable relation variables and not for ordinary types, at least for now. Ternary relations can also have composite keys: we'll see examples of that later.
The second main type of integrity constraints that is common to pretty much all flavors of the relational model is foreign keys. Here's an example:
The last two lines in the definition of
BooksAndAuthors are foreign key declarations. The first one states that an entry
b, a can appear in
written_by if and only if there's an entry in
books whose left value is
b and another entry in
authors whose left value is
a. Using foreign keys to guarantee that all references to a certain entity or relationship are valid is something that we're all already familiar with, since it's standard practice in the design of database schemas.
In Cell though foreign keys are also used for a second purpose: to make attributes mandatory. The second foreign key declaration in
BooksAndAuthors is an example of that: it states that in order to insert a book in our dataset we must also insert at least one author for it. In a sense this second foreign key is the inverse of (part of) the first one: for every entry of the form
b, _ in
written_by we must have a corresponding entry of the form
b, _ in
books and vice versa. Together the two foreign keys basically form a single "bidirectional" key between the first column of
books and the first column of
This second use of foreign keys is not something that is common in SQL databases, because there the identifier of an entity or relationship and all its single-valued attributes are stored in the same table, and therefore you're forced, when you insert a new entity or relationship, to also insert all its non-nullable attributes, or your insertion command will fail.
In Cell, on the other hand, each attribute is stored in a separate relation, and without all the appropriate foreign keys in place, you could insert a new entity or relationship without all its mandatory attributes. Therefore a typical Cell schema will contain many more foreign keys declarations than a typical SQL schema. Fortunately, the vast majority of them will be declared implicitly when using the syntactic sugar that we'll examine in the next chapter.
Here's a complete list of all types of foreign keys that are currently supported in Cell:
If two foreign keys originate from the same relation you can combine their declarations. This declaration, for example:
is equivalent to:
Some combinations of foreign keys are checked at runtime, while others can be checked at compile time. We'll say more about that in another chapter.
Polymorphic mutable relation variables
In the same automaton you can declare multiple mutable relation variables with the same name. Here's an example:
The mutable relation variable
UsersAndGroups is polymorphic, as name is an attribute of both users and groups. The first instance of
name stores the names of the users, and the second one the names of the groups. Polymorphic relation variables are particularly useful in the presence of inheritance hierarchies: we'll see plenty of examples later.
Polymorphic mutable relation variables are subject to a number of constraints: for starters, they all must have the same arity, and their signatures have to be disjoint. They also must have the same set of keys, and for each key the types of the columns that form it must also be disjoint.
name for example has a key on the first column, and that's fine because the types of such column (
GroupId) are disjoint. But if you placed a key on the second column of both, like so:
the compiler would reject the code, because the type of the second column,
String, is the same in both cases. That's because the meaning of such a key would be ambiguous: should the values in the second column unique across the entire relation, or only within each instance? In other words, should a user and a group be allowed to have the same name? In this case the answer is obviously yes, but in other circumstances (for example when dealing with inheritance hierarchies) but we would probably want the opposite. The compiler avoids the ambiguity by rejecting such a key entirely, at least for now.
In order to keep the implementation of relational automata reasonably simple and efficient, the compiler also places some restrictions on the combined use of polymorphic relations and foreign keys. We won't explain the details here, both because they're extremely convoluted and because they're still a work in progress. This problem should arise only in very unusual circumstances, but if you come across one of such cases, the compiler will tell you exactly which foreign keys you should forgo, even though it won't be able to explain exactly why.
Sometimes entities of a given type are linked together by relationships that are intrinsically symmetric. An obvious example is online friendship between user of chat apps or some social networks. Encoding a symmetric relationship with a non-symmetric relation creates a number of problems. Let's extend the
UsersAndGroups schema above so that we can keep track of which users are online friends and since when:
We've now several different ways to encode the same set of facts. These two values for
are_friends are conceptually equivalent:
but the fact that they can be represented differently creates a number of problems. If you want to check if two given users are friends, you'll have to check both possibilities (
u1, u2 and
u2, u1), because that specific fact could be encoded in either way. The same would happen when retrieving all friends of a given user, or when deleting a friendship from the dataset. The problem is even worse for attributes of a symmetric relationship. This, for example, is a legal value for
but it's clearly inconsistent, because you've two different start dates for the same friendships. In order to avoid these problems, Cell provides symmetric relations. Here's how you can use them in
The column that separates the two arguments of
are_friends marks it as a symmetric relation. Now it doesn't matter what order the two arguments are inserted in: the compiler takes care of checking both possibilities whenever you search the relation, or when you delete data from it. The same happens with the first two arguments of
friends_since. Even the foreign keys are simpler now. These two foreign keys are now equivalent, so you need to declare only one of them:
The compiler expects the arguments of a symmetric binary relation to be of the same type. The same goes for the first two arguments of a partially symmetric ternary relation.
A naive implementation of this data model is, of course, not going to perform particularly well. Luckily, that's not the case for a more optimized one. Mutable binary relations can actually be implemented very efficiently, especially if they have a key, in terms of both speed and memory footprint. There are trade-offs to be made between access speed, update speed and memory consumption for mutable ternary relations, and their implementation is more complex, but they too can be made to perform just as well as informationally equivalent, hand-optimized data structures. And an efficient implementation of mutable unary relations/sets is of course trivial. This is made possible by (among other things) the strict separation that we have in Cell between the logical definition of a data structure and its physical, under-the-hood implementation. We won't go into detail yet, since the implementation of those low-level data structures is not complete yet (it will be completed with the next release of the compiler, 0.4), except to say that the final data structures will essentially be a more general version of those used in the Entity/Component architecture adopted by many videogames.