Message handlers and updates
The normal way to mutate the state of an automaton is to send it a message. For each type of message there's a corresponding message handler which is just a block of imperative code that takes the message itself as argument and carries out the actual update. Here's an example, using the Counter
automaton that we defined earlier:
The CounterMsg
type defines the set of messages Counter
can accept. The declaration is commented out because it's actually unnecessary, it's basically just documentation. After that are four message handlers: the first three handle the values :incr
, :decr
and :reset
respectively, and the last one all integers tagged with the :reset
symbol. They all make use of some syntactic sugar: the general, unsweetened form of a message handler is as follows:
so without any syntactic sugar, the first of those message handlers would have to be written like this:
or more simply like this:
but in the latter case you're allowed to ommit the <
and >
symbols, if the type between them is not a union type. In order to access the value of the message you need to use the keyword this
, as demonstrated in the last handler, Counter.reset(Int)
. Note that the same keyword is also used in several object-oriented languages, but the meaning is different: in OOP it's used to refer to the target of a method (that is, the object that "receives the message") while here it's used to refer to the message itself. There's no way in this context to refer to (the value of) the automaton instance that receives the message (only its individual fields can be accessed), because that value cannot be aliased as we explained in the previous chapters.
Any number of message handlers can be merged without restrictions, whenever that's convenient. The four message handlers of Counter
can for example be rewritten as follow:
or alternatively like this:
All three implementations are completely equivalent.
In spite of a few superficial similarities, message handlers differ from methods in OOP (or even messages in the actor model) in several significant ways:
- Similarily to messages in the actor model, but unlike methods in OOP, messages in Cell are just ordinary values, that can be manipulated like any other piece of data: they can be copied, passed around, sent over a network and saved to a file. It's very easy, for instance, to store and keep a record of all messages that have been received by any given automaton instance.
- Another similarity with the actor model is that message handlers don't return a value: there's a strict separation between commands/messages and queries/methods. Such a programming model is more reliable when writing distributed applications.
- Automata can only receive messages, not send them, even to themselves. There's no such thing as "message passing" between automata. That's by design and it's never going to change: automata are meant to be self-contained entities, oblivious to the world around them, the only exception being other automata they're wired to. Any inter-automata update logic there is, it has to go somewhere else. For the time being, that typically means it has to be dealt with in the host language, not in Cell. There's only one exception to this, and it's an important one: a message handler can send a message to the dependees of the current automaton.
- The scope of the side effects caused by a message handler is limited to the automaton instance that received the message and its dependees. The state of other automata is not affected, and that includes the dependants of the target of the message.
- All update instructions that are issued as part of the execution of a message handler do not take effect immediately. Instead, they are retained and applied only when the message handler returns, which means that during the entire execution of the latter the state of the automaton doesn't change at all. Not only that, but also the specific order in which those commands are issued is ignored. This is a major difference, that completely changes the way you have to write and think about update code. It comes with a long list of trade-offs, which we'll discuss in detail later.
Atomic update instructions
We've already seen our first atomic update instruction, the set
statement, which simply sets the value of a member variable. It can be used only with ordinary variables, not with mutable relation variables. The most important thing to notice here is that any given member variable can be assigned only once during the execution of the message handler: trying to set the same variable twice would cause the message handler to fail. That's because, as explained above, the order in which assignments are issued is ignored, and therefore there's no way to decide which of two (conflicting) assignments targeting the same member variable should actually be executed.
In order to update mutable relation variables the language provides two commands, insert
and delete
. To illustrate them, we'll make use of some of the automata we defined in the previous chapter. Here's our first example:
For both updates and deletes you can merge several commands into a single one: insert relvar1(x1, y1);
and insert relvar2(x2, y2);
can be combined, with no difference in meaning, into insert relvar1(x1, y1), relvar2(x2, y2);
. When deleting entries you can also use wildcard arguments, as demonstrated in Logins.logout(..)
and Logins.clear
. The above examples all use binary relation variables, but it all works in the same way for unary and ternary ones, just with one or three arguments respectively instead of two.
The insert
command comes in two flavors. The standard version is the one shown in the above example: it fails when inserting an entry that would violate the integrity constraints that apply to the target relation variable. The second form is a sort of "forced insert": if the relation contains entries that are key-incompatible with the entry that is being inserted the former are simply deleted. The only syntactic difference is that it requires the update
keyword instead of insert
. A command of the form:
is (in the specific case of usernames
, which has a key on both columns) equivalent to:
For relations that have no keys, like memberships
, the two forms are completely equivalent. The update
form is convenient when updating the values of the attributes of an existing entity or relationship (hence its name): this is, for instance, how you would update the value of the address
attribute for a specific supplier in the Supply
automaton defined in the previous chapter:
or the price charged by a specific supplier for a specific part:
Just like it happens with the set
instruction, the order in which inserts, updates and deletes are issued is ignored, so you have to take care not to issue conflicting commands. These update handlers, for instance, will fail:
Inserting the exact same entry twice will always succeed though, because duplicates are ignored:
Once the message handler returns and all inserts and deletes take effect, the order in which these commands are issued is irrelevant but, conceptually at least, all deletes are applied before the inserts. Updates commands are just treated as a combination of deletes and inserts.
Inserting attributes
When inserting an entity or a relationship from a dataset you'll generally have to also insert attributes stored in ancillary relations. That tends to be rather very verbose, so the language provides some syntactic sugar here. If you have a set of mutable relation variables like the following ones:
Here's how you can write a message handlers that inserts a new user and all its attributes:
It's completely equivalent to the following, more verbose version:
Here's another, more complex example:
which is equivalent to:
Whenever you insert a new entity or relationship in the dataset, the compiler checks that you're also inserting all its mandatory attributes. More specifically, whenever you define, either implicitly or explicitly, a foreign key like this:
the compilers expects you to follow up any insertion into unary_rel
with a corresponding one in binary_rel
, and for a foreign key of the type:
it expects any insertion into binary_rel
to be followed by at least one insertion into ternary_rel
. In both cases it makes sure that you're inserting exactly one value for (mandatory) single-valued attributes, and at least one for (mandatory) multi-valued ones.
Cascade deletes
Just like for insertions, there's some syntactic sugar for deletions as well. Using the Users
schema defined above, the following message handler:
not only deletes that particular id from user
, but also deletes all its corresponding entries in the attribute relations. In other words, the compiler rewrites the body of delete_user()
like so:
Note, though, that only the attributes that are defined using the syntactic sugar are deleted automatically. If Users
had been defined as follows, without any syntactic sugar:
you would have had to explicitly delete all the data from the attribute relations as well.
Mutator methods
Message handlers have no way to invoke one another (or to put it another way, automata cannot send messages to themselves), so when you need to share code between them or split their implementation in smaller, more manageable chunks, you need to make use of another construct, mutator methods. The following is yet another way to implement the message handlers of Counter
:
Now each message handler just calculates the new value for value
, which is then passed on to the mutator method update_value(..)
which issues the commands needed to update both value
and updates
. Mutator methods must be declared inside a using block, have no return value, can take any number of arguments and can be polymorphich just like functions and ordinary read-only methods. They can be invoked only from message handlers and other mutator methods of the same automaton.
In order to better understand the respective roles of message handlers and mutator methods and the differences between them it's useful to review the life cycle of an update. Once an automaton receives a message, the corresponding handler is invoked. A message handler is the entry point for any update to any relational automaton: an update starts when a handler is invoked, and finishes when the handler returns, after which the state of the automaton is finally updated. Until the handler returns no other handler can be invoked, and therefore no other message can be received: the execution of two message handlers of the same automaton cannot be interleaved or nested, message handlers cannot invoke each other in any way, and automata cannot send messages to themselves. Mutators on the other hand can only be run as part of the execution of a message handler, and cannot be invoked in any other context. The update commands they issue are only applied when their enclosing handler returns, and if they fail they cause the entire handler to fail. Message handlers can only be invoked from the outside, and they constitute the public interface of relational automata with regards to updates, while mutators are only meant to provide a way to implement message handlers in a modular way.
Wired automata
As already explained, automata are passive entities that can only receive but not send messages, with one exception: the body of a message handler can send a message to the dependees of the current automaton. Here's an example, extending something we saw in a previous chapter:
Just as before, Logins
contains information about users who have logged in and the chat groups they've joined, while ChatServer
keeps a list of the users who have connected to the server (a user here can be connected without being logged in: the steps of the process are connect -> login -> logout -> disconnect). Whenever a user disconnects from the server, they must be automatically logged out, and removed from all chat groups they're in. That's done by the super <- remove_user(id: this.id)
statement inside the body of ChatServer.disconnect(..)
. The general form of the statement is:
where message
can of course be an arbitrary expression. The execution of the message handler of the dependee is regarded as part of that of the dependant: the two succeed or fail together, and all updates are applied at the same time, so the effects of sending a message are not visible during the execution of the message handler of the dependant, and the order it's issued relative to other update instructions is irrelevant.
If there's more than one dependee, the compiler uses the type of the message to decide which one it should be sent to. If there's any ambiguity (that is, if a message of the given type is accepted by more than one dependee) the compiler will reject your code, as there's no way yet to explicitly specify the intended target.
A message handler can send only one message to each dependee. Sending more than one to the same target will cause a runtime error.
Execution model and error handling
The update process triggered by the reception of a message can be divided into several distinct phases. The first step is to dispatch the message to the corresponding message handler, which is similar to the dispatch of a polymorphic function. The next phase is the actual execution of the message handler, during which the state of the automaton does not change, and all the update instructions that are issued are just stored in a separate data structure and left there. If any error occurs during the handler's execution, the message and the set of update commands issued up to that point are just discarded, and the sender of the message is notified of the error. Once the handler returns, all update commands are first checked to make sure they don't violate any integrity constraints. If they do, the message is again discarded and the state of the automaton is left untouched, just as if an error had occurred during the execution of the handler. Otherwise the state of the automaton is updated and the sender is notified that the message was processed successfully.
Deferred updates: issues and rational
Message handlers and mutator methods look a lot like ordinary imperative code, and it may be easy to be misled at first by their syntax, but their semantics is in fact completely different. The execution of the update instructions they issue is deferred until the message handler returns, and the order in which they're issued is ignored. A problematic consequence of this execution model is that different parts of an update cannot be easily composed. That happens for two reasons: the most obvious one is that two different updates can both try to set the same member variable (or more generally issue incompatible update instructions) which would cause the whole update to fail. The other, only slightly less obvious but probably more dangerous problem is that they are not aware of one another's changes. Let's illustrate this second issue with an example: say you're building a role-playing game, where the player gradually gains experience and levels up once such experience reaches some predetermined thresholds. The level of the player in turn determines a number of other characteristics, like their maximum health. Let's also say that every time the player levels up, you also want to top up their health. Doing something like this would work fine in an imperative language:
but it doesn't in Cell. The problem lies in the way level_up()
and top_up_health()
interact, or rather fail to interact. The level
member variable is updated in level_up()
but since that change is not put into effect until later top_up_health()
erroneously calculates the maximum health based on the old value of level
, with the result that the player's health bar ends up not being properly refilled.
The choice of such an execution model is probably the most puzzling design decision in the entire language since it makes update code more difficult to write, it adversely affects its modularity, and was not, in any way, dictated by other features of the language. But in spite of its issues, it also has a couple of important advantages.
One major problem with imperative code is that it doesn't lend itself to parallel execution: it's notoriously difficult to write concurrent imperative programs, and even more so for a compiler to (semi-)automatically and safely parallelize the execution of existing code. One much hyped advantage of pure functional programming on the other hand is that it is very well suited to parallel execution: since functions are referentially transparent and have no side effects, two function calls can always be executed in parallel, as long as neither depends on the other's result. And in spite of their imperative syntax, message handlers in Cell are, in fact, pure functions: the only thing that's different about them (apart from the syntax) is the fact that instead of explicitly returning an ordinary value they implicitly return a set of set
/insert
/update
/delete
commands that are then checked and executed by some compiler-generated code. Both the execution of the message handler and the subsequent checking and execution of the resulting commands can potentially be made to take advantage of multi-core processors safely and with little effort on the programmer's part. That's in addition to the potential inter-automata parallelism: since automata cannot share any mutable state (thanks to the combination of immutability for some data structures and the impossibility of aliasing for others) they can potentially be safely updated concurrently, unless they are wired together as dependant and dependee. Now, just to be clear, what we're talking about here is just leaving the door open for a future implementation to take advantage of parallelism: currently there's no support for concurrency at all, and a fully parallel implementation is going to be a long-term effort.
Another problem is that the execution model of imperative languages, while undoubtedly convenient in simple cases like the example shown above, leads to nasty spaghetti code when programming in the large. Even in a moderately complex real world application, an update may well involve mutating the data structures that encode the state of the application in hundreds of different places. The goal of an update is to transition those data structures from the state they have before the update to the one they're supposed to have when the update is complete. But in order to do so they have to go through many different intermediate states each of which is the result of applying an atomic update to the previous one, and it's also common for specific pieces of information to be mutated several times during a single global update. The vast majority of those partially updated states are inconsistent, since they're by definition the result of updating only a part of the target data structures. But all those intermediate states are visible to the application code, with the result that's very difficult to know, when reading the value of a particular variable, whether such a value is the initial one, the final one, or just an intermediate one. The correctness of the code is highly dependendent on the order in which side effects are carried out, but such order is very difficult to control when updating a large object graph.
Contrast that with what happens when a message handler is executed in Cell. The only observable states of the target automaton are the one before the message is received and the one after the handler has returned. There's no such thing as a temporary, partially updated state: all states are expected to be consistent. There are also no ambiguities either when reading the state of the automaton inside a message handler: it's always the initial state. Of course, that's not always what you want. Sometimes what you actually need is the final value of a variable after the update, as with level
in top_up_health
. Other times you need to update a certain piece of information more than once, and you actually need to be able to see those intermediate states. In cases like these the only option available in Cell at the moment is to make a copy of the data structures involved, which you can (functionally) update as many times as you need, and to explicitly pass them around.
I'm happy to admit that this execution model is unproven (it's not used by any other language I know of), and that I fully expect it to turn out to be too restrictive in at least some circumstances. But one thing to keep in mind is that the current model is just a starting point. There are a lot of things that can be done to make it more flexible, without sliding into the complete caos that is one of the most distinctive features of imperative languages. As an example, it would be relatively easy to add the ability to explicitly commit, during the execution of a message handler, all update instructions that have been issued up until a certain point before executing the remaining part of the handler. The overall goal is to try to deviate as little as possible from standard functional programming: to allow observable destructive updates, in order to improve both flexibility and performance, but at the same time to restrict such operations in various ways in order to minimize the unwanted consequences of mutability.