A quick overview of Cell

This mini-overview is intended as a more accessible alternative to the full-length one, which many people found somewhat confusing. It focuses more on the rationale for developing the language than on its details, and it may be worth reading even if the main overview is clear to you. It's incomplete and in draft form: it provides many bits of information in no particular order, and makes no attempt to arrange them into a more coherent whole. It's also in a state of flux: I'll keep expanding and amending it for the foreseeable future.

The reason Cell exists at all is to combine functional programming and the relational model, with the former providing the computational capabilities and the latter the ability to easily encode, navigate and update complex information/state, with many different types of entities and relationships between them.

It also borrows other ideas from the database world: chief among them is the practice of removing all forms of redundancy from the data structures that are used to encode the state of an application. There are two main reasons for this: first of all, a data structure that contains little or no redundancy is much less likely to be left in an inconsistent state by a buggy update. Secondly, the code that updates the state of the application becomes a lot simpler, because there are fewer things to update.

Redundancy includes not only information that is duplicated, but also information that can be derived from other information that is already present in the dataset. Eliminating derived information is of course more difficult than eliminating duplicated information, because in many cases there are performance issues to deal with. In order to address those, Cell is meant to eventually include several forms of memoization, but that's a long-term effort.
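
To make the distinction concrete, here's a small sketch of the difference between storing derived information and computing it on demand (in TypeScript rather than Cell, with made-up names):

```typescript
// A small example of redundant vs. essential state (TypeScript, made-up names).

type OrderLine = { product: string; quantity: number; unitPrice: number };

// Redundant version: `total` can be derived from `lines`, so every piece of
// code that updates `lines` must also remember to keep `total` in sync.
type OrderWithCachedTotal = { lines: OrderLine[]; total: number };

// Redundancy-free version: only the essential information is stored...
type Order = { lines: OrderLine[] };

// ...and the derived information is computed on demand. Memoization (planned
// for Cell) is what would make this affordable even for expensive computations.
function total(order: Order): number {
  return order.lines.reduce((sum, line) => sum + line.quantity * line.unitPrice, 0);
}
```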

A paper that came out more than a decade ago, "Out of the tar pit" (just google it), explains in more detail the notion of "essential state" (as opposed to "derived state") and the idea of combining functional programming and the relational model. This project is not associated in any way with the authors of that paper, and the particular flavor of the relational model implemented in Cell is very different from the one advocated there, but until the documentation on this website is complete, that's probably the best reference to those ideas I can point you to.

I believe the relational model is by far the best data model I know of, and that it's far superior to the records+pointers data model used by most imperative programming languages, and by object-oriented languages in particular. I can't explain why in just a few lines, though: that's a part of the website/documentation I'm still working on, but which I hope to publish relatively soon. Note however that the relational model is not the same thing as SQL: in SQL databases the relational model is saddled with a number of limitations that would not be acceptable in a programming language (like the fact that the types of the columns in a relation/table can only be chosen from a few predefined datatypes), and SQL itself is of course an abomination of a query language. But that's not the only way to do the relational model, just like C++ is not the only way to do OOP.

Cell's own compiler is written in Cell, but that doesn't mean Cell was intended to write compilers. Generally speaking, Cell was not designed to do the things that existing functional languages already do well. Quite the opposite, actually: it's designed to build stateful systems.

The target application for Cell is actually a particular type of stateful system. Let's agree to call them "reactive systems", or maybe "event-driven systems", in what follows. These systems are usually at rest (meaning that there's no code being executed, no call stack, and no instruction pointer), and when they are at rest the only property they have is their state. They react to external events by changing their state, which is conceptually an atomic, all-or-nothing operation. Their new state after an event depends only on the state before the event and the event itself. They do not do any sort of I/O, and are totally oblivious to the world around them.
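
In code, a reactive system boils down to little more than a state type and a transition function. Here's a minimal sketch of that shape (TypeScript is used here purely for illustration, and the names are made up):

```typescript
// A minimal sketch of the shape of a reactive system (TypeScript, made-up names).
// Between events the system is just data: no call stack, no instruction pointer.
interface ReactiveSystem<State, Event> {
  // The state the system starts out in.
  initial: State;
  // A pure function: the new state depends only on the old state and the event,
  // and the whole transition is conceptually atomic.
  transition(state: State, event: Event): State;
}
```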

It's a very general concept that can be applied to many types of applications. An example is The Elm Architecture. In order to create a web application in Elm, you have to define the type of the state of your application; an initial state for it; the types of all the messages it can receive; and a transition function that, given the current state of the application and a message, returns the new state of the application. There's actually more to it than this, but let's ignore that for now.
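
A toy version of that recipe might look like this (written in TypeScript rather than Elm, with illustrative names): a state type, an initial state, a message type and an update function.

```typescript
// A toy Elm-style counter application (TypeScript, illustrative names only).

type CounterState = { count: number };

const init: CounterState = { count: 0 };

type Msg =
  | { kind: "increment" }
  | { kind: "decrement" }
  | { kind: "reset" };

// The transition function: old state + message -> new state.
function update(state: CounterState, msg: Msg): CounterState {
  switch (msg.kind) {
    case "increment": return { count: state.count + 1 };
    case "decrement": return { count: state.count - 1 };
    case "reset":     return { count: 0 };
  }
}
```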

Relational automata work in a similar way. The main difference is that you can make use of the relational model to define the type of the application state (as if you were defining a small database schema), and that the transition function, instead of returning the new state of the application, returns a set of atomic update operations (set/insert/delete/update) that are applied to the current state to produce the new one. This is conceptually similar to what you do to update the data contained in a database: you just send a set of atomic insert/delete/update commands that are executed by the DBMS. I believe this approach will turn out to be both more convenient and more efficient when dealing with complex application states.
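
The following sketch tries to convey the idea (again in TypeScript, not actual Cell syntax, and with invented names): the message handler produces a list of atomic commands, and a separate, generic step applies them to the current state.

```typescript
// Sketch of the relational-automaton style of update (TypeScript, invented names;
// this is not Cell syntax). The handler returns a list of atomic commands,
// which are then applied to the current state in a single, all-or-nothing step.

type Command =
  | { op: "insert"; pair: [string, string] }   // insert into the `friend` relation
  | { op: "delete"; pair: [string, string] };  // delete from the `friend` relation

// The application state: a single binary relation, encoded here as a set of keys.
type FriendshipState = { friend: Set<string> };

type FriendshipEvent =
  | { kind: "befriend"; a: string; b: string }
  | { kind: "unfriend"; a: string; b: string };

// The handler doesn't rebuild the whole state; it only says what should change.
function handle(state: FriendshipState, evt: FriendshipEvent): Command[] {
  switch (evt.kind) {
    case "befriend": return [{ op: "insert", pair: [evt.a, evt.b] }];
    case "unfriend": return [{ op: "delete", pair: [evt.a, evt.b] }];
  }
}

// Applying the commands is generic code, independent of the application logic.
function apply(state: FriendshipState, cmds: Command[]): FriendshipState {
  const friend = new Set(state.friend);
  for (const cmd of cmds) {
    const key = cmd.pair.join("|");
    if (cmd.op === "insert") friend.add(key);
    else friend.delete(key);
  }
  return { friend };
}
```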

Reactive systems are important because they have a number of nice properties we can take advantage of when implementing them. Since they are isolated from their environment, they can be made to behave deterministically, which is in some sense the stateful equivalent of referential transparency: just like the result calculated by a pure function depends only on the values of its arguments, the evolution of a reactive system (that is, the sequence of states it goes through) only depends on its initial state and the sequence of events that cause it to undergo a state transition. This property not only allows us to replay its execution, but it also allows us to understand its behavior in isolation, without having to account for the environment it is embedded in. Again, this is similar to what happens with pure functions.
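
One consequence of this determinism is that replaying an execution is just a fold of the transition function over the event log. A sketch, reusing the generic shape from above (illustrative TypeScript, not Cell):

```typescript
// Replaying an execution: fold the transition function over the recorded events,
// starting from the initial state. Any pure function with this shape will do.
function replay<State, Event>(
  transition: (state: State, event: Event) => State,
  initial: State,
  events: Event[]
): State {
  return events.reduce(transition, initial);
}
```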

Combining determinism with the fact that state transitions are well-defined and discrete provides additional benefits. If something goes wrong during a state transition, we have the option of just aborting it and going back to the previous state, instead of leaving the system in a messy and inconsistent one. Just ignoring the event that caused the state transition does not always make sense with regard to the application logic (and that's why, for example, reactive automata follow a different error handling model), but when it does, it's an almost ideal way to handle errors. It also becomes possible to take a snapshot of the state of the system when it's at rest, and use it later to transparently recreate an identical copy of it that will behave exactly like the original instance.
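
When rejecting the offending event does make sense, the "abort and keep the old state" behaviour comes almost for free, precisely because the transition builds a new state rather than mutating the old one in place. A sketch of that option (illustrative TypeScript; this is not Cell's actual error handling model):

```typescript
// If the transition fails, discard its result and keep the previous state:
// the failed transition leaves no trace, and the system stays consistent.
function step<State, Event>(
  transition: (state: State, event: Event) => State,
  state: State,
  event: Event
): State {
  try {
    return transition(state, event);
  } catch {
    return state; // reject the event, keep the old state
  }
}
```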

Contrast this with what happens with a process, which is the abstraction used by imperative programming languages. Since a process interacts with its environment by doing I/O, its behavior cannot be made deterministic in any meaningful sense of the word, since it depends on the state of the external world. For exactly the same reason, it's not in general possible to "replay" the execution of a process, because it's not in general possible to exactly replicate the environment it was running in. You cannot persist a process either, not only because its state is tightly coupled with its environment's, which cannot be persisted, but also because in order to do that you would have to store things like the call stack and the instruction pointer. But then even minimal changes to the source code, like a simple bug fix, would in all likelihood prevent the new version of the application from being able to make use of a snapshot of the state of the previous one, since call stacks and code locations would not match. Defining and implementing transactions would also be very problematic: while the idea of defining "checkpoints" during the execution of a process to go back to should an unrecoverable error occur is at least conceivable, the fact remains that you cannot in general undo I/O operations, so a rollback would in many cases cause the state of the process and its environment to go out of sync.

So while the primary goal of Cell is to combine functional programming and the relational model, one of its secondary goals is to provide a language for implementing reactive systems (as defined above) that can automatically take advantage of their "nice" properties, which more general (or simply different) types of systems do not have.

Note that, somewhat confusingly, both relational automata and reactive ones are used to implement reactive systems: that is, reactive automata and reactive systems are not the same thing. A reactive automaton defines a reactive system, but so does a relational one; they are just different ways to model reactive systems, each with its own use cases. Between the two, relational automata are by far the more general, while reactive automata are more of a niche feature. In the future, the language may well include other types of "niche" automata, specially designed for classes of reactive systems that don't lend themselves to being modeled naturally with either relational or reactive automata.