Protocols
Variables or expressions of a generic type (i.e. one that is or contains a type variable) have to observe a number of restrictions in their usage, since they could end up taking any sort of value at runtime. Among other things, they cannot be used as arguments in a function call, unless that function is itself generic. Generic functions that cannot be implemented under those constraints need to use another language construct, protocols, which are more or less the equivalent of interfaces in object-oriented languages and of type classes in functional ones. As an example, here's the definition of the two-argument min(..) function and the Ord protocol in the standard library:
Both arguments of min(..) are of a generic type O, but because of the typevar O : Ord; declaration, a restriction is placed on whatever type is bound to it at the call site: it has to implement all the methods defined in the Ord protocol. If, for example, min(..) were called with arguments of type String, there would have to be, somewhere in your code, a definition of the operator < that accepts arguments of type String and whose return value belongs to Bool. The scope of a typevar declaration is just the file it's declared in, and it doesn't affect functions that use the same type variable if they are defined in different files. Some syntactic sugar is provided for simple cases like this one: you can skip the typevar declaration altogether and use the Ord protocol directly in the signature of min(..) as if it were a normal type, and that's actually the recommended style:
More complex cases still require the use of a typevar declaration, though. Here's an example: a function whose signature contains two distinct type variables, each of which has to implement the Ord protocol:
Using Ord directly in the signature of sort_pairs(..) would have been syntactic sugar for the following definition instead, which is clearly not what was intended:
Protocols can of course have any number of methods. The (largely useless) Elem protocol below, for example, is designed for types that can fit in any type of container, both ordered and unordered ones, while also allowing a custom notion of equality:
In keeping with Cell's low-ceremony approach, conformance to a specific protocol is purely structural, with no explicit instance declarations. To make a type MyType conform to Elem, all you need to do is implement the corresponding functions:
The same applies to the subtyping relation between protocols. Elem, for example, is a subtype of Ord, since the operations required by Ord (only < in this case) are a subset of those required by Elem, so a value of type Elem can be used anywhere a value of type Ord is required.
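Python's typing.Protocol is also structural, so it makes a reasonable if loose analogy for both conformance and protocol subtyping. The sketch below is Python, not Cell. Ord, Elem, min2, Version and Name are illustrative names. The protocol methods are ordinary named methods rather than operators, because every Python object already inherits a __lt__ from object. A runtime_checkable isinstance test only checks that the method names exist, not their signatures.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar, runtime_checkable


@runtime_checkable
class Ord(Protocol):
    """Loose analogue of Cell's Ord: a single required operation."""

    def less_than(self, other) -> bool: ...


@runtime_checkable
class Elem(Protocol):
    """Loose analogue of Elem: everything Ord requires, plus more."""

    def less_than(self, other) -> bool: ...
    def same_as(self, other) -> bool: ...
    def hash_code(self) -> int: ...


# A bounded type variable plays roughly the role of `typevar O : Ord;`
O = TypeVar("O", bound=Ord)


def min2(x: O, y: O) -> O:
    # Uses only the operation that Ord guarantees
    return y if y.less_than(x) else x


@dataclass(frozen=True)
class Version:
    """Conforms to Ord without ever mentioning it."""
    major: int
    minor: int

    def less_than(self, other: "Version") -> bool:
        return (self.major, self.minor) < (other.major, other.minor)


@dataclass(frozen=True)
class Name:
    """Conforms to Elem, and therefore to Ord as well."""
    text: str

    def less_than(self, other: "Name") -> bool:
        return self.text.lower() < other.text.lower()

    def same_as(self, other: "Name") -> bool:
        # Custom notion of equality: case-insensitive
        return self.text.lower() == other.text.lower()

    def hash_code(self) -> int:
        return hash(self.text.lower())


print(min2(Version(2, 1), Version(1, 9)))  # Version(major=1, minor=9)
print(min2(Name("bob"), Name("Alice")))    # Name(text='Alice')
print(isinstance(Name("x"), Ord), isinstance(Version(1, 0), Elem))  # True False
```

Name is accepted wherever Ord is expected because its methods are a superset of Ord's. Version lacks same_as and hash_code, so it conforms to Ord but not to Elem.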
Protocols can define more than one type. The protocol in the following example defines two abstract types, StateM and Msg, which are then used in run_all(..) and implemented by the concrete types Counter and CounterOp respectively:
Implicit arguments
It's rather common for functions to need arguments that they don't use directly, but just pass on to other functions. These other functions may in turn need these parameters only because they too have to pass them on to other functions, and so on. So sometimes a piece of information that is required only by a very specific function deep down in the call stack has to be passed around by every function that depends on it, either directly or indirectly. This can be especially annoying when the need for these extra parameters arises while the code is being modified (as opposed to being written in the first place), since that may involve changing the signatures of, and the calls to, a lot of functions all over the code base. It also tends to affect functional languages more severely than imperative ones, since the former lack the escape hatches the latter can offer, such as global variables. Cell has a feature specifically designed to ease this problem: implicit arguments. Let's start with a toy example:
format_date(..) and format_person_data(..) are defined inside an implicit block. All functions defined inside such a block acquire the implicit arguments that are declared after the implicit keyword. In this case there's just one argument, locale, of type Locale, but an implicit block can have any number of arguments, separated by commas:
Different implicit blocks can have the same arguments, and it makes no difference whether two functions are defined in the same block or not, as long as the arguments are the same. format_date(..) and format_person_data(..) could have been defined as follows, and it would have made no difference at all:
When calling a function with implicit arguments you can always pass all of them explicitly by name, after the positional ones, as in format_person_data(p, locale=:en_us) above. Their order doesn't matter. But if caller and callee share a specific implicit argument (or any number of them) and their types are compatible (that is, the type in the caller is a subtype of the type in the callee), you can just omit the argument, which will be passed on automatically without any changes. That's what happens with format_date(p.date_of_birth) in format_person_data(..).
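Python has no implicit arguments, but the propagation rule can be imitated with contextvars. In the sketch below, a caller may pass locale by keyword. That value then becomes visible to every decorated function it calls, roughly the way format_date(p.date_of_birth) picks up locale inside format_person_data(..). The decorator with_implicit_locale, the LOCALE variable and the Python bodies of format_date and format_person_data are all hypothetical. Unlike Cell, nothing here checks the type of the implicit argument.

```python
import functools
from contextvars import ContextVar
from datetime import date

# Hypothetical stand-in for the implicit argument `locale`
LOCALE: ContextVar[str] = ContextVar("locale")


def with_implicit_locale(fn):
    """Let callers pass `locale` by keyword; otherwise inherit the caller's binding."""
    @functools.wraps(fn)
    def wrapper(*args, locale=None, **kwargs):
        if locale is None:
            return fn(*args, **kwargs)  # passed on unchanged from the caller
        token = LOCALE.set(locale)      # explicit value, visible to callees
        try:
            return fn(*args, **kwargs)
        finally:
            LOCALE.reset(token)         # restore the outer binding
    return wrapper


@with_implicit_locale
def format_date(d: date) -> str:
    if LOCALE.get() == "en_us":
        return f"{d.month:02}/{d.day:02}/{d.year}"
    return f"{d.day:02}/{d.month:02}/{d.year}"


@with_implicit_locale
def format_person_data(name: str, born: date) -> str:
    # No locale passed here: format_date receives the one bound by our caller
    return f"{name}, born {format_date(born)}"


print(format_person_data("Ada", date(1815, 12, 10), locale="en_us"))  # Ada, born 12/10/1815
print(format_person_data("Ada", date(1815, 12, 10), locale="en_gb"))  # Ada, born 10/12/1815
```

Calling format_date at the top level with no locale bound anywhere raises LookupError, because there is no value to inherit.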
This is somewhat reminiscent of what happens in object-oriented languages with the this/self parameter, which is implicitly passed around among methods of the same class unless otherwise specified. The main differences (apart from the syntax) are that in Cell you can have any number of implicit arguments, not just one, and that there's no specific relation between a function and the types of its implicit arguments.
In a block of procedural code where several function calls need the same implicit arguments, you can set them once and for all using the let statement. PrintRecords(..), for instance, can be rewritten as follows:
Here :en_us is used as a default value for the implicit argument locale in all function calls that need it inside the body of the let statement. Any number of implicit arguments can be set in a single let statement, and of course let statements can be nested:
Memoization
Constants in Cell are automatically cached, even when they are the result of an arbitrary computation. Example:
The first time the value of const_ints_sum is read, the corresponding block of code is run, and its result is cached before being returned. All subsequent reads just return the cached value.
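For comparison, Python can approximate a lazily computed, cached constant by applying functools.cache to a zero-argument function. The body of const_ints_sum here is invented, since it is not the one from the example above. The runs counter exists only to show that the body executes once.

```python
import functools

runs = 0  # how many times the body actually executes


@functools.cache
def const_ints_sum() -> int:
    """Zero-argument function standing in for a cached constant."""
    global runs
    runs += 1
    return sum(range(1, 1_000_001))


first = const_ints_sum()   # runs the body and caches the result
second = const_ints_sum()  # served from the cache
print(first, runs)         # 500000500000 1
```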
Something similar happens with functions that have no positional arguments, only implicit ones. This is better explained with a (rather contrived) example:
In the above code, computing the value of xs_sums is relatively expensive, but during the entire execution of block_avgs(..) the body of xs_sums is run only once, and all subsequent calls to it just return its cached value. Memoizing the value of xs_sums is possible because it depends only on the implicit argument xs, which never changes during the execution of block_avgs(..). Only after xs is either reassigned or goes out of scope is the memoized value of xs_sums finally cleared from the cache. Without memoization, it would be necessary to calculate xs_sums once higher up in the call stack, and then pass it around either explicitly or implicitly.
What this means in practice is that when a certain piece of information depends only on the implicit arguments, and those arguments do not change during the execution of a certain block of code, you don't need to pass that information (like xs_sums) around: you can just calculate it inside a function with no positional arguments. This reduces the amount of data that needs to be passed around, thereby making implicit arguments more effective.