Interfacing with C++

Note: the interface of the generated C++ code described here is already outdated and will be overhauled soon. If you want to get an idea of what this is going to be like a few months from now, have a look at the Java version.

When you compile a Cell program, the compiler does not generate a binary executable. Instead, it generates a text file containing code in the chosen output language. We'll examine the C++ code generator here. If you define a Main(..) procedure in your Cell code, the generated code can then be handed over to a C++ compiler to generate an executable file. That's how, for instance, the Cell compiler itself is built, and it's also the simplest way to build a program that tests your automata. But if you don't define a Main(..) procedure, the compiler will generate a set of classes, one for each type of automaton in your Cell code, that can be used to instantiate and manipulate the corresponding automata from your C++ code. The declarations of those classes will be contained in one of the two files generated by the compiler, generated.h, while the second file, generated.cpp, will contain the corresponding implementations. In this chapter we'll go through the interface of the generated classes and explain how to use them, and what each method is for.

The classes generated by the compiler are not supposed to be used directly in your code: you should instead create a wrapper class for each of them, and use those instead. Having a level of indirection between your own C++ code and the generated one is useful for several reasons:

  • In order to pass data back and forth between the host language and Cell, you'll inevitably have to manually convert the native C++ data structures into the format that is accepted by the generated code and vice versa. You really don't want to repeatedly perform those conversions all over your codebase: it's much better to have all of them in just one place.
  • The interface of those generated classes is not yet stable, and it may change even radically with future versions of the compiler. If you hide them inside wrapper classes, all the changes can be dealt with in just one place.
  • The naming conventions of all the generated methods are rather ugly, as they are designed to avoid name clashes, not to be elegant. With the wrapper classes, on the other hand, you're free to use whatever naming convention you use in the rest of your program.
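
The following is only a minimal sketch of what such a wrapper might look like, using the Counter automaton that is discussed later in this chapter. The generated::Counter class shown here is a hand-written stand-in with toy behavior, included only so that the example compiles on its own (in a real project you would include generated.h instead), and CounterWrapper and its method names are hypothetical:

```cpp
#include <string>

// Stand-in for the generated class: NOT the real generated code.
// Only the methods used by the wrapper are reproduced, and their
// behavior is a toy approximation chosen for this example.
namespace generated {
  class Counter {
  public:
    void execute(const char *msg) {
      std::string text(msg);
      if (text == "incr")
        value++;
      else if (text == "decr")
        value--;
      else
        throw 0LL; // the generated code only throws long long values
      updates++;
    }

    long long get_Value() { return value; }
    long long get_Updates() { return updates; }

  private:
    long long value = 0;
    long long updates = 0;
  };
}

// Hypothetical wrapper: the rest of the program only sees this class,
// with its own naming conventions and no textual messages
class CounterWrapper {
public:
  void increment() { counter.execute("incr"); }
  void decrement() { counter.execute("decr"); }

  long long value() { return counter.get_Value(); }
  long long updates() { return counter.get_Updates(); }

private:
  generated::Counter counter;
};
```

If the generated interface changes, only the bodies of the CounterWrapper methods have to be updated, while the code that calls increment() or value() stays the same.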

Data conversion

The most annoying and time-consuming part of using the classes generated by the Cell compiler from your own C++ code is of course converting data back and forth between the C++ and Cell representations. Let's focus on passing data from C++ to Cell first, since that's simpler. There's a number of simple Cell data types that are mapped directly to a corresponding C++ type. They are shown in the following table:

Cell      C++
Int       long long
Float     double
Bool      bool
String    const char * (UTF-8)

Note that if you're passing in a string that contains characters that cannot be represented in 7-bit ASCII, you'll need to encode them in UTF-8 format. When dealing with more complex data types that are not in the above table, you're expected to pass a string (that is, a const char * pointer) that contains the textual representation of a Cell value. That's neither elegant nor particularly efficient, but at least it's simple and straightforward.
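
As a rough illustration, the textual representations can be built with small helper functions kept in one place. The two helpers below are hypothetical (their names are not part of the generated code) and only cover the two forms that appear in the examples of this chapter, a tagged integer like part_id(1) and the state of Counter:

```cpp
#include <string>

// Builds the textual representation of a tagged integer, e.g. part_id(1)
std::string tagged_int(const std::string &tag, long long value) {
  return tag + "(" + std::to_string(value) + ")";
}

// Builds the textual representation of a record with the two integer
// fields used for the state of Counter, e.g. (value: -10, updates: 0)
std::string counter_state(long long value, long long updates) {
  return "(value: " + std::to_string(value) +
         ", updates: " + std::to_string(updates) + ")";
}
```

The resulting std::string can then be handed to the generated code through c_str(), for example supply_instance.in_Part(tagged_int("part_id", 1).c_str()).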

The mapping is different, and more complex, when passing data back from Cell to C++. In this case it's of course not feasible to use the textual representation of values as an exchange format, because parsing those strings would be too much of a burden on the C++ side. So first of all the list of data types that have a direct mapping is longer:

Cell             C++
Int              long long
Float            double
Bool             bool
Symbol           const char *
String           std::string (UTF-8)
(T1, T2, ...)    std::tuple<T1', T2', ...>
T*               std::vector<T'>
[T]              std::vector<T'>
[T1, T2]         std::vector<std::tuple<T1', T2'>>
[T1, T2, T3]     std::vector<std::tuple<T1', T2', T3'>>
any_tag(T)       T'

The above table should be mostly self-explanatory, but there's a few things that need explaining:

  • Symbols are returned as pointers to strings containing their textual representation. For example, nothing is returned as a pointer to the C string "nothing". Those strings are kept in static memory, or something equivalent to it. DO NOT TRY TO DEALLOCATE THEM.
  • Strings are returned as std::string objects, again in UTF-8 format.
  • Aggregate types like sequences, tuples, sets and relations are mapped directly to the corresponding C++ data types only if the types of their elements have a direct mapping as well.
  • Tagged types can be mapped directly to a C++ type only if the tag is a known symbol and the type of the untagged value in turn has a direct mapping to a C++ type. In this case, the tag is simply discarded and the untagged value is returned.

The data types that cannot be mapped directly to C++ equivalents (the most important of which are records and union types) are returned as a pointer to an object of the cell::Value class, whose declaration is included in the file generated.h. cell::Value is an abstract class that contains only pure virtual methods and no member variables. Its declaration is shown here:

namespace cell {
  class Value {
  public:
    virtual bool is_symb() = 0;
    virtual bool is_int() = 0;
    virtual bool is_float() = 0;
    virtual bool is_seq() = 0;
    virtual bool is_set() = 0;
    virtual bool is_bin_rel() = 0;
    virtual bool is_tern_rel() = 0;
    virtual bool is_tagged() = 0;

    virtual bool is_string() = 0;
    virtual bool is_record() = 0;

    virtual const char *as_symb() = 0;
    virtual long long as_int() = 0;
    virtual double as_float() = 0;

    virtual unsigned int size() = 0;
    virtual Value *item(unsigned int) = 0;
    virtual void entry(unsigned int, Value *&, Value *&) = 0;
    virtual void entry(unsigned int, Value *&, Value *&, Value *&) = 0;

    virtual const char *tag() = 0;
    virtual Value *untagged() = 0;

    virtual std::string as_str() = 0;
    virtual Value *lookup(const char *) = 0;

    virtual std::string printed() = 0;
    virtual void print(std::ostream &os) = 0;
  };
}

This is the common superclass/interface of a number of concrete classes each of which is used to represent a particular type of Cell value: symbols, integers, floating point numbers, sequences, sets, binary and ternary relations and tagged values. These concrete classes are hidden from the user, and they can be manipulated only through their common "fat" interface, whose methods can be divided into three groups. The first one comprises all the bool is_*() methods, which are used to discover the type of the value represented by the target object. Then there's a group of methods that are used to actually access the data held by those objects:


  // Defined only for symbols. Return a pointer to a string
  // that contains the textual representation of the symbol
  // Do not try to deallocate the returned string
  const char *as_symb();

  // Defined only for integers
  // Returns the value as a 64-bit signed integer
  long long as_int();

  // Defined only for floating point numbers
  // Returns the value as a double precision floating point number
  double as_float();

  // Defined for all collection types:
  // sequences, sets, binary and ternary relations
  unsigned int size();

  // Defined only for sequences and sets
  // Returns the i-th value in the collection
  // In the case of sets, elements are arranged
  // in an implementation-defined order
  Value *item(unsigned int);

  // Defined only for binary relations
  // Returns the i-th pair in the relation
  // Pairs are arranged in an implementation-defined order
  void entry(unsigned int, Value *&, Value *&);

  // Defined only for ternary relations
  // Returns the i-th triple in the relation
  // Entries are arranged in an implementation-defined order
  void entry(unsigned int, Value *&, Value *&, Value *&);

  // Defined only for tagged values
  // Returns the textual representation of the tag
  // Again, do not try to deallocate the returned string
  const char *tag();

  // Defined only for tagged values
  // Returns the value without the tag
  Value *untagged();

  // Defined only for strings
  // Returns the string in UTF-8 format
  std::string as_str();

  // Defined only for records
  // Returns the value of the corresponding field
  // The only parameter is the textual representation of the
  // field symbol, e.g. point.lookup("x")
  Value *lookup(const char *);

Each of these methods is defined only for a subset of the subclasses of cell::Value, and if used with the wrong concrete class it will just throw an exception. long long as_int(), for example, can only be used if the target object actually holds an integer value, which can be checked using the bool is_int() query method.

The last two methods, std::string printed() and void print(std::ostream &os), are used to generate the textual representation of the value, which is returned as a string by the first and written to the provided ostream object by the second.

When the methods of automaton-derived generated classes need to return a cell::Value object they always do it using a std::unique_ptr<cell::Value> smart pointer. Once the smart pointer goes out of scope, the cell::Value object it points to is automatically deleted. If you get a pointer to one of its subobjects, using methods of the cell::Value interface like for instance Value *item(unsigned int), do not try to delete it. Those subobjects are "owned" by the object they belong to, and are automatically deleted along with it. It goes without saying that those pointers cannot be used anymore once the root object is deleted.
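
Here is a sketch of how the query and accessor methods might be combined to copy a sequence of integers out of a value before the root object is destroyed. To keep the example compilable without generated.h, the function is written as a template and exercised against FakeValue, a hand-written stand-in that exposes only the methods used here. FakeValue, make_int, make_seq and to_int_vector are all hypothetical names. With the generated code, V would be cell::Value:

```cpp
#include <initializer_list>
#include <memory>
#include <vector>

// Stand-in exposing a small subset of the cell::Value interface
struct FakeValue {
  bool holds_int = false;
  long long number = 0;
  std::vector<std::unique_ptr<FakeValue>> items; // owned subobjects

  bool is_int() { return holds_int; }
  bool is_seq() { return !holds_int; }
  // The thrown value is arbitrary, it only mimics a failed access
  long long as_int() { if (!holds_int) throw 0LL; return number; }
  unsigned int size() { return static_cast<unsigned int>(items.size()); }
  FakeValue *item(unsigned int idx) { return items.at(idx).get(); }
};

std::unique_ptr<FakeValue> make_int(long long number) {
  std::unique_ptr<FakeValue> value(new FakeValue());
  value->holds_int = true;
  value->number = number;
  return value;
}

std::unique_ptr<FakeValue> make_seq(std::initializer_list<long long> numbers) {
  std::unique_ptr<FakeValue> value(new FakeValue());
  for (long long number : numbers)
    value->items.push_back(make_int(number));
  return value;
}

// Appends the elements of a sequence of integers to out, checking the
// type of every value before accessing it. Returns false, possibly
// after appending some elements, if value is not a sequence of integers
template <typename V>
bool to_int_vector(V &value, std::vector<long long> &out) {
  if (!value.is_seq())
    return false;
  for (unsigned int i = 0; i < value.size(); i++) {
    V *elem = value.item(i); // owned by value: must not be deleted
    if (!elem->is_int())
      return false;
    out.push_back(elem->as_int());
  }
  return true;
}
```

Since the integers are copied into a std::vector, the result remains usable after the smart pointer that owns the root object has gone out of scope.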

Relational automata

Let's take a look at the interface of the classes produced by the compilation of a relational automaton. We'll start with a very simple one, Counter:

namespace generated {
  class Counter {
  public:
    Counter();
    ~Counter();

    std::unique_ptr<cell::Value> read_state();
    void set_state(const char *);

    void execute(const char *);

    long long get_Value();
    long long get_Updates();

  private:
    void *ptr;
  };
}

As you can see, the generated C++ class has the same name as the Cell automaton it derives from, and is placed in the generated namespace. The first three methods, read_state(), set_state(..) and execute(..), are the same for all relational automata. All other methods are just accessors that are specific to a particular automaton, and can be used to read pieces of its state, or to invoke its methods.

The read_state() method is the equivalent of the read instruction in Cell: it takes a snapshot of the state of the automaton and returns it as a cell::Value object. As explained earlier, once the unique_ptr object goes out of scope the returned object and all its subobjects are automatically deleted. Saving the state of an automaton to a file in text form can be done with the instruction instance.read_state()->print(ofs);, where ofs is a valid instance of std::ofstream.

set_state(..) is used to set the state of an automaton instance, and is the equivalent of the write instruction in Cell. It can be used at any time in the life of the automaton instance, any number of times. The new state has to be provided in text form. Here's an example:

counter.set_state("(value: -10, updates: 0)");

If the provided state is not a valid one, set_state(..) will throw an exception. The generated code only throws long long values, which can of course be caught with a catch (long long e) {...} instruction. If the provided string cannot be parsed, the value thrown will specify the offset of the parsing error. If, on the other hand, the string encodes a valid Cell value but that value is not a valid state for the automaton in question, the generated code will just throw 0LL. If the operation fails, the automaton instance will just retain the state it had before, and will still be perfectly functional.
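
A small helper along these lines can keep the try/catch logic out of the calling code. This is only a sketch: try_set_state is a hypothetical name, and FakeCounter is a hand-written stand-in whose toy "parser" merely imitates the two failure modes described above (it is not how the generated code decides what is valid):

```cpp
#include <string>

// Stand-in for a generated automaton class, with toy validation rules
struct FakeCounter {
  std::string state = "(value: 0, updates: 0)";

  void set_state(const char *text) {
    std::string candidate(text);
    // Toy rule: a '?' anywhere counts as a parsing error at that offset
    std::string::size_type bad = candidate.find('?');
    if (bad != std::string::npos)
      throw static_cast<long long>(bad);
    // Toy rule: anything not starting with "(value:" is an invalid state
    if (candidate.compare(0, 7, "(value:") != 0)
      throw 0LL;
    state = candidate;
  }
};

// Tries to set the state of the automaton. Returns true on success.
// On failure returns false and stores the thrown value in error_code
template <typename A>
bool try_set_state(A &automaton, const char *text, long long &error_code) {
  try {
    automaton.set_state(text);
    return true;
  }
  catch (long long e) {
    error_code = e;
    return false;
  }
}
```

With the generated classes, the same helper would be called as try_set_state(counter, "(value: -10, updates: 0)", code), and a failed call would leave counter in its previous state.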

execute(..) is used to send the automaton a message, which has to be passed in text form. A few examples:

counter.execute("incr");
counter.execute("decr");
counter.execute("reset");
counter.execute("reset(-1)");

Error handling is the same as before. There's just one extra case to be aware of: if the argument encodes a valid message for the target automaton, but the message handler fails, execute(..) will throw the value 0LL. In any case, if the operation fails, the automaton will remain fully operational, and its state will be left untouched.

The last two methods, get_Value() and get_Updates(), are Counter-specific accessors that return the values of the value and updates member variables respectively. Note that the type of such variables is just Int, which can be mapped directly to long long in C++, with no need to use cell::Value.

Let's now take a look at a more complex automaton, Supply. Here's the declaration of the generated C++ class:

class Supply {
public:
  Supply();
  ~Supply();

  std::unique_ptr<cell::Value> read_state();
  void set_state(const char *);
  void execute(const char *);

  long long get_Next_part_id();
  long long get_Next_supplier_id();

  bool in_Part(const char *);
  std::vector<long long> get_Part();

  bool in_Supplier(const char *);
  std::vector<long long> get_Supplier();

  bool in_Code(const char *, const char *);
  std::vector<std::tuple<long long, std::string>> get_Code();

  bool in_Phone(const char *, const char *);
  std::vector<std::tuple<long long, std::string>> get_Phone();

  bool in_Sells(const char *, const char *);
  std::vector<std::tuple<long long, long long>> get_Sells();

  bool in_Name(const char *, const char *);
  bool lookup_Name(const char *, std::string &);
  std::vector<std::tuple<long long, std::string>> get_Name();

  bool in_Address(const char *, const char *);
  bool lookup_Address(const char *, std::string &);
  std::vector<std::tuple<long long, std::string>> get_Address();

  bool in_Description(const char *, const char *);
  bool lookup_Description(const char *, std::string &);
  std::vector<std::tuple<long long, std::string>> get_Description();

  bool in_Availability(const char *, const char *, long long);
  std::vector<std::tuple<long long, long long, long long>> get_Availability();

  bool in_Unit_price(const char *, const char *, const char *);
  std::vector<std::tuple<long long, long long, long long>> get_Unit_price();

  std::unique_ptr<cell::Value> call_Lowest_price_suppliers();
  std::vector<long long> call_Lowest_price_suppliers(const char *);

private:
  void *ptr;
};

The first three methods of the Supply class, read_state(), set_state(..) and execute(..), are the same as before. The next two, get_Next_part_id() and get_Next_supplier_id(), are just accessors for the corresponding member variables, just like get_Value() and get_Updates() in Counter. All the other methods are new, and are either accessors for some mutable relation variable or wrappers for a Cell method. A first set of methods, of the form bool in_*(..), checks whether a relation contains a given tuple (or value, for unary relations):

// Checks whether the part unary relation
// contains the value part_id(1)
supply_instance.in_Part("part_id(1)")

// Checks whether the sells binary relation contains
// the pair supplier_id(8), part_id(2)
supply_instance.in_Sells("supplier_id(8)", "part_id(2)")

// Checks whether availability contains the
// triple supplier_id(7), part_id(3), 25
supply_instance.in_Availability("supplier_id(7)", "part_id(3)", 25)

A second group of methods, like get_Supplier() or get_Unit_price(), returns the entire content of a given relation, as a vector<tuple<..>> (or just vector<..> in the case of unary relations).
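
Once the content of a relation has been retrieved, it's often convenient to move it into a container that suits the rest of the program. The sketch below uses only the standard library and assumes a relation with the same C++ type as the one returned by get_Name(). The function name to_map is hypothetical:

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Converts the content of a binary relation, as returned by a get_*()
// accessor, into a std::map. If the same first element appears more
// than once, the last entry wins
std::map<long long, std::string>
to_map(const std::vector<std::tuple<long long, std::string>> &entries) {
  std::map<long long, std::string> result;
  for (const auto &entry : entries)
    result[std::get<0>(entry)] = std::get<1>(entry);
  return result;
}
```

With the generated code this could be used as auto names = to_map(supply_instance.get_Name());.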

Binary relations with a key on the first column (that is, maps) also have accessors that retrieve the value corresponding to a given key, provided that the relation/map contains such a key, and return a boolean value that indicates whether the lookup was successful:

// Looks up the name of the supplier identified by
// the value supplier_id(21) and stores it in
// the variable name, if such a supplier actually exists.
// Returns true if the supplier was found and the lookup
// was successful, false otherwise
std::string name;
bool found = supply_instance.lookup_Name("supplier_id(21)", name);

The last group of methods (call_Lowest_price_suppliers(..)) are just the compiled C++ version of the corresponding Cell methods of the Supply automaton.

Reactive automata

We'll use the Switch automaton as our first example. This is the interface of the corresponding generated class:

namespace generated {
  class Switch {
  public:
    enum Input {SWITCH_ON, SWITCH_OFF};

    enum Output {IS_ON};

    Switch();
    ~Switch();

    void set_input(Input input, const char *value);
    void read_output(Output output, char *buffer, unsigned int size);

    void apply();

    std::unique_ptr<cell::Value> read_state();
    void set_state(const char *buffer);

    unsigned int changed_outputs_count();
    Output changed_output_id(unsigned int idx);

    void set_Switch_on(bool);
    void set_Switch_off(bool);

    bool get_Is_on();

  private:
    void *ptr;
  };
}

The first thing to note here is the two enumerations Input and Output, whose elements are the uppercase version of the names of the inputs and outputs of Switch. These are used in conjunction with the methods set_input() and read_output() as shown here:

// Setting the value of the two inputs
switch_instance.set_input(Switch::SWITCH_ON, "true");
switch_instance.set_input(Switch::SWITCH_OFF, "false");

// Propagating the changes to the inputs
// throughout the automaton instance
switch_instance.apply();

// Reading and printing the value of the only output
char text[16];
switch_instance.read_output(Switch::IS_ON, text, 16);
printf("is_on = %s\n", text);

As an alternative to set_input(..) and read_output(..), which can operate on any input or output and use the textual representation of values as an exchange format, the generated class also provides another set of methods, each of which can manipulate only a single input or output, but which are more convenient to use in most cases. The above code snippet can be rewritten as follows:

// Setting the value of the two inputs
switch_instance.set_Switch_on(true);
switch_instance.set_Switch_off(false);

// Propagating the changes to the inputs
// throughout the automaton instance
switch_instance.apply();

// Reading and printing the value of the only output
bool is_on = switch_instance.get_Is_on();
printf("is_on = %s\n", is_on ? "true" : "false");

The read_state() and set_state(..) methods work in the same way as with relational automata, but with the limitations we've already discussed for time-aware automata. The last two methods, changed_outputs_count() and changed_output_id(..), provide you with a list of outputs that have changed (or have been active, in the case of discrete outputs) as a result of the last call to apply():

// Changing inputs here
...

// Propagating those changes
switch_instance.apply();

// Iterating through the outputs that have changed
// if continuous or have been activated if discrete
int count = switch_instance.changed_outputs_count();
for (int i=0 ; i < count ; i++) {
  // Retrieving the id of the i-th output that
  // has changed or activated
  Switch::Output id = switch_instance.changed_output_id(i);

  // Reading the value of the changed output
  char buffer[256];
  switch_instance.read_output(id, buffer, 256);

  // Now time to do something with the value of the output
  ...
}
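
The loop above lends itself to being packaged into a reusable function inside a wrapper. What follows is only a sketch under a few assumptions: changed_outputs is a hypothetical helper, it is written as a template so that it works with any class that provides an Output enumeration and the three methods involved, and FakeSwitch is a toy stand-in (not the generated class) used to make the example self-contained:

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in with the same method signatures as the generated class
struct FakeSwitch {
  enum Output {IS_ON};
  bool on = false, next = false, changed = false;

  void set_Switch_on(bool value) { if (value) next = true; }
  void set_Switch_off(bool value) { if (value) next = false; }
  void apply() { changed = next != on; on = next; }

  unsigned int changed_outputs_count() { return changed ? 1 : 0; }
  Output changed_output_id(unsigned int) { return IS_ON; }
  void read_output(Output, char *buffer, unsigned int size) {
    std::snprintf(buffer, size, "%s", on ? "true" : "false");
  }
};

// Returns the ids of the outputs that changed during the last call to
// apply(), each paired with the textual representation of its value
template <typename A>
std::vector<std::pair<typename A::Output, std::string>>
changed_outputs(A &automaton) {
  std::vector<std::pair<typename A::Output, std::string>> result;
  unsigned int count = automaton.changed_outputs_count();
  for (unsigned int i = 0; i < count; i++) {
    typename A::Output id = automaton.changed_output_id(i);
    char buffer[256];
    automaton.read_output(id, buffer, sizeof(buffer));
    result.push_back(std::make_pair(id, std::string(buffer)));
  }
  return result;
}
```

The fixed 256-byte buffer mirrors the one used in the loop above. Whether that size is adequate depends on the outputs of the automaton in question.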

The last thing we need to see is how to deal with time-aware automata. We'll use WaterSensor:

namespace generated {
  class WaterSensor {
  public:
    enum Input {RAW_READING};

    enum Output {SENSOR_STATE};

    WaterSensor();
    ~WaterSensor();

    void set_input(Input input, const char *value);
    void read_output(Output output, char *buffer, unsigned int size);

    void set_elapsed_millisecs(unsigned long long time);
    void set_elapsed_secs(unsigned long long time);

    bool apply();

    std::unique_ptr<cell::Value> read_state();
    void set_state(const char *buffer);

    unsigned int changed_outputs_count();
    Output changed_output_id(unsigned int idx);

    std::unique_ptr<cell::Value> get_Sensor_state();

  private:
    void *ptr;
  };
}

The only differences here, apart from the input setters and output getters which are obviously specific to each automaton type, are the two extra methods set_elapsed_secs(..) and set_elapsed_millisecs(..) and the fact that apply() now returns a boolean value. The former are the equivalent of the elapsed instruction in Cell, and the value now returned by apply() has the same meaning as the one returned by the apply instruction in a Cell procedure. Here's an example of how to update an instance of WaterSensor:

// Updating the values of the inputs here
...

// Setting the amount of time that has elapsed
// since the last call to water_sensor.apply()
water_sensor.set_elapsed_millisecs(100);

bool done;
do {
  // Repeatedly calling apply() until it returns true
  // That happens only once all pending timers have
  // been processed and the changes in the values of
  // the inputs propagated throughout the automaton
  done = water_sensor.apply();

  // Iterating through the outputs that have changed
  // if continuous or have been activated if discrete
  int count = water_sensor.changed_outputs_count();
  for (int i=0 ; i < count ; i++) {
    // Retrieving the id of the i-th output that
    // has changed or activated
    WaterSensor::Output id = water_sensor.changed_output_id(i);

    // Reading the value of the changed output
    char buffer[64];
    water_sensor.read_output(id, buffer, 64);

    // Now time to do something with the value of the output
    ...
  }
} while (!done);
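
In a wrapper class, this update cycle could be hidden behind a single function that takes a callback to be invoked after every call to apply(). The code below is just a sketch of that idea: advance is a hypothetical name, and FakeSensor is a toy stand-in whose apply() needs several calls to consume the elapsed time, which only loosely imitates the processing of pending timers:

```cpp
// Toy stand-in for a time-aware automaton: each call to apply()
// consumes at most 40 ms of the elapsed time that was set
struct FakeSensor {
  unsigned long long pending_ms = 0;

  void set_elapsed_millisecs(unsigned long long time) { pending_ms += time; }

  bool apply() {
    unsigned long long chunk = pending_ms < 40 ? pending_ms : 40;
    pending_ms -= chunk;
    return pending_ms == 0; // true once nothing is left to process
  }
};

// Sets the elapsed time and calls apply() until it returns true,
// invoking on_step(automaton) after each call so that the caller can
// inspect the changed outputs. Returns the number of calls to apply()
template <typename A, typename F>
unsigned int advance(A &automaton, unsigned long long millisecs, F on_step) {
  automaton.set_elapsed_millisecs(millisecs);
  unsigned int calls = 0;
  bool done;
  do {
    done = automaton.apply();
    calls++;
    on_step(automaton);
  } while (!done);
  return calls;
}
```

The callback is where the iteration over changed_outputs_count() and changed_output_id(..) would go when the function is used with a generated class.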