A comparison with OOP

Note: if you haven't already, read the introductory example before reading this.

After writing the core of his application in Cell, Miki has now decided to write it again in C#, in order to get a better sense of the differences between the two programming paradigms, OOP and the functional/relational approach used in Cell. Miki is not trying here to replicate more "advanced" features of Cell, like orthogonal persistence, transactions, or the ability to record and replay the execution of the application. He just wants to write some of the logic in both paradigms to see how they compare in terms of the usual metrics that we value as software engineers, things like simplicity, reliability, maintainability, overall development effort and so on.

On the Cell side, Miki has decided to make use of a number of previously discussed features that are not implemented in the current (0.1.1) version of the compiler, but which will be available starting with version 0.3.

On the C# side, in order to avoid boring boilerplate code that would make the comparison less clear, Miki has decided to only partially follow one of the fundamental tenets of OOP, encapsulation. All member variables will be public, but they are intended to be read-only outside the class they belong to. Any change to the state of any object will only be carried out by methods of its class. This is similar to what happens in Cell with relational automata, whose state is visible in read-only mode to the outside world, but can be mutated only by the message handlers of the same automaton, although there are several major differences here, one of them being the fact that in Cell that's actually enforced by the language, while in C# it is left to the programmer's self-discipline.

The other rule that Miki has set for himself is that no linear searches are allowed in his code. His application is of course managing only a small amount of data, and he could probably get away with linear searches, but he has decided to write it as if it were a real, industrial-scale one. In the Cell version of the software of course all searches and lookups are performed in constant time, courtesy of the relational model.

Data structures

We've already seen the Cell schema for the core logic of Miki's application:

type Person = person(Int);
type Gang   = gang(Int);

schema SocialNetwork {
  next_id : Int = 0;

  person(Person):
    name           : String,
    surname        : String,
    nickname       : String [unique],
    date_of_birth? : Date;

  know_each_other(Person | Person):
    met_on          : Date,
    introduced_by?  : Person;

  gang(Gang):
    name    : String,
    leader  : Person;

  member(Person, Gang):
    joined_on : Date,
    rank      : Int;
}

There are a couple minor differences here, due to the fact that we are using the yet unreleased version 0.3 of the language. There's syntactic sugar for the unique nickname attribute, and the know_each_other relation and its attribute relations, met_on and introduced_by, have been declared as symmetric.

There are of course many ways to design a set of equivalent data structures in C# (unlike what happens with Cell, where's there's just one obvious way to design the data schema), but this is what Miki has settled on:

partial class Person {
  public int      id;
  public string   name;
  public string   surname;
  public string   nickname;
  public DateTime dateOfBirth;

  // The dictionary key is the id of the gang
  public Dictionary<int, Membership> memberships;

  // The dictionary key is the id of the other person
  public Dictionary<int, Friendship> friendships;
}

partial class Gang {
  public int    id;
  public string name;
  public Person leader;

  // The dictionary key is the id of the member
  public Dictionary<int, Membership> memberships;
}

partial class Membership {
  public Person   person;
  public Gang     gang;
  public DateTime joinedOn;
  public int      rank;
}

partial class Friendship {
  public Person   person1;
  public Person   person2;
  public DateTime metOn;
  public Person   introducedBy;
}

partial class SocialNetwork {
  int nextId = 0;

  // The dictionary key is the id of the person
  public Dictionary<int, Person> people;

  // The dictionary key is the id of the gang
  public Dictionary<int, Gang>   gangs;
}

There are a couple issues that Miki has noticed with the C# version of the schema. The first one is that there's some redundancy in the data. The most obvious part of it is the memberships member variables in Person and Gang: either one could be eliminated without actually removing any information from the data set. The redundancy is needed because we have to be able to efficiently retrieve both the list of gangs a given person has joined (along with the membership data, like join date and rank) and the list of all members of a given gang. But there's another, more subtle, source of redundancy: all the Dictionary<> member variables contain a list of key/value pairs where the value is an object of one of the classes in Miki's model, and the key is either the id of the object itself or that of another object linked from it. That is, the id of the entity that is used as key is stored twice, first in the object itself and second as the key of the dictionary.

There's two main problems that Miki sees with that. The first one is that if some piece of information is stored in multiple places, every time he updates it Miki has to update each of the locations that contain it, and that means writing more code. The other problem has to do with consistency: a bug in the code could cause redundant data structures to go out of sync, and that would almost surely cause the software to misbehave. One could end up for example in a situation where a Person "thinks" it has joined a Gang, but the corresponding Gang object doesn't have such person among its members. Miki also feels that the redundancy makes his data structures less clear and more difficult to understand.

The second issue that bothers Miki is the fact that in C# (and more generally, in OOP) there's no way to enforce a number of useful integrity constraints on the data. Nothing would prevent buggy code from, for example, creating two different Membership objects for the same combination of person and gang, with the result that a person could end up having two different join dates and ranks in the same gang. Similar problems could occur with Friendship objects. And of course there's no way to declaratively enforce the fact that nicknames have to be unique.

All these problems are easily avoided using the relational model. In OOP the only way to make sure that that doesn't happen is to write bug-free code. But code is complex and difficult to write, and will invariably contain bugs in any non-trivial application, while redundance-free data structures and declarative integrity constraints are very easy to design and get right.

Updating the information in the dataset

The next thing that Miki is going to do is write methods that are equivalent to the Cell message handlers. Inserting a new person or gang is trivial and very similar in both languages, so Miki will not even bother to show you the code for that. Instead, he will start with the code that records the fact that a person has joined a gang, or that two people have met. Here's the Cell code for that:

SocialNetwork.have_met(person_1: Person, person_2: Person, day: Date, introduced_by: Person?) {
  insert know_each_other(
    self.person_1 | self.person_2,
    met_on = self.day,
    introduced_by = self.introduced_by if self.introduced_by?
  );
}

SocialNetwork.has_joined(person: Person, gang: Gang, date: Date, rank: Int); {
  insert member(self.person, self.gang, joined_on = self.date, rank = self.rank);
}

Here too Miki's code is making use of some syntactic sugar that will only be available with version 0.3 of Cell. In C# the same functionality is implemented by the following methods of the Person class:

partial class Person {
  public void Join(Gang gang, DateTime joinedOn, int rank) {
    Membership membership = new Membership(this, gang, joinedOn, rank);
    memberships[gang.id] = membership;
    gang.memberships[id] = membership;
  }

  public void MakeFriendsWith(Person otherPerson, DateTime metOn, Person introducedBy) {
    Friendship friendship = new Friendship(this, otherPerson, metOn, introducedBy);
    friendships[otherPerson.id] = friendship;
    otherPerson.friendships[id] = friendship;
  }
}

The C# code is slightly more complex, in the case of Join(..) because the same Membership object has to be inserted in both Person.memberships and Gang.memberships, and in the case of MakeFriendsWith(..) because the same Friendship object has to be added to the friendships member variables of both parties involved. But that's not much of deal.

The code that deletes data from the dataset, though, is more interesting. Here's the Cell version:

SocialNetwork.delete_person(id: Person) {
  id = self.id;
  delete person(id), know_each_other(id | *);
  leave_gang(id, g) : g <- member(id, ?);
}

SocialNetwork.delete_gang(id: Gang) {
  delete gang(self.id), member(*, self.id);
}

SocialNetwork.leave_gang(person: Person, gang: Gang) {
  leave_gang(self.person, self.gang);
}

using SocialNetwork {
  leave_gang(Person person, Gang gang) {
    delete member(person, gang);
    if leader(gang) == person:
      members = [p : p <- member(?, gang), p != person];
      if members == []:
        delete gang(gang);
      else
        max_rank_members = max_by(members, rank($, g));
        new_leader = an_elem(max_rank_members);
        update leader(gang, new_leader);
      ;
    ;
  }
}

The above code makes use of both features that are already available but we haven't encountered before and others that will only be available with version 0.3 . For the equivalent C# code Miki has added a Delete() methods to both Person and Gang:

partial class Person {
  public SocialNetwork socialNetwork;

  public void Delete() {
    socialNetwork.Remove(this);

    foreach (var e in memberships)
      e.Value.gang.RemoveMember(this);

    foreach (var e in friendships) {
      Friendship f = e.Value;
      Person otherPerson = f.person1 == this ? f.person2 : f.person1;
      otherPerson.friendships.Remove(id);
    }
  }

  public void LeaveGang(Gang gang) {
    if (memberships.Remove(gang.id))
      gang.RemoveMember(this);
  }
}

partial class Gang {
  public SocialNetwork socialNetwork;

  public void Delete() {
    socialNetwork.Remove(this);

    foreach (var e in memberships)
      e.Value.person.LeaveGang(this);
  }

  public void RemoveMember(Person person) {
    if (!memberships.Remove(person.id)) {
      person.LeaveGang(this);
      if (leader == person) {
        Person newLeader = null;
        int newLeaderRank = 0;
        foreach (var e in memberships) {
          Membership m = e.Value;
          if (newLeader == null || newLeaderRank < m.rank) {
            newLeader = m.person;
            newLeaderRank = m.rank;
          }
        }
        if (newLeader != null)
          leader = newLeader;
        else
          Delete();
      }
    }
  }
}

partial class SocialNetwork {
  public void Remove(Person person) {
    if (people.Remove(person.id))
      person.Delete();
  }

  public void Remove(Gang gang) {
    if (gangs.Remove(gang.id))
      gang.Delete();
  }
}

Here the problems created by a bad data representation are more evident. The first thing Miki has to do is to add new "wiring" to his class model: both Person and Gang now need to have a pointer to SocialNetwork, since their Delete() methods need to remove the corresponding objects from the list of "live" objects that is held by SocialNetwork. That means adding a new member variable to both, changing both constructors and all places where objects of either class are instantiated. In a toy application like this it may not be much of a problem, but Miki has already enough experience of real-world software development to be familiar with this situation: implementing a new feature causes the classes/objects where the logic is located to need access to other nodes of the object graph, which has therefore to be augmented with some extra wiring. Depending on how "distant" the two parts of the graph are, many places in the codebase may have to be modified in order to "carry" a pointer to the required data structures from the place where it's already available to the one where it's now needed.

As was the case with the Join(..) and MakeFriendsWith(..) methods we saw before, the Delete() methods have more work to do because of the redundancy in the data: Person.Delete() has to leave all the gangs the person has joined, and remove the other side of each friendship, in orders to remove all traces of itself, while Gang.Delete() has to make all members of the gang leave it. And both of them need to remove the objects they belong to from SocialNetwork as we mentioned earlier.

Another thing that Miki feels is making the code more difficult to understand and less elegant is the fact that there's a mutual dependency between LeaveGang(..) and RemoveMember(..): each of them needs to call the other in order to complete its work, and therefore there has to be some check in place to avoid entering an infinite loop.

The same type of cyclical dependency in repeated with the Delete() methods and the corresponding Remove(..) methods in SocialNetwork. If those methods were simply defined as follow:

partial class SocialNetwork {
  public void Remove(Person person) {
    people.Remove(person.id)
  }

  public void Remove(Gang gang) {
    gangs.Remove(gang.id);
  }
}

they would be unsafe, because if they were erroneously called on their own by the user of Miki's classes (which could well happen since they are public) they would cause the dataset to enter an inconsistent state: the person or gang in question would be removed from the list of objects held by SocialNetwork, but would still be reachable from all the Membership and Friendship objects that reference them. Miki was even tempted to do away with the two Remove(..) methods in SocialNetwork altogether, and let Person and Gang remove themselves directly from SocialNetwork (using the instructions socialNetwork.people.Remove(id) and socialNetwork.gangs.Remove(id) respectively), but that would violate the rule he had set for himself at the beginning of the project: all changes to the state of an object can only be carried out by the methods of its class. Violating the principle of encapsulation like that would cause other problems further down the road, as we'll see in the next paragraph.

Searches

Miki is now going to implement one of the (read-only) methods that we saw previously, people_born_on(..):

using SocialNetwork {
  [PersonId] people_born_on(Date d) = [p : p <- date_of_birth(?, d)];
}

This is what Miki came up with:

partial class SocialNetwork {
  Dictionary<DateTime, Dictionary<int, Person>> peopleByDateOfBirth =
    new Dictionary<DateTime, Dictionary<int, Person>>();

  public Person[] PeopleBornOn(DateTime date) {
    if (peopleByDateOfBirth.ContainsKey(date))
      return peopleByDateOfBirth[date].Values.ToArray();
    else
      return new Person[0];
  }

  public void Add(Person person) {
    people[person.id] = person;
    if (!peopleByDateOfBirth.ContainsKey(person.dateOfBirth)) // NEW
      peopleByDateOfBirth[person.dateOfBirth] = new Dictionary<int, Person>(); // NEW
    peopleByDateOfBirth[person.dateOfBirth][person.id] = person; // NEW
  }

  public void Remove(Person person) {
    if (people.Remove(person.id)) {
      person.Delete();
      peopleByDateOfBirth[person.dateOfBirth].Remove(person.id); // NEW
    }
  }
}

The PeopleBornOn(..) method and the peopleByDateOfBirth member variable are new, while Add(..) and Remove(..) where already there, but had to be modified to support the new functionality. The lines of code that have been added to them are marked by the // NEW comment (The Add(...) methods was not shown before, as it was a trivial one-liner).

Note that if Miki had decided to violate the encapsulation of SocialNetwork in the previous paragraph, it would have been easier for him to miss the fact that his code needed to be modified after the addition of peopleByDateOfBirth.

What Miki has learned so far...

Miki could of course have designed his C# data structures differently, and he actually had to choose between a number of possible designs none of which was clearly superior to the others, but each of which involved a number of trade-offs, and Miki is not even sure he chose the best one for his application (He is, by the way, open to suggestions. If you see a way to improve his code, leave a message in the forum). This stand in sharp contrast with what happened when writing the Cell schema. In that case, there was a single, obvious way to design it, and Miki is positive that if ten different developers had been given the same informal specifications to implement they would all have come up with essentially the same data schema.

Furthermore, the Cell schema did not need any changes to its initial version, unlike the C# data structures which required several additions that were discovered only in a later stage of the application development, and which involved several code changes that were spread out all over the code base. None of these additions actually added any information to the dataset, their only purpose was either to allow the code to navigate it, or to avoid performing unnecessarily slow searches.

Miki already appreciated the advantages offered by the relational model, but one thing he was not aware of previously is that relational data structures seem to be intrinsically more "stable" that their counterparts in low-level programming paradigms like OOP, and he suspects that might actually be a non-trivial advantage for software development, since every change to the data structures invariably involves changes to the actual code.

Miki is not done yet with his software development experiments. He's going to try again soon with a larger application. Stay tuned.