Sunday, 16 May 2010

Implementing Class Inheritance in Windows Azure Table Storage



Background

One of the requirements for our .NET application, which is currently being built using Windows Azure, is the ability to store objects in the cloud and later restore them into the correct data types during retrieval. This might seem obvious, but here comes the restriction: the objects are of different subclasses (but share the same superclass), have to be stored in the same table, and have to be retrieved through LINQ using the same data context. Moreover, the entities (as they are called) should be updatable using the same data context.
Consider the following example. We have two classes Animal and Dog. Dog inherits from Animal. The structure of the classes is shown in Figure 1. We want to be able to store objects of both types in a table called 'Animals'.
Figure 1 Structure of Animal and Dog Classes
Note that the Animal class has a virtual (C#) method called Talk() that returns an empty string. The Dog class overrides this method and returns 'Woof Woof'. Additionally, a dog can sit down on command when you invoke its DoTrick() method. We should be able to do the following and, if there are any dogs, get 'Woof Woof'.

AnimalModel model = new AnimalModel();
List<Animal> animals = model.GetAnimals(5);
foreach (Animal animal in animals)
{
    Console.WriteLine("{0} : {1}", animal.Name, animal.Talk());
}

We want each class to define how its own kind of animal makes noise and, as we saw above, if the animal is a dog, we want it to be able to do some tricks. If there were another subclass of Animal, for example a cat, we should get 'Mew mew' from the animal.Talk() method without changing the calling code.


The Problem

Using the StorageClient helper library provided with Windows Azure SDK version 1.0, we can easily create an AnimalContext class that extends TableStorageDataServiceContext as follows.

public class AnimalContext : TableStorageDataServiceContext
{
    public AnimalContext(StorageAccountInfo sai) : base(sai)
    {
    }

    public string TableName
    {
        get { return "Animals"; }
    }

    public IQueryable<Animal> Animals
    {
         get { return this.CreateQuery<Animal>(this.TableName); }
    }
}

The class is now equipped with basic CRUD operations on the table. We use the repository pattern and wrap the AnimalContext class within AnimalModel class like so.

public class AnimalModel
{
 private AnimalContext animalContext;
 private StorageAccountInfo storageAccountInfo;
 
 public AnimalModel(StorageAccountInfo sai)
 {
  this.storageAccountInfo = sai;
  this.animalContext = new AnimalContext(sai);
 }

 public List<Animal> GetAnimals(int count)
 {
  List<Animal> animals = this.animalContext.Animals.Take(count).ToList();
  return animals;
 }
}

However, even if we store an instance of Dog in the table, the GetAnimals() method always returns instances of Animal. Any additional information pertaining to dogs, e.g. Breed, is lost. Let's revisit the code snippet that we saw in the earlier section.

AnimalModel model = new AnimalModel();
List<Animal> animals = model.GetAnimals(5);

foreach (Animal animal in animals)
{
 Console.WriteLine("{0} : {1}", animal.Name, animal.Talk());
}

Let's say we create five dogs and retrieve them using the code above. Although we expect five Dogs to be retrieved and hence the animal.Talk() method to return 'Woof Woof', we just get an empty string. This behaviour is due to the fact that query responses from Windows Azure Table Storage do not contain CLR data-type information in them.

The Solution

Windows Azure Table has a flexible schema, so entities stored in a table can have any structure. The only restriction is that they need to have the following properties: PartitionKey, RowKey and optionally a TimeStamp. We add a fourth property to our class: Type.
When we query for an animal from the table storage, we get an XML document in the form of an Atom feed. We have to deserialize the XML ourselves, and since we want the XML serializer for the Animal class to know about the Dog type, we decorate the Animal class with an XmlIncludeAttribute. The class now looks like the following:

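using System;
using System.Diagnostics;
using System.Xml.Serialization;

// Lets an XmlSerializer created for Animal also recognise Dog
[XmlInclude(typeof(Dog))]
public class Animal
{
    public string PartitionKey { get; set; }

    public string RowKey { get; set; }

    public DateTime TimeStamp { get; set; }

    public string Type
    {
        get { return this.GetType().Name; }
        // Settable only because the serializer requires it
        set { Debug.Assert(value == this.GetType().Name); }
    }

    public string Name { get; set; }

    public int NoOfLegs { get; set; }

    public virtual string Talk()
    {
        return String.Empty;
    }
}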

Notice the Debug.Assert statement inside the setter for the Type property. We do this because the serializer requires all the properties of our class to be publicly settable, but we don't want to create confusion if the calling code sets this property to something other than the type of that instance.


We now create a GenericEntity class. It is a generic class that has three properties common to every entity retrieved from Windows Azure table storage: PartitionKey, RowKey and Etag. It also contains a dictionary called 'Properties', which stores all additional properties that a table storage entity might have. The names and values of the entity properties form the key-value pairs.

The GenericEntity class has a ToXml(string rootXmlElement) method that constructs an XML string using the properties from the 'Properties' dictionary. Additionally, if there is a property named 'Type', the method maps its value to the xsi:type attribute of the resulting XML. An XML de-serializer can then easily convert the string into the appropriate type. This class can be used in conjunction with any query response from Azure Table Storage. Please refer to the source code for full listing of the GenericEntity class.
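
To make the ToXml() idea concrete, here is a minimal sketch of how such a method might be written with XLinq. It is not the actual GenericEntity listing: the class name GenericEntitySketch and the exact layout of the generated XML are assumptions made for illustration only.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical stand-in for the GenericEntity class described above.
public class GenericEntitySketch
{
    private static readonly XNamespace Xsi =
        "http://www.w3.org/2001/XMLSchema-instance";

    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Etag { get; set; }

    // Every additional entity property, keyed by property name.
    public Dictionary<string, string> Properties { get; private set; }

    public GenericEntitySketch()
    {
        Properties = new Dictionary<string, string>();
    }

    public string ToXml(string rootXmlElement)
    {
        // Declare the xsi prefix on the root so xsi:type can be used.
        XElement root = new XElement(rootXmlElement,
            new XAttribute(XNamespace.Xmlns + "xsi", Xsi.NamespaceName));

        // Map the 'Type' property, if present, to the xsi:type attribute.
        string typeName;
        if (Properties.TryGetValue("Type", out typeName))
        {
            root.Add(new XAttribute(Xsi + "type", typeName));
        }

        root.Add(new XElement("PartitionKey", PartitionKey));
        root.Add(new XElement("RowKey", RowKey));

        // Emit each remaining property as a child element.
        foreach (KeyValuePair<string, string> property in Properties)
        {
            root.Add(new XElement(property.Key, property.Value));
        }

        return root.ToString();
    }
}
```

Calling ToXml("Animal") on an entity whose 'Type' property is 'Dog' yields an Animal root element carrying xsi:type="Dog", which is the shape an XmlSerializer created for Animal can resolve to the subclass.
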
The TableStorageDataServiceContext class, which is the base class for our AnimalContext class, raises the ReadingEntity event for each entity after a query response is received. At this point we have access to the XLinq element representing the Atom entry from our query response. Using this extensibility point, we can parse and store all entity properties inside the 'Properties' dictionary within a GenericEntity object. By calling the ToXml() method later on the GenericEntity object, we can serialize the table storage entity into XML that an XmlSerializer can parse. The following snippet shows how we can access the GenericEntity object within the event handler.
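
As a rough sketch of the parsing step only, the helper below reads the ETag and the property elements out of an Atom entry. Inside a ReadingEntity handler the entry is available as the Data property (an XElement) of ReadingWritingEntityEventArgs. The class and method names (AtomEntryReader, ReadEtag, ReadProperties) are hypothetical and are not taken from the original source code.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper: extracts what a GenericEntity needs from an Atom entry.
public static class AtomEntryReader
{
    private static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";
    private static readonly XNamespace D =
        "http://schemas.microsoft.com/ado/2007/08/dataservices";
    private static readonly XNamespace M =
        "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";

    // Returns the m:etag attribute of the entry, or null if it is absent.
    public static string ReadEtag(XElement entry)
    {
        XAttribute etag = entry.Attribute(M + "etag");
        return etag == null ? null : etag.Value;
    }

    // Collects every d:* element under content/m:properties as name -> value.
    public static Dictionary<string, string> ReadProperties(XElement entry)
    {
        Dictionary<string, string> result = new Dictionary<string, string>();

        XElement content = entry.Element(Atom + "content");
        XElement properties =
            content == null ? null : content.Element(M + "properties");
        if (properties == null)
        {
            return result;
        }

        foreach (XElement property in properties.Elements())
        {
            if (property.Name.Namespace == D)
            {
                result[property.Name.LocalName] = property.Value;
            }
        }

        return result;
    }
}
```

A handler could call ReadProperties(e.Data) and copy the result into the 'Properties' dictionary of the GenericEntity found in e.Entity; that wiring is left out here because it depends on the StorageClient library.
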


In our 'animals' example, we would need to attach an event handler to the ReadingEntity event. We do that inside the constructor of the AnimalModel class like so:

When we are retrieving animals, we first retrieve them as instances of GenericEntity. As previously mentioned, the event handler has access to all entity properties when the animals are being retrieved. The handler reads off the properties of the animal into the 'Properties' dictionary. Since our Animal class has a property named 'Type', we can call ToXml("Animal") on each of the instances of GenericEntity and serialize them to string. Using an XmlSerializer(typeof(Animal)), we can then de-serialize the XML into the correct type of Animal.
When we are retrieving a dog, the value of the xsi:type attribute of the serialized string would be 'Dog'. Hence the de-serializer converts our generic entity into the correct type – a Dog.
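
The effect of xsi:type can be seen in isolation with a small, self-contained sketch. The Animal and Dog classes below are trimmed-down versions of the ones in this post (only Name, Breed and Talk() are kept), and the AnimalXml helper is a hypothetical name used purely for illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

// Trimmed-down Animal: XmlInclude tells the serializer that Dog may appear.
[XmlInclude(typeof(Dog))]
public class Animal
{
    public string Name { get; set; }

    public virtual string Talk()
    {
        return string.Empty;
    }
}

public class Dog : Animal
{
    public string Breed { get; set; }

    public override string Talk()
    {
        return "Woof Woof";
    }
}

// Hypothetical helper wrapping the de-serialization call.
public static class AnimalXml
{
    public static Animal Deserialize(string xml)
    {
        // The serializer is created for the base type only.
        XmlSerializer serializer = new XmlSerializer(typeof(Animal));
        using (StringReader reader = new StringReader(xml))
        {
            // xsi:type on the root element selects the concrete subclass.
            return (Animal)serializer.Deserialize(reader);
        }
    }
}
```

Passing in an Animal element that carries xsi:type="Dog" (with the xsi prefix bound to http://www.w3.org/2001/XMLSchema-instance) produces a Dog instance, so Talk() returns 'Woof Woof'. The same element without the attribute produces a plain Animal.
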
To conclude our discussion, let's look at the new GetAnimals(int) method:

public List<Animal> GetAnimals(int count)                                       
{
    List<Animal> animals = new List<Animal>();

    //Construct a new DataServiceContext
    TableStorage storage = TableStorage.Create(this.storageAccountInfo);
    TableStorageDataServiceContext svc = storage.GetDataServiceContext();
    svc.ReadingEntity += new EventHandler<ReadingWritingEntityEventArgs>
        (GenericEntity.OnReadingEntity);

    //Create the query using GenericEntity
    var qResult2 = (from c in
                    svc.CreateQuery<GenericEntity>(this.animalContext.TableName)
                    select c).Take(count);

    //Execute it
    TableStorageDataServiceQuery<GenericEntity> tableStorageQuery =
        new TableStorageDataServiceQuery<GenericEntity>(qResult2 as
            DataServiceQuery<GenericEntity>);
    IEnumerable<GenericEntity> res = tableStorageQuery.Execute();

    foreach(GenericEntity entity in res)
    {
        //Deserialize the GenericEntity into an animal
        Animal animal = Animal.XmlDeserialize(entity.ToXml("Animal"));
        animals.Add(animal);
    }

    return animals;
}

Finally, if the animal needs to be updated, it has to be explicitly attached to the data service context because we created the animal object ourselves. We use the Etag of our entity for that purpose. Add the following lines inside the foreach loop after the animal has been deserialized.

//Re-associate the newly created entity using the appropriate ETAG
//This is important when updating
this.animalContext.Detach(animal);
this.animalContext.AttachTo(this.animalContext.TableName, animal, entity.Etag); 


Please refer to the attached source code for more details.

Drawbacks

  • There is an added overhead involved as we are, for each object, altering the XML document received from Azure, and then de-serializing it ourselves.
  • There is an extra 'Type' XML element in the data exchanged between the server and the client.
  • Since MS SQL Express does not support flexible schema, the same code cannot be tested against the local storage within the Dev-Fabric.
  • As the super class needs to be decorated with 'XmlIncludeAttribute', there is tight coupling between the classes. It might not be a workable solution if source code for the assembly containing the base classes is not available.
  • This solution would not work if we wanted our Animal class (from the example above) to be abstract because it could not have been serialized.

Code

Coming soon! (as soon as I find a nice place to upload my files, maybe Azure/App engine blob storage)

Acknowledgements

  • Windows Azure Tables: Programming Cloud Table Storage -- Microsoft PDC 2008 presentation by Niranjan Nilakantan and Pablo Castro
  • Azure Storage Explorer source code [http://azurestorageexplorer.codeplex.com/]