News, examples, tips, ideas and plans.
Thoughts around ORM, .NET and SQL databases.

Thursday, September 03, 2009

Materializing entities with unknown type: the DO4 way

As you know, I'm reading Oren Eini's blog. His post today is about a tricky case related to entity materialization (my comment starts here). I recommend reading the whole post before continuing :)

Ok, let's assume you already know the problem. Now I'll explain how DO handles it:

1. DO always returns the correct type for any Entity, i.e. you will never get an Animal instead of a Dog. So if the type of some Entity isn't known (below I describe what this means), but we must return its instance, we go to the database for it. Such a query fetches all non-lazy fields of the Entity in addition to its type.
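The idea can be sketched as follows (a minimal illustration of the principle, not the actual DO4 materializer; all names here are mine): the row's TypeId decides which concrete type gets instantiated, so the caller always receives the most derived type.

```csharp
using System;
using System.Collections.Generic;

public class Animal { }
public class Dog : Animal { }

public static class Materializer
{
    // TypeId -> factory; in a real ORM this mapping is built from the domain model.
    static readonly Dictionary<int, Func<Animal>> factories =
        new Dictionary<int, Func<Animal>> {
            { 1, () => new Animal() },
            { 2, () => new Dog() }
        };

    // The TypeId read from the database row picks the exact type to create.
    public static Animal Materialize(int typeIdFromRow)
    {
        return factories[typeIdFromRow]();
    }
}
```

So even if the query was written against Animal, a row whose TypeId maps to Dog materializes as a Dog instance.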

2. Our LINQ query translator is designed to propagate the information about the type of any Entity or Key you're going to materialize through the whole query. Currently this information is stored in the TypeId field, but later we'll allow you to use custom type discriminators.

3. We maintain an LRU cache of Key objects. Each Key caches the type of the Entity it belongs to. The default size of this cache is 16K Keys, and it contains only those Keys whose types are precisely known (we don't cache Keys of hierarchies consisting of just a single type, since their type is always known). So in fact, we cache the types of the ~16K entities we used most recently. This means we can materialize an entity without a type lookup on subsequent access attempts - even in different Sessions.
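A minimal LRU cache along these lines could look like this (my own sketch, not DO4 internals; the 16K figure above would be the capacity):

```csharp
using System;
using System.Collections.Generic;

// Maps a key to its resolved entity type; evicts the least recently
// used entry once capacity is exceeded.
public class LruTypeCache<TKey>
{
    readonly int capacity;
    readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, Type>>> map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, Type>>>();
    readonly LinkedList<KeyValuePair<TKey, Type>> order =
        new LinkedList<KeyValuePair<TKey, Type>>();

    public LruTypeCache(int capacity) { this.capacity = capacity; }

    public void Add(TKey key, Type type)
    {
        LinkedListNode<KeyValuePair<TKey, Type>> node;
        if (map.TryGetValue(key, out node))
            order.Remove(node);                 // refresh an existing entry
        else if (map.Count >= capacity) {
            var lru = order.Last;               // evict the least recently used
            order.RemoveLast();
            map.Remove(lru.Value.Key);
        }
        map[key] = order.AddFirst(new KeyValuePair<TKey, Type>(key, type));
    }

    public Type TryGet(TKey key)
    {
        LinkedListNode<KeyValuePair<TKey, Type>> node;
        if (!map.TryGetValue(key, out node))
            return null;
        order.Remove(node);                     // a hit moves the entry to the front
        order.AddFirst(node);
        return node.Value.Value;
    }
}
```

A hit on TryGet skips the database type lookup entirely, which is exactly the saving described above.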

4. Key comparison (Key.Equals, etc.) is optimized for the presence of such a cache. Different Key objects may correspond to the same key, but since some Keys are cached, a single instance of such a key can be shared across multiple Sessions. So first we compare Keys by reference, then we compare their cached hash codes, and only after that do we compare the content. In practice, Key comparison is quite fast.
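The three-tier comparison could be sketched like this (illustrative only; the field layout is my assumption, not the real Key class):

```csharp
using System;

public sealed class Key : IEquatable<Key>
{
    readonly object[] values;   // key column values
    readonly int hash;          // computed once in the constructor, then cached

    public Key(params object[] values)
    {
        this.values = values;
        foreach (var v in values)
            hash = hash * 397 ^ (v == null ? 0 : v.GetHashCode());
    }

    public bool Equals(Key other)
    {
        if (ReferenceEquals(this, other))
            return true;                         // cheapest: same cached instance
        if (other == null)
            return false;
        if (hash != other.hash)
            return false;                        // cached hashes differ -> not equal
        if (values.Length != other.values.Length)
            return false;
        for (int i = 0; i < values.Length; i++)  // slowest path: content comparison
            if (!Equals(values[i], other.values[i]))
                return false;
        return true;
    }

    public override bool Equals(object obj) { return Equals(obj as Key); }
    public override int GetHashCode() { return hash; }
}
```

Because cached Keys are shared instances, the reference check alone resolves most comparisons for recently used entities.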

5. As you might see, we imply the type of any Entity can't change during the application lifetime. The same is true for .NET objects, as well as for objects in many other languages, although it is not always true for objects stored in databases :) The only way to properly handle an entity type change in our case is to clear the Domain's key cache. I think this is acceptable, because this either never happens in practice, or happens quite rarely.

That's it ;)


  1. In other words, you are forcing the application to load a lot more data than it needs to

  2. Yes. If the developer doesn't leave the framework a choice (i.e. wants to get the instance), this will happen. I think it's better to return an expected object instead of an unexpected one, and we did a lot to make this happen as rarely as possible.

    Btw, the upcoming Query.Prefetch API will allow you to gracefully eliminate even worse cases. We just finished its design, so I'll describe it shortly.

  3. I just got a few more questions: what will happen if the type is known, but the entity state isn't loaded yet?

    So the full answer is:
    - If the type isn't known, there will be a single query to the hierarchy root table fetching all non-lazy fields. So only root fields are fetched in this case.
    - If the type is precisely known, there will be a single query to the joined sequence of inherited tables for this type fetching all non-lazy fields. So all non-lazy fields are loaded in this case.
    - If you don't need to load animalLover.Animal at all, but only its key, you should use the protected Persistent.GetReferenceKey method. DO4 uses it everywhere as well.
    - Of course, writing animalLover.Animal.Key won't lead to an Animal fetch in the query.
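The distinction above can be illustrated with a toy model (my sketch of the idea behind GetReferenceKey, not the real Persistent class; all names besides animalLover.Animal are mine): a reference field already holds the target's key, so reading the key costs nothing, while touching the entity itself forces a fetch.

```csharp
using System;

public class Animal { }

public class AnimalLover
{
    readonly long animalKey;   // foreign key value, always available locally
    Animal animal;             // loaded lazily on first access

    public AnimalLover(long animalKey) { this.animalKey = animalKey; }

    public bool AnimalIsLoaded { get { return animal != null; } }

    // Reading the key alone causes no database roundtrip.
    public long GetAnimalKey() { return animalKey; }

    // Accessing the entity itself forces a fetch (simulated here).
    public Animal Animal
    {
        get { return animal ?? (animal = FetchFromDatabase(animalKey)); }
    }

    static Animal FetchFromDatabase(long key) { return new Animal(); }
}
```

Calling GetAnimalKey leaves the reference untouched; only reading the Animal property triggers the load.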

    So such behavior is "by design" here as well ;) And actually I think it is correct. Imagine how well this will work if there is a global Velocity-like cache.