The article is interesting itself, but I'd like to highlight some differences related to support of TPT ("table per type" or "table per class") inheritance mapping in DataObjects.Net:
1. EF uses "greedy" queries joining all the tables in hierarchy. DO joins minimal set of tables necessary to run the query.
EF is not initially designed for wide usage of lazy loading, so it should fetch the whole entity state, and thus - join all the tables in hierarchy.
DO behaves differently: when you query for Dog type in Animal hierarchy, it will join only Animal, Mammal and Dog tables. That's exactly what's necessary to fetch Dog type properties. If actual instances are descendants of Dog declaring their own fields, we imply you'll use .Prefetch to additionally fetch their own fields, if this is necessary; otherwise, we'll anyway load them on demand, but this will lead to SELECT N+1 issue.
.Prefetch API, in fact, fetches entity states individually - by splitting entities to process into groups of entities having the same type, and fetching their state by batched queries having Id IN (@Id1, @Id2, ...) in their WHERE clause (or its equivalent with (... OR ... OR ...) for composite keys). We've chosen this approach, because it allows us to:
That's why I constantly write EF simply can't be used in case you have large inheritance hierarchy: 16 types = 15 JOINs in any query to this hierarchy.
What about NHibernate?
NHibernate seems more clever here (in comparison to EF ;) ): this post by Ayende shows it uses LEFT OUTER JOIN only when you query for non-leaf type; otherwise, it uses INNER JOIN. So:
EF is not initially designed for wide usage of lazy loading, so it should fetch the whole entity state, and thus - join all the tables in hierarchy.
DO behaves differently: when you query for Dog type in Animal hierarchy, it will join only Animal, Mammal and Dog tables. That's exactly what's necessary to fetch Dog type properties. If actual instances are descendants of Dog declaring their own fields, we imply you'll use .Prefetch to additionally fetch their own fields, if this is necessary; otherwise, we'll anyway load them on demand, but this will lead to SELECT N+1 issue.
.Prefetch API, in fact, fetches entity states individually - by splitting entities to process into groups of entities having the same type, and fetching their state by batched queries having Id IN (@Id1, @Id2, ...) in their WHERE clause (or its equivalent with (... OR ... OR ...) for composite keys). We've chosen this approach, because it allows us to:
- Ignore the entities with already cached state - i.e. if they were passed to .Prefetch, nothing will happen.
- Fetch the state from global cache first, when it will appear. So .Prefetch won't load the database server at all, if state of all the entities is cached in global cache.
- Issue "ideal" SQL queries with quite predictable cost: any SQL query sent by .Prefetch is, in fact, translated to a set of primary index seek operations.
- .Prefetch queries are batched. Technically we can fetch pretty large number of entities by each of such batch (up to ~ 1000K entities per batch because of SQL Server query parameter limit), but actually we fetch 256 entities in each batch. Such limit is chosen because it's optimal: that's anyway a bit more than we can materialize during execution of such batch, on the other hand, queries we send are shorter, and thus simpler for optimizer.
- 5! = 120
- 10! = 3 628 800
- 15! = 1 307 674 368 000
- 20! = 2.43290201 × 10^18
That's why I constantly write EF simply can't be used in case you have large inheritance hierarchy: 16 types = 15 JOINs in any query to this hierarchy.
2. EF uses LEFT OUTER JOINs to join the tables in hierarchy. DO always uses INNER JOIN to do the same.
LEFT OUTER JOIN is performance killer in many many cases:
- The "right part" (related to joined table) table may contain NULL values not just because they're actually are in that table, but because they appeared after join there. And if you'll use NULL such values in query somewhere (e.g. is such columns are used in ORDER BY clause, or compared with NULL in WHERE clause), SQL Server will need to execute the whole JOIN first, and only then - filter or sort the result! In fact, in this case indexes on descendants' tables won't be used at all. We implemented INNER JOIN usage optimization for reference properties just because of this, so now DO uses INNER JOIN in almost any case (exception is traversal of nullable reference property).
- Worse: I just discovered SQL Server doesn't try to reorder any OUTER joins at all (see "Limit using Outer JOINs" section)! Looking at EF's JOIN order in SQL, this implies only indexes from hierarchy root table may actually be used by SQL Server - since join comes first there, and indexes become useless after this is done (cost of hash join is ~= cost of linear scan) - at least, as far as I can judge. So SQL Server simply can't produce an optimal plan for your query in this case - it will crunch all the data in nearly straighten-forward fashion instead.
What about NHibernate?
NHibernate seems more clever here (in comparison to EF ;) ): this post by Ayende shows it uses LEFT OUTER JOIN only when you query for non-leaf type; otherwise, it uses INNER JOIN. So:
- queries for leaf types (or, more likely, JOINs related to superclasses) lead to nearly the same SQL as in DO,
- but queries for non-leaf types (or, more likely, JOINs related to subclasses, if any) lead to nearly the same SQL as in EF.
So we have an intermediate case here. If you query for leaf types, NH works nearly as well as DO from the point of generated SQL. But if you query for types having several subclasses (descendants), you're facing the same issues as in case with EF.