DataObjects.Net Team Blog: August 2010

Thursday, August 26, 2010

A good post about TPT ("table per type") inheritance mapping and its performance-related concerns in EF

The article is interesting itself, but I'd like to highlight some differences related to support of TPT ("table per type" or "table per class") inheritance mapping in DataObjects.Net:

1. EF uses "greedy" queries joining all the tables in hierarchy. DO joins minimal set of tables necessary to run the query.

EF is not initially designed for wide usage of lazy loading, so it should fetch the whole entity state, and thus - join all the tables in hierarchy.

DO behaves differently: when you query for Dog type in Animal hierarchy, it will join only Animal, Mammal and Dog tables. That's exactly what's necessary to fetch Dog type properties. If actual instances are descendants of Dog declaring their own fields, we imply you'll use .Prefetch to additionally fetch their own fields, if this is necessary; otherwise, we'll anyway load them on demand, but this will lead to SELECT N+1 issue.

.Prefetch API, in fact, fetches entity states individually - by splitting entities to process into groups of entities having the same type, and fetching their state by batched queries having Id IN (@Id1, @Id2, ...) in their WHERE clause (or its equivalent with (... OR ... OR ...) for composite keys). We've chosen this approach, because it allows us to:

Ignore the entities with already cached state - i.e. if they were passed to .Prefetch, nothing will happen.
Fetch the state from global cache first, when it will appear. So .Prefetch won't load the database server at all, if state of all the entities is cached in global cache.
Issue "ideal" SQL queries with quite predictable cost: any SQL query sent by .Prefetch is, in fact, translated to a set of primary index seek operations.
.Prefetch queries are batched. Technically we can fetch pretty large number of entities by each of such batch (up to ~ 1000K entities per batch because of SQL Server query parameter limit), but actually we fetch 256 entities in each batch. Such limit is chosen because it's optimal: that's anyway a bit more than we can materialize during execution of such batch, on the other hand, queries we send are shorter, and thus simpler for optimizer.

Why having less joins is important? JOIN order optimization is computationally hard: currently most of RDBMS use a heuristic algorithm allowing to produce nearly-optimal join sequence using n! steps instead of 4^n (where n is count of JOINs). But even n! is implies quite fast growth of complexity with n:

5! = 120
10! = 3 628 800
15! = 1 307 674 368 000
20! = 2.43290201 × 10^18

Hopefully, it's clear now that query with 15 JOINs is nearly impossible to optimize well, even with use of described heuristics: 1300 billions of join sequences must be considered to handle this. Actually, I don't know what SQL Server will do in this case, but the best it can do is to use the best plan it could produce in some limited time. And this plan won't be optimal with 99.99...% probability, since 99.99...% of other possible plans were simply rejected.

That's why I constantly write EF simply can't be used in case you have large inheritance hierarchy: 16 types = 15 JOINs in any query to this hierarchy.

2. EF uses LEFT OUTER JOINs to join the tables in hierarchy. DO always uses INNER JOIN to do the same.

LEFT OUTER JOIN is performance killer in many many cases:

The "right part" (related to joined table) table may contain NULL values not just because they're actually are in that table, but because they appeared after join there. And if you'll use NULL such values in query somewhere (e.g. is such columns are used in ORDER BY clause, or compared with NULL in WHERE clause), SQL Server will need to execute the whole JOIN first, and only then - filter or sort the result! In fact, in this case indexes on descendants' tables won't be used at all. We implemented INNER JOIN usage optimization for reference properties just because of this, so now DO uses INNER JOIN in almost any case (exception is traversal of nullable reference property).
Worse: I just discovered SQL Server doesn't try to reorder any OUTER joins at all (see "Limit using Outer JOINs" section)! Looking at EF's JOIN order in SQL, this implies only indexes from hierarchy root table may actually be used by SQL Server - since join comes first there, and indexes become useless after this is done (cost of hash join is ~= cost of linear scan) - at least, as far as I can judge. So SQL Server simply can't produce an optimal plan for your query in this case - it will crunch all the data in nearly straighten-forward fashion instead.

The good thing is that part 1 (above) isn't applicable now - at least, in case with SQL Server. But this doesn't mean everything is better because of this - actually, everything is even worse.

What about NHibernate?

NHibernate seems more clever here (in comparison to EF ;) ): this post by Ayende shows it uses LEFT OUTER JOIN only when you query for non-leaf type; otherwise, it uses INNER JOIN. So:

queries for leaf types (or, more likely, JOINs related to superclasses) lead to nearly the same SQL as in DO,
but queries for non-leaf types (or, more likely, JOINs related to subclasses, if any) lead to nearly the same SQL as in EF.

So we have an intermediate case here. If you query for leaf types, NH works nearly as well as DO from the point of generated SQL. But if you query for types having several subclasses (descendants), you're facing the same issues as in case with EF.

Tuesday, August 24, 2010

We have moved to the new technical support service

Hello,

Today we're announcing that Xtensive LLC has finally moved to the new technical support service.

Previous version of support forum operates in read-only mode, the red notification header is placed there:

See you on support.x-tensive.com!

Dmitri Maximov

Thursday, August 19, 2010

Temporary problems on support.x-tensive.com

The website might be inaccessible due to DNS-related problems. The issue is being resolved by hosting service specialists. We are sorry for the inconvenience.

Tuesday, August 17, 2010

News on support.x-tensive.com

Hello,

Today we've switched to a new development snapshot of the support.x-tensive.com website. Among all changes we made the most noticeable for users is certainly integration with syntax highlighter (by Alex Gorbachev), which is automatically being applied to all code fragments. As for us, the most important update is the ability of incremental migration from old forum to the new one. During this process the following tasks are being executed:

Find and import topics & answers that are created in old forum starting from the date of last migration.
Find users that registered on the new support website starting from the date of last migration, find their already imported topics and posts, associate them with their content and update their karma.

There is a task that is not yet accomplished: email notification service configuration and testing, but I suppose that it will not take too much time.

Having all this stuff done we feel that the new support service is ready for real life usage and, therefore, we are making an announcement that old support forum will be switched into read-only mode in few days.
The appropriate orange header with notification text is put on its index page:

Thanks
Dmitri Maximov

Thursday, August 12, 2010

Gain your karma on support.x-tensive.com

A few days ago I posted about our plans on support.x-tensive.com service and asked you, our dear customers, users and followers, to register on it in order to make data migration process and author association simpler and more transparent.

We thank those of you who found several minutes and accomplished this trivial registration task. This gave us an opportunity to achieve the following goals:

Associate authors of topics & replies in old forum with posts in new support website.
Calculate and set level of karma to authors according to their activity in old forum.

Karma is calculated in accordance with the following rules:

Comment = 1 point
Topic = 3 points
Answer = 5 points

As a result, we've got this picture:

It is not a surprise that the blue rabbit has got maximum level of karma, as it is working night and day 24x7 on the support forum. Our congratulations to leaders: aramahdessian, Paul Sinnema, xumix and all other contributors.

I suppose, we'll wait a week or two in case anyone else decides to join and earn karma for his participation in old forum, and after that we'll do final migration to the new version of support service.

Thanks.

Dmitri Maximov

Monday, August 09, 2010

On support.x-tensive.com

We are resuming the work on our new technical support service, which was postponed due to the final stage of DataObjects.Net 4.3 development.

As I've already posted, we made rather good progress on it, for example, the following things are ready and tested:
- Authentication integration with www.x-tensive.com using OpenID protocol.
- Migration of topics and answers from currently used forum.

Unfortunately, user import task seems to be unexpectedly overcomplicated because of principally different authentication schemes used in forum and support service, so current version of it contains posts which are signed by artificially created 'Editor' user instead of original authors:

where red areas highlight artificial 'Editor' user, whereas blue ones goes for originators, whose names are primitively included into text of topics & comments.

In the next release of our support service we would like to sort out the problem of topic and comment ownership. In order to achieve the goal, we're asking you to spent approximately 2 minutes of your time and sign on to support.x-tensive.com. Having list of registered users we could easily map them to correspondent users in old forum. After that is done, topics and comments will be owned by original authors, who could gain karma, vote for answers, mark them as accepted, etc.

Thank you in advance.

Dmitri Maximov

Tuesday, August 03, 2010

How to activate DataObjects.Net license

Our license activation process isn't absolutely intuitive so far, so unfortunately, I should write this shameful post :)

Imagine you already ordered a license or prolonged it. You must:

Login to X-tensive.com with your own account.
Activate your licenses by clicking on links in e-mails we've sent to you ("Thank you for purchasing our product. To register it, follow the instructions below: ...").
Go to Profile\My Subscriptions page at X-tensive.com
Download your company license there
Run License Manager (Start Menu\DataObjects.Net\Tools)
Install company license by copy-pasting its content.
Activate Hardware License for it it via Internet - by clicking on corresponding button there.
If this is impossible (e.g. there is no Internet connection), you can activate Hardware License manually. You must visit Profile\My Subscriptions page again, click on "Manage hardware licenses" and register your HWID there. Your hardware license bound to this HWID will be generated, so you should copy-paste it again. Your HWID is displayed by License Manager.

DataObjects.Net Team Blog