News, examples, tips, ideas and plans.
Thoughts around ORM, .NET and SQL databases.

Friday, September 11, 2009

Implementing LINQ in ORM, part 1. No SQL DOM = no LINQ.


Personal blogs

Guys, take a look at:
  • - I just published a large post about myself there ;) I stole the design from this blog for now, but shortly it will be a bit different.
  • - there are two posts about persistent interfaces and finding references to specified objects.
As you might suspect, this blog will shortly become history. I'm still deciding what to do with it: most likely it will turn into an aggregated feed of posts related to DataObjects.Net; another option is to simply make it obsolete. Let's see.
What's more important is that these two blogs will definitely stay alive. You can subscribe to my blog right now - I've already redirected its feed to my FeedBurner account. I'm not sure about Dmitry's blog yet, so please hold off on that one. We'll announce here when it's fully ready for subscriptions.

Wednesday, September 09, 2009

Use SyntaxHighlighter

I just installed Alex Gorbatchev's SyntaxHighlighter on this blog - it works perfectly. We've used it earlier on, and it worked well there too.

So if you have your own blog or wiki where you're going to publish some code, this is the first thing I recommend installing. See its integration page - it can be added to almost any popular engine with ease.

null.MethodCall(...) issue on Memory storage

The problem: this code works differently on Memory and SQL storages.
var query =
  from c in Query<Person>.All
  where c.FirstName.ToLower()=="alex"
  select c;
foreach (var c in query) {
  // ...
}
The reason is pretty simple: when compiling a query for the Memory storage, we don't replace method calls on primitive types with versions that perform null checks. So if this code is executed on the in-memory database (IMDB), it has a chance to fail with a NullReferenceException in the FilterProvider iterator.

Currently you can work around this issue with code like this:
var query =
  from c in Query<Person>.All
  where (c.FirstName ?? string.Empty).ToLower()=="alex"
  select c;
foreach (var c in query) {
  // ...
}
But as you might suspect, this can affect query plans in SQL. So the better option is to make DataObjects.Net handle this itself. That's exactly what we're going to implement in the near future.
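To illustrate the kind of rewriting this implies, here is a minimal sketch (not the actual DataObjects.Net code - NullGuardVisitor is a made-up name) that wraps instance method calls on reference-typed expressions into null-safe conditionals before compiling a query for in-memory execution:

```csharp
using System;
using System.Linq.Expressions;

// A minimal sketch of the idea (not the actual DataObjects.Net code;
// NullGuardVisitor is a made-up name): before compiling a query for
// in-memory execution, wrap instance method calls on reference-typed
// expressions into null-safe conditionals.
class NullGuardVisitor : ExpressionVisitor
{
    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        // Only instance calls on reference types need a guard.
        if (node.Object != null && !node.Object.Type.IsValueType) {
            var target = Visit(node.Object);
            var args = Visit(node.Arguments);
            // target == null ? default(TResult) : target.Method(args)
            return Expression.Condition(
                Expression.Equal(target, Expression.Constant(null, target.Type)),
                Expression.Default(node.Type),
                Expression.Call(target, node.Method, args));
        }
        return base.VisitMethodCall(node);
    }
}

class NullGuardDemo
{
    static void Main()
    {
        Expression<Func<string, bool>> expr = s => s.ToLower() == "alex";
        var safe = (Expression<Func<string, bool>>)new NullGuardVisitor().Visit(expr);
        var filter = safe.Compile();
        Console.WriteLine(filter(null));   // False instead of NullReferenceException
        Console.WriteLine(filter("ALEX")); // True
    }
}
```

After rewriting, the null target short-circuits to a default value, so the comparison evaluates to false instead of throwing.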

Tuesday, September 08, 2009

Thoughts about EntitySet performance (continuing the previous post)

- We can materialize ~2-3K queries per second when each of them fetches 50 entities. Our ORMBattle.NET tests show this; of course, this result was measured on our moderately powered test PC.
- 90-95% of EntitySet content load time is spent on such a query. Everything else there happens quickly: mainly, we put the Keys of fetched Entities into an internal dictionary, and since their hash codes are already computed and cached at that moment, this is cheap.

- We can load about 2000...3000 EntitySets per second.
- Thus Fabio's test must pass on DO in a few seconds on our test PC. It will take a bit longer to materialize the entities there than in the ORMBattle.NET tests, because they're a bit more complex than the Simplest instances we used there. But since we must load just ~2500 EntitySets (about one second at the rate above), this should require less than 10 seconds. NHibernate completes it in 24 seconds, though on a different PC.

Far-reaching conclusion: likely, we can beat NHibernate on this test even without the upcoming preloading ;)

We'll try to find time to implement this test for DO4. I'm really curious whether this is true.

Honest comparison of NH and EF?

I just published a post about Fabio Maulo's comparison of NHibernate and Entity Framework. I think it deserves to be re-posted here.

If you don't know the story: Oren Eini (Ayende @ Rahien) and Fabio Maulo are likely the best-known critics of ORMBattle.NET - as well as the best-known NHibernate developers.

Monday, September 07, 2009

Delivery of nightly builds will be suspended for this week

The reason is that I'm going to switch to integrated PostSharp usage, and implementing this feature involves several parts:
1) Significant modification of our internal build process
2) Significant modification of DataObjects.Net-based software build process
3) Significant modification of DataObjects.Net installer.

As you can see, part 3 is the last one, although I'm going to commit the changes right after part 1 is implemented ;) That's why there will be no nightly builds.

Why are we doing this? Our "exit polls" (uninstall surveys) show that the requirement to additionally install PostSharp is one of the most annoying issues.

Tuples performance improvements

As you know, the DataObjects.Net framework intensively uses Tuples behind the scenes:
- The in-memory storage completely relies on them: all the indexes there are of Index type.
- The same is true for RSE: any record it returns is actually a Tuple.
- Finally, any Entity internally uses a DifferentialTuple to remember its original and updated state. The protected Entity.Tuple property is exactly this DifferentialTuple. So, e.g., to check whether the field named fieldName is already loaded (available), you can use this code inside an Entity method: this.Tuple.IsAvailable(this.Type.Fields[fieldName]). This means Tuples are used to store internally cached Entity state during its lifetime.

And as I wrote earlier, our Tuples are different from the Tuples that will be available in .NET 4.0 - mainly because ours are fully virtualized: you can provide any implementation of them you like. E.g. we use TransformedTuple ancestors or DifferentialTuple, which implement their own field access logic.

This is quite convenient, but pretty costly as well. Initially we used virtual generic methods to allow such "rich" Tuple field access logic to be overridden, and ultimately failed with this approach: virtual generic method calls are quite costly, because under the hood they rely on dictionary lookups. That was quite disappointing... So we switched to a version with boxing - it was almost 2 times faster in tight-loop tests.

But frequent boxing is bad as well: it fills generation 0 faster, making the CLR run Gen0 garbage collections more frequently. The cost of "releasing" unused boxed objects is zero, but each GC implies a significant amount of work anyway. You can assume the CPU cost of a Gen0 GC in a real-life application is nearly constant, so it's a good idea to decrease the number of such collections. But how? The only way is to allocate less. We know this, and frankly speaking, our entity materialization process was already highly optimized from this point of view - there are no temporary per-object allocations anywhere at all... Anywhere, except for Tuples :) On the other hand, AFAIK we currently have ~2 box-unbox operations per each field we materialize just because of this! Taking this fact and our materialization performance into account, you can imagine how well our materialization pipeline is optimized.

Now the good news: last Friday I invented a way to get rid of this drawback. The idea is to "emulate" generic virtual method calls (internally relying on dictionaries) via regular virtual method calls returning delegates based on field index (relying on array access).
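A rough sketch of the difference (illustrative only - SketchTuple, PairTuple and their members are made-up names, not the actual Xtensive.Core types):

```csharp
using System;

// Illustrative sketch (made-up types, not the actual Xtensive.Core code).
// A virtual generic method GetValue<T>(int) dispatches through dictionary
// lookups; instead, a regular virtual method returns a strongly typed
// getter delegate found by field index via a plain array access.
abstract class SketchTuple
{
    // Regular (non-generic) virtual method: cheap to call.
    public abstract Delegate GetGetter(int fieldIndex);

    // One delegate cast instead of boxing the field value.
    public T GetValue<T>(int fieldIndex)
    {
        return ((Func<SketchTuple, T>)GetGetter(fieldIndex))(this);
    }
}

sealed class PairTuple : SketchTuple
{
    public int First;      // a value-type field: no boxing on access
    public string Second;

    // One getter delegate per field, created once per tuple layout.
    static readonly Delegate[] Getters = {
        (Func<SketchTuple, int>)(t => ((PairTuple)t).First),
        (Func<SketchTuple, string>)(t => ((PairTuple)t).Second)
    };

    public override Delegate GetGetter(int fieldIndex)
    {
        return Getters[fieldIndex];
    }
}

class TupleDemo
{
    static void Main()
    {
        var tuple = new PairTuple { First = 42, Second = "abc" };
        Console.WriteLine(tuple.GetValue<int>(0));    // 42, no box-unbox pair
        Console.WriteLine(tuple.GetValue<string>(1)); // abc
    }
}
```

The delegate is strongly typed, so a value-type field travels through Func<SketchTuple, int> without ever being boxed.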

Our measurements show this must raise Tuples performance by at least 50% if there are no "external" RAM allocations at all. In reality the numbers will be even better: we expect an 80-120% increase when "external" RAM allocations do happen.

The numbers we got in tests for a real-life case:

Old Tuples with boxing:
- Setter: Operations: 23,323 M/s.
- Getter: Operations: 24,629 M/s.
New Tuples without boxing:
- Setter: Operations: 70,954 M/s.
- Getter: Operations: 81,680 M/s.

The new logic isn't implemented yet, although we simulated a particular case of it in tests. If you're interested in the ideas behind it and the exact implementation, see Xtensive.Core\Xtensive.Core.Tests\DotNetFramework\NewTupleLogicTest.cs in the latest nightly build.

We're going to implement the new logic in the v4.0.6 release. This must be relatively simple, since it won't lead to any public API changes. Hopefully, this will noticeably improve our materialization performance further.

The issue to track: 382.

Issue tracker: statistics

Just facts:
- We've been closing 62 issues per month on average since May (when the DataObjects.Net issue tracker became available).
- Currently there are 131 open issues out of 381 registered ones.

Sunday, September 06, 2009

Upcoming changes

It's time to describe what we're planning to change and deliver in September.

1. v4.0.6 and v4.1 will be released. Earlier I described what's planned there. But I didn't tell everything :)

We're going to deliver v4.1. It's obvious we're unable to finish the sync feature there (likely, sync will be moved to v4.2), but on the other hand, I think we've made more than enough changes to make such an increase of the version number fully justified. Moreover, with the v4.1 release we will:
- Launch a new DO4 web site. Just a few pages describing the major features, plus links to the Wiki & blogs, but it will represent DO4 much better.
- Complete the manual & Wiki. Yes, as many of you wished, it will be easy to print and read on a weekend. We're starting work on a huge Wiki update next week. The manual will be built from the Wiki content - either automatically or manually.
- Improve the installer. Mainly, new project templates will rely on the integrated PostSharp .targets files, so it won't be necessary to install PostSharp separately.

2. I hope we'll be able to return DO4 to ORMBattle.NET. There is now a very strong competitor in the performance area - BLToolkit - so I hope it will be acceptable to the community.

Actually, one more reason for this is that I'd like to show it's possible to beat even such killer simplicity on many of the existing tests - even though BLToolkit fits almost ideally my description of a theoretical performance test winner there. And I really like such challenges.

3. This blog will die, but my own blog will replace it.

I clearly see that impersonal blogging works much worse than personalized blogging. So I'll be the "blogging face" of DO4 and Xtensive. Actually, this doesn't really change anything - I'm the author of likely 98% of our blog posts anyway. But for some reason I always tried to push forward my company's image and identity rather than my own. I now feel that was wrong.

Many of our customers deal with us mainly because of the people they work with at Xtensive. Some of them know me; others will first remember Alexey Belov, Alexander Ustinov or Dmitry Maximov. In fact, as everywhere, it's the people behind the company they like, not the company itself. So making our products and blogs much more personalized must be a good idea.

I hope you won't see this as a wish to advertise my own personality. I never did that during the past 6 years. In fact, many of the people standing behind our products were intentionally kept hidden; now I'm going to bring some of them upfront. The products we develop should be associated with the people behind them - this will make our business more personal and, likely, more trustworthy for you.

P.S. Frankly speaking, the main reason for this is that I'm tired of writing "we think" instead of "I think". But that's not for public ;)

P.P.S. Do not unsubscribe ;) I'll try to move the blog to new location keeping old RSS and Atom feeds working.

Saturday, September 05, 2009

Materializing entities with unknown type: the DO4 way - the addendum

Recently I wrote Materializing entities with unknown type: the DO4 way. And today I remembered one more important case:

6. If the type discriminator (TypeId) is included into Entity.Key, there will be no type-related lookups. On the other hand, currently we try to fetch the entity anyway to check whether it exists.

So inclusion of TypeId into the Key affects fetch behavior (Key resolution):
- If it isn't included, and no TypeId is cached for this key, fetch request will "touch" only hierarchy root table.
- Otherwise it will touch the precise set of inheritance tables.

But how is this related to materialization behavior? Of course, if this type of Key mapping is used, we can provide an instance of the object any Key refers to without any lookups, and load its field values later. But currently we don't do this. It isn't absolutely necessary, since we already provide a way to read the Key corresponding to any reference property without materializing the Entity it belongs to, via the Persistent.GetReferenceKey method. On the other hand, in such cases we could allow this behavior - simply because it costs nothing here, and it is safe.

So what do you think - should we add this feature? E.g. it may work if [Association(LazyFetch = true)] is applied to a reference property and the reference key includes TypeId. Note that the only benefit of this is that you can access the Key, TypeId (and thus the type) and IsRemoved properties of such an entity without hitting the database. On the other hand, Persistent.GetReferenceKey provides the same information in this case (IsRemoved = key!=null; TypeId = key.TypeId).

So... Should we implement this?

P.S. Earlier I published information on DataObjects.Net code coverage. Quote from the article: "Tests for Xtensive.Storage project are performed in 3 (provider: Memory, MS SQL, PostgreSQL) * 3 (different inheritance schemas) * 2 (TypeId is included / not included to any reference) = 18 configurations at all". So:
- The case when TypeId is a part of a reference is fully tested.
- It's actually quite easy to write an IModule adding TypeId to all the Keys. See the TypeIdModifier class source in the Xtensive.Storage.Tests project for ideas on how to implement this.

Thursday, September 03, 2009

Materializing entities with unknown type: the DO4 way

As you know, I'm reading Oren Eini's blog. His post today is about a tricky case related to entity materialization (my comment starts here). I recommend reading the whole post before continuing :)

OK, let's assume you already know the problem. Now I'll explain how DO handles it:

1. DO always returns the correct type for any Entity, i.e. you will never get an Animal instead of a Dog. So if the type of some Entity isn't known (below I describe what this means), but we must return its instance, we'll go to the database for it. Such a query fetches all non-lazy fields of this Entity in addition to its type.

2. Our LINQ query translator is designed to pull the information about the type of any Entity or Key you're going to materialize through the whole query. Currently this information is stored in the TypeId field, but later we'll allow you to use custom type discriminators.

3. We maintain an LRU cache of Key objects. Each Key caches the type of the Entity it belongs to. The default size of this cache is 16K Keys, and it contains only those Keys whose types are precisely known (we don't cache Keys of hierarchies consisting of just a single type, since their type is implied by the hierarchy). So in fact we cache the types of the ~16K entities we used most recently. This means we can materialize an entity without a type lookup on subsequent access attempts - even in different Sessions.
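For illustration, the recency-based eviction such a cache performs can be sketched like this (a simplified stand-in, not the actual DataObjects.Net implementation; LruCache is a made-up name):

```csharp
using System;
using System.Collections.Generic;

// A minimal LRU cache sketch (illustrative only - not the actual
// DataObjects.Net implementation), e.g. mapping a Key to its cached type.
sealed class LruCache<TKey, TValue>
{
    readonly int capacity;
    readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    readonly LinkedList<KeyValuePair<TKey, TValue>> order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public LruCache(int capacity) { this.capacity = capacity; }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (!map.TryGetValue(key, out node)) {
            value = default(TValue);
            return false;
        }
        order.Remove(node);          // mark as most recently used
        order.AddFirst(node);
        value = node.Value.Value;
        return true;
    }

    public void Add(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> existing;
        if (map.TryGetValue(key, out existing)) {
            order.Remove(existing);
            map.Remove(key);
        }
        if (map.Count >= capacity) { // evict the least recently used entry
            var last = order.Last;
            order.RemoveLast();
            map.Remove(last.Value.Key);
        }
        map[key] = order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
    }
}
```

The real cache is sized at 16K entries by default; the sketch only shows the mechanics of keeping the most recently used entries alive.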

4. Key comparison (Key.Equals, etc.) is optimized for the presence of such a cache. Different Key objects may correspond to the same key, but since some Keys are cached, a single instance of such a key can be shared across multiple Sessions. So first we compare Keys by reference, then we compare their cached hash codes, and only after that do we compare the content. In practice, Key comparison is quite fast.
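The comparison order described above can be sketched as follows (a simplified illustration with made-up names, not the actual Key class):

```csharp
using System;

// Simplified sketch (not the actual DataObjects.Net Key) of the comparison
// order: reference check first, then cached hash codes, and a full content
// comparison only as the last resort.
sealed class SketchKey : IEquatable<SketchKey>
{
    readonly object[] values;
    readonly int hashCode; // computed once in the constructor, then cached

    public SketchKey(params object[] values)
    {
        this.values = values;
        var h = 17;
        foreach (var v in values)
            h = h * 31 + (v == null ? 0 : v.GetHashCode());
        hashCode = h;
    }

    public bool Equals(SketchKey other)
    {
        if (ReferenceEquals(this, other))        // 1. cheap identity check
            return true;
        if (other == null || hashCode != other.hashCode)
            return false;                        // 2. cached hash codes differ
        if (values.Length != other.values.Length)
            return false;
        for (int i = 0; i < values.Length; i++)  // 3. full content comparison
            if (!Equals(values[i], other.values[i]))
                return false;
        return true;
    }

    public override bool Equals(object obj) { return Equals(obj as SketchKey); }
    public override int GetHashCode() { return hashCode; }
}
```

With a shared cached instance, most comparisons end at step 1; mismatching keys usually end at step 2 without ever touching the content.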

5. As you might see, we assume the type of any Entity can't change during the application's lifetime. The same is true for .NET objects, as well as for objects in many other languages - although it is not always true for objects stored in databases :) The only way to properly handle an entity type change in our case is to clear the Domain's Key cache. I think this is acceptable, because this either never really happens, or happens quite rarely.

That's it ;)

Tuesday, September 01, 2009

New leader @ ORMBattle.NET!

So this finally has happened ;) Check out this post in ORMBattle.NET Forums.

Igor, my congratulations ;)

I'll re-publish this on the ORMBattle.NET blog in ~1-2 days and update the scorecard in ~1 week (on its next update). The delay is related to the necessity of applying at least one tweak Igor added to all the other tests (it should make all the results a bit better on small sequences), as well as to lack of time. I already spent a lot of it preparing the current scorecard, but for some reason only launched the BLToolkit tests, which were committed just this morning :)

Can a query fail in production?

Oren Eini has commented on Davy Brion's post "There Is No Excuse For Failing Queries In Production", and I found his comment quite important. On the contrary, you must remember that any query may fail in production.

Imagine: even Davy, who wrote an excellent series of posts describing how to implement your own DAL, can forget about this. So the topic is definitely important enough to be re-posted here ;)