News, examples, tips, ideas and plans.
Thoughts around ORM, .NET and SQL databases.

Monday, June 29, 2009

DataObjects.Net extensions

Recently we discussed the DataObjects.Net extensions concept, and here are some conclusions:

DataObjects.Net extensions are fully reusable DataObjects.Net modules designed to assist developers with common development concerns, such as validation or security. They can also include their own persistent model that becomes a part of the whole business model - for example, a security extension can contain entities such as User and Permission.

We plan to provide a simple but powerful API for this feature: extensions are based on session-level events and are SessionBound descendants implementing a single interface:

interface IExtension
{
  void Initialize(Session session);
}

Normally extensions will subscribe to Session events during initialization and register some periodically executed actions. This pattern is very similar to HttpModules in ASP.NET, so the Session class will be provided with a set of events allowing the life cycle of the session and its persistent objects to be customized.
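
A minimal sketch of what such an extension could look like (the Session event name used here is an assumption, since the final event set hasn't been published yet):

public class AuditExtension : SessionBound, IExtension
{
  public void Initialize(Session session)
  {
    // Subscribe to session-level events during initialization,
    // much like an HttpModule subscribes to HttpApplication events.
    // "Persisting" is an assumed event name, not the final API.
    session.Persisting += (sender, e) =>
      Console.WriteLine("Session is about to persist its changes.");
  }
}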

What will be based on extensions? The already existing validation feature: it will be refactored into such an extension once this subsystem is implemented. Other candidates are security and full-text search extensions.

The issues related to extension system: 258, 256.

Update: we've decided to implement this a bit differently. Please read description of issue 258 for details.

Tuesday, June 23, 2009

Starring the issues

Do you know you can star issues in our issue tracker? Starring an issue means:
- You'll be notified by e-mail of any updates we make to it: new comments, changes in tags, status and so on.
- The issue tracker displays the star count for each issue. Many stars = many people are interested. We'll take this into account when organizing issues into releases.

Finally, you can leave us a comment or idea there. It is really simple: click "Add comment" or press the "r" key on the issue details page.

v4.0.2 status

v4.0.2 is delayed for a day or two, which means it will be available tomorrow or the day after. Its issue list grew a bit during development. You may find that we're going to implement a set of tasks we initially planned for later.

Friday, June 19, 2009

DO4 and Silverlight: will it be easy to make them friends?

Today I've been analyzing this. Briefly: yes, it won't be too complex - hopefully 2-3 man-months. Now the details:

1) Dependencies on assemblies which aren't compatible with Silverlight
- Obviously we'll get rid of all the providers except the memory provider (and, in the future, the file system provider as well).
- It's easy to get rid of Parallel FX as well - actually we don't use it intensively for now.
- The only one left is log4net. Again, quite easy: we use our own logging abstraction layer, which I've been promising to describe for a very long time. In our case logging is fully replaceable.

Btw, there is PostSharp as well, but its 1.5 and upcoming 2.0 versions are ready for Silverlight, so we must migrate to one of them. Most likely we'll wait for 2.0.

2) Overwhelming Reflection & Reflection.Emit limitations in Silverlight
It looks like this won't be too complex. Silverlight limits reflection to members your code can access directly, so no privates or internals. We have lots of internal classes we reflect over via AssociateProvider (it provides comparers, hashers, size calculators, arithmetics and so on), but actually we made them internal just because they don't have to be public. They implement the interfaces AssociateProvider looks for, that's it.

So in the majority of cases it will be enough to open such types for public access - they're frequently "hidden" in Internals namespaces anyway.

We also emit lots of code - e.g. the whole Tuples infrastructure is built on Reflection.Emit. But AFAIK we never do such things as accessing internal or private members of types we don't emit. This must be checked, of course, but at first glance it seems relatively easy to fix.

I remember just one place: we make PostSharp build protected constructors for Entity & Structure ancestors - they look like the one sketched below. They're used during materialization. But we can make PostSharp emit a public static member invoking them as well (specially for Silverlight).
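
For reference, such a constructor looks roughly like this (a sketch; the exact shape of the generated code may differ):

public class Customer : Entity
{
  // Protected "materialization" constructor built by PostSharp (sketch only).
  protected Customer(EntityState state)
    : base(state)
  {
  }
}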

3) Absence of serialization in Silverlight
This is the most complex part. On the other hand, we need fast binary serialization anyway and are currently looking for good approaches there. DO4 already uses serialization in a few places, including deserialization of aspects, the Metadata.Extension instance storing a simplified version of the current Storage model (I wrote that we use it to make the schema upgrade layer aware of type-level structure, instead of just table-level structure), and the Storage itself (Entities already support serialization via IFormatter). Furthermore, the upcoming sync and file system provider also require it.

Our final set of decisions related to serialization is described in this issue. Having it implemented, we'll get a fully portable serialization layer.

It seems that's all. If you know any other pitfalls, please notify us. If you're interested in getting Silverlight support, please vote for it in upcoming survey.

Thursday, June 18, 2009

Icons in online help

What do you think about such icons in TOC of our online help? Neither .HxS, nor .chm contain any information about TOC icons, but we've found a good way to add them - and this is fully customizable.

This new feature will be available in the nearest Help Server update.

Plans for this week

We're working on v4.0.2 - you'll see it this coming Monday. It will resolve the most annoying issues we've found during the 8 days since the v4.0 release, as well as implement a few usability improvements. The worst issues are related to our installer - I hope we now know all of its bugs. It's also possible we'll implement a few of the LINQ-related enhancements planned for v4.1, but I'm not 100% sure about this.

Finally, there is also a huge amount of work on the Wiki (manual).

What's next? Most likely we'll separate a branch for the mentioned LINQ updates and bugfixes to release them as v4.0.3, and start working on sync & caches - there will be ~3.5 weeks to implement them, and I hope this will be enough.

Personally, I'll be spending most of my time here - telling you about features and internals of DO4, as well as working on docs and marketing-related stuff. Finally, there is time for this ;)

P.S. We're going to publish a survey about "post-sync" features shortly - please pay attention and fill it in; we'll decide what to do next based on its results.

Wednesday, June 17, 2009

June and summer discounts: join DO4 camp!

We're ready to start the DataObjects.Net promotional campaign, and as frequently happens, it starts with pricing. Read this post to the end - we're providing huge discounts in June, but expect your involvement in exchange.

Price & licensing policy changes

This summer prices:
- Personal license: 299 USD (no changes)
- Internal license: 495 USD (-100 USD)
- SMB license: 995 USD (-1000 USD)
- Enterprise license: please request a quote.
- Upgrade subscription: 50% of the license cost. To order it, you must order the original license with the DOUPGRADE discount coupon code. Note that it's applicable only if your existing subscription isn't expired - it expires 1 year from the first purchase; each purchased upgrade subscription adds 1 more year.
- Support subscription: now included into the cost of license.
- Implementation assistance and general consulting: 320 USD/day.

Discounts in June

First of all, we provide 50% discount on almost anything related to DO4, including:
- Licenses: they cost 150/248/498 USD in June. A quote for the Enterprise license is provided on request.
- Upgrade subscriptions: 90/149/299 USD in June (so here the discount is 40%)
- Implementation assistance and consulting: 160 USD/day! This price will definitely be left intact till the end of summer.

Coupon code: JDDO4 - it is already included in all order pages.

Huge discounts for the people helping us

The conditions listed in this section will remain intact at least till the end of summer.

1. Promotion help discount: additional 50%.
Applicable to new license purchases only. You promise to satisfy one of the following requirements within 1 month after your purchase:

1) You start or join - and further maintain - 3 different discussions on a developer web site or forum such as stackoverflow.com (it must be for .NET developers and be well-known either worldwide or in your country).

The discussions must be related to DO4 - its particular or upcoming features. We don't expect them to be fully positive - they simply must be honest. And there should be links to our wiki, blog or the DO4 section on our web site.

2) If you maintain a well-known blog (well-known relatively, of course - e.g. in your city; ideally, there should be >100 subscribers), you can publish 2 posts related to DO4 in it. Again, with your honest opinion and with links to us.

Please send us the links to such discussions or posts - at least as a proof that you accomplished your part of the contract ;) We'll join the discussion if necessary.

If you have already done something similar, please notify us, and if we confirm the conditions listed here are satisfied, feel free to use the coupon code for this discount.

As you see, we provide an additional 50% discount (= 75% in total) in exchange for several hours of your help. The final prices with this discount are:
- Licenses: 75/124/244 USD in June, or 150/248/498 USD later. Coupon code: JDDO4POSTER. Note that you can use it right now, but this implies you agree with the above conditions.

2. Contributor licenses: free SMB \ Enterprise licenses with lifetime upgrade and support
We're ready to provide free licenses, if you'd like to help us to develop or promote the product. What can be done to get a free license:

a) Significantly help us to develop some part of DO - for example:
- SQL DOM / storage provider for the database we don't support yet, or some notable part of it
- Sample application (currently we'd prefer good samples for ASP.NET or ADO.NET data services / Silverlight) or some notable part of it
- Useful tool (anything you miss ;) ) or some notable part of it
- Some other part we're planning to implement.

Expected complexity: if it took more than 10 full-time days and we accepted the result, we'll definitely provide such a license. Other cases are negotiable - we're open to any proposals like this.

Obviously, it's necessary to discuss your plan with us before starting the implementation.

b) Promote DO4 at your local .NET, ALT.NET or similar user group by giving a 1.5-hour overview of it there. You can choose any particular part you want, but ideally it should be based on our videos or on our own presentation of DO (which will appear in June).

c) Do anything you want that is comparable to a) or b) in its effect. In fact, we expect you'll spend ~1-2 weeks on this, and the result will be attractive to DO users.

Remarks

As you may see, our goal for this summer is to grow the community around DO4. Prices and earnings don't really matter to us.

But you may ask:

Why are we asking you to pay anything at all? The reason is simple: money paid is one of the factors making things valuable to you. We expect it will make our simple "promo agreements" more important for both sides.

Why now? Because we feel the product is fully ready to be shown to a much wider audience. Right now it can do more than, e.g., the currently available version of Entity Framework. And I hope that by the end of summer it will be capable of simply smashing it with its set of built-in features.

Ok, of course I fully understand it won't be easy to compete with EF. But imagine if EF were shipped by someone other than Microsoft. I admit it would be really hard to win a big niche on the ORM scene with its design. Really, they're providing features almost identical to NHibernate, but packed into a "Microsoft way" box. It's really hard to identify what's new there. I'd also say this is dangerous for NHibernate as well.

I feel we're combining something very good from both open source and commercial development by running this campaign. I hope it will help to involve people. Our purely commercial promo campaign will follow shortly - we'll be spending almost 5 times more on advertisements starting from next week (I'm waiting for completion of a set of vital changes on our web site & Wiki). But your help is more important - rumors, posts and articles are what make such products really famous.

So I suggest you join our community right now. Together we'll move forward much faster. Ask yourselves:
- What commercial ORM is as open as DO4? We'd say even non-commercial ones are less open. Open source, availability under GPL, public issue tracker, open plans, a Wiki that can be edited by you, the possibility to contribute.
- Do you expect EF will ever run on an open source platform? DO4 will definitely run on Mono in the observable future.
- Is there any chance of DO4 development stopping during the coming years, if we were ready to spend 2.5 years on developing this new version?
- Does DO4 differ from its competitors? Even the current feature set shows it does. Architecturally it's completely new. Such a schema upgrade layer, support for index storages, a built-in IMDB, an integrated query engine, upcoming sync - who else has these features?
- What features are expected in the near future? There are tons of them on the way. You may find that the product you see now, although solid, is a huge foundation for upcoming features. And that's cool, because the baby is just 2 weeks old! Ok, I know, but the pregnancy was long ;)
- Are we devoted to DO4? You must feel it is worth much more than just money to us. It is a part of our image and reputation.
- Can you rely on it and on us? I hope so - taking into account the above, as well as the experience (including support incidents) we've gained during previous years of work on 1.X-3.X.

Join DO4 camp!

P.S. Even if you already have a license, think about writing something good about us ;)

Thoughts: Silverlight security is nothing more than nothing?

The guys from the Silverlight team have applied a lot of restrictions to reflection in Silverlight to make it "more secure". Frankly speaking, I hate this - first of all because they're quite unreasonable in many cases. I like to build various "auto" stuff like our AssociateProvider based on reflection (obviously, everything is cached and blazingly fast on subsequent calls), and such limitations seriously annoy me. "Security!" - they'd say.

Recently I thought about security in Silverlight/.NET, and a single fundamental weakness immediately came to my mind:
- They support multithreading, but I suspect at least some of the secure classes they ship (i.e. the ones making, and, more dangerously, internally caching the results of some security checks) aren't designed to be used in a multithreaded environment. This means their behavior is unpredictable when they're used concurrently.
- Multithreading normally implies much more complex scenarios related to read/write ordering, second-level CPU cache synchronization and so on. Taking these into account, everything becomes even more complex. E.g. there is a chance that an object that doesn't support multithreaded access (but is intentionally modified concurrently) will "see" a completely impossible state of itself (or of some object graph) just because the CPU cache of the current thread isn't in sync with the modifier's thread right now. Obviously, the chances of getting this are rather small, even if I intentionally write everything to make them as high as possible. But they increase with the time such an exploit runs.

Actually, getting both security and true multithreading in such frameworks is always complex. Obviously, .NET must have the same problems. And, frankly speaking, I don't know if there is any general solution. It's nearly like expecting full serializability while working at the read committed isolation level. People are used to thinking this way. I suspect we should seriously change the way we design software (i.e. use completely different programming paradigms and languages) to get these problems solved.

A few links about memory models:
- "Exploring memory models" by Joe Duffy (look at #4 there).

Monday, June 15, 2009

Friday, June 12, 2009

Upcoming updates: v4.0.1, v4.0.2, current test results

Now I can share what's planned for this and the next weekend:
- v4.0.1. Minor update, mainly bugfixes. Will be available on this weekend.
- v4.0.2. Minor update, bugfixes and documentation improvements. Will be available ~ on the next weekend.

So we've been fixing rather tricky LINQ and schema upgrade bugs this week, and will continue doing so next week.

And now the "main dish": current test results for Storage:

Test project: MS SQL 2005 Default
- Tests failed: 11, passed: 921, ignored: 54

Test project: PgSql 8.3 Default
- Tests failed: 20, passed: 911, ignored: 54

Test project: Memory Default
- Tests failed: 31, passed: 899, ignored: 57

Only tests for the Storage solution (Xtensive.Storage.*) are listed here; there are 987 tests in total. Ignored tests = features that are planned but aren't implemented yet, e.g. LINQ tests where arrays and collections are passed as query parameters.

Besides Storage tests, there are:
- 367 SQL DOM (Xtensive.Sql.*) tests. About 7 of them fail, but that's ok - most of them test the "ALTER DOMAIN" construct, which is unused by DO; the only other one is a test for subtraction of an interval from a date on MS SQL (this may affect a LINQ query result if you subtract a TimeSpan parameter from a DateTime field in a query - see the snippet after this list). 25 tests are ignored (for the same reason).
- About 450 tests for other projects (Core, Indexing, Integrity, PluginManager, TransactionLog) - no failures.
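
For illustration, the affected pattern is a query like the following sketch (the entity and field names are hypothetical, and Query<T>.All is assumed as the query root):

// Subtracting a TimeSpan parameter from a DateTime field inside a query;
// on MS SQL the interval subtraction test currently fails, so the result may be wrong.
var delay = TimeSpan.FromDays(3);
var lateOrders =
  from o in Query<Order>.All
  where o.ShippedDate - delay > o.RequiredDate
  select o;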

So there are 1725 running (i.e. non-ignored) tests, and 18...38 (1...2%) of them fail, depending on the underlying provider.

What do you think - are we shipping pretty stable builds?

Actually I'd really like to know how testing is performed by vendors of other ORM tools. If you have any info (or links), please add it in comments.

Wiki update

We're planning to restructure & update wiki.dataobjects.net next week. To find the topics you need, please use search, since many of them will be temporarily orphaned.

Thursday, June 11, 2009

The plausible promise

I'd like to share a nice article from the Chromium blog - it is about developing and releasing an open source product, and it is quite relevant to DO4. I frequently get questions like "so is it ready?", "why isn't feature X implemented yet?", "what if I need it?". This article gives a nice answer: you can use it right now anyway, all the basic features work, and the missing ones will be delivered as planned.

I agree, currently we're missing sync & offline support among the essential features. But they're on the road - and much closer than you might assume (though I understand such a position, taking into account our previous 2.5 years of work). Assuming you start using DO4 now, it's quite likely they'll be there before you really need them.

"The community we build today is what will make it a better product down the road, and without that community the product will ultimately suffer." - that's true. Please keep this in mind. We've been developing DO4 inspired by the initial success of its predecessor. We've taken into account all the mistakes we made in the design of v1.X-3.X. We wanted to make it really the best one. Incomparable. Incredible. And now we see this is quite close to the truth, even taking into account that there are new competitors - I hope that in a few more months none of them will honestly be ready to stand nearby.

Btw, yesterday we discussed the "second system effect" - of course, in the context of DO4. I really feel it suffers from it (nice: a few posts ago I called it a "monstrosity" myself ;) ) - ok, slightly ;), and mainly internally. It is really complex, and probably even over-engineered inside. But it's obvious we can make it simpler. We already did a lot to make its front-end simple enough for developers. Simplicity is the result of complexity, not vice versa.

So take it and start using it. Ask us for help. Bother us with questions. Tell us why you dislike it. Don't pay for it until you're fully sure it's worth it. Btw, we're preparing a set of promo proposals related to licenses for early adopters - so if you planned to buy it, don't do it right now ;) Finally, write about it. Right now is the perfect time to start re-building the community around the product.

And... Thanks to all of you who've been waiting for v4.0 during these years. It's really pleasant to see some of our old clients are still here after such a long pause ;)

P.S. To prove it's really stable enough, next time I'll talk about how DO4 is tested.

Wednesday, June 10, 2009

Disconnected (offline) entities, POCO and sync in DataObjects.Net 4

First of all, when do we need disconnected objects? Generally, we need them in cases when some data must be accessed and modified without opening a transaction. It's something like getting a cached version of a web page when you turn your browser into offline mode - you can read, but only what's cached. Moreover, in our case we want to be able to modify such disconnected objects and persist the changes made to them back to the storage when we get back online (connected).

Now let's look at some particular use cases:
1) WPF client. To show data in the UI fast, it must store a cached version in RAM. Normally it must flush the changes back to the database only when the Apply button is clicked.
2) A wizard in an ASP.NET application that must make a set of changes through a set of postbacks, but they must really be flushed into the storage only on the final wizard page.
3) Synchronizing client. It maintains its own version of database and periodically syncs with the server (not with the database, but with middle-tier server). Example: AdWords Editor. But probably the most well-known example is generally any IMAP e-mail client.
4) Slave (branch) server. It maintains its own version of database and periodically syncs it with the master server. Such architecture is used by applications working in corporations with many distant branches. Branch servers provide their own clients with the data and periodically sync with the master to decrease the load on it.
5) Peer-to-peer sync. Skype is probably the most well-known example of such an application. It syncs chats between its different instances, including different instances using the same Skype account.
6) Public service. We're going to publish our entities for public access to allow other developers to create their own programs using our service. In the above cases it was implied that we develop all parts of the distributed application, choosing any technology we want, so we could use DO on any interacting side. Here we can't - the public API must be based on public standards. So we need to support public standards here, e.g. provide a RESTful API, publish POCO objects via a WCF service, etc.

I hope that's all. Any additions? Feel free to add them in comments.

Can these scenarios be implemented in DO? Yes and no.
Good news: the new DO will support all of the above scenarios. Most of them will be supported after the release of v4.1, and some (case 5: P2P sync) - after v4.2.
Bad news: consequently, v4.0 supports none of the above, except case 6 - actually, just because there everything depends on you. If you're interested in details, the complete description of this case is at the end of this article.

So further on I'll describe what we're going to provide to implement the above cases, and what is already done. First of all, let's classify the above cases by 5 properties:
1) Size of the disconnected storage. Will it fit in RAM?
2) Queryability of disconnected storage. Will we query it, or just traverse it (get related object, get an object by key)?
3) Concurrency on disconnected storage. Will we access it concurrently?
4) Sync type. Do we need master-slave or P2P sync?
4.1) Sync level for master-slave sync. If we've chosen master-slave sync, do we need action-level sync or state-level sync?

The classification for above 6 cases is:
1) Fits in RAM, not queryable (although in some cases it could be), non-concurrent, master-slave sync, any sync level is possible.
2) Fits in RAM, not queryable (although in some cases it could be), non-concurrent, master-slave sync, any sync level is possible.
3) Doesn't fit in RAM, queryable, likely - concurrent (different instances of application will share the same database), master-slave sync, any sync level is possible.
4) Doesn't fit in RAM, queryable, concurrent, master-slave sync, any sync level is possible.
5) Doesn't fit in RAM, queryable, likely - concurrent, P2P sync.
6) Any case is possible. Let's forget about this group for now.

As you see, cases 1-5 form 3 groups with the same properties:
1) Fits in RAM, not queryable (although in some cases it could be), non-concurrent, master-slave sync, any sync level is possible.
2) Doesn't fit in RAM, queryable, likely concurrent, master-slave sync, any sync level is possible.
3) Doesn't fit in RAM, queryable, likely concurrent, P2P sync.

Let's group them once more, taking into account the following facts:
- Doesn't fit in RAM = needs a local database (to store the state)
- Queryable = needs a local database (to run local queries)
- Concurrent = needs a local database (to handle isolation and transactions)
- Otherwise = needs a local state container fitting in RAM and allowing entities to be resolved by their keys fast.

So in the end we have just 2 groups:
1) Master-slave sync: local state container or database, any sync level.
2) P2P sync: local database.

Requirements
1. We must fully support initial 1-5 cases.

1. We want to work with the same persistent types on any side (ASP.NET application, WPF client, branch server, master server, etc.) as in the case without any sync, although sync may affect the behavior of persistent types (e.g. fetches and lazy loads can be propagated to the master).

1.1. It must be easy to detect current sync behavior for persistent types ("sync awareness").

2. We must be able to control automatic fetches from the master on the slave, as well as query propagation. We want to:
- Define regions (with using (...)) with the desirable fetch and query modes.
- Specify that a query must be executed on the master or locally right in the query, e.g. with an .ExecutionSide(...) extension method (see the sketch after this list).
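
A rough sketch of how this could look in code (all names here are assumptions - the API is only being designed at this point):

using (Session.Current.OpenRegion(FetchMode.Local, QueryMode.Local)) {
  // Queries in this region run against the local (slave) storage...
  var cached = Query<Invoice>.All.Where(i => i.IsPaid).ToList();

  // ...unless a particular query explicitly asks for the master.
  var fresh = Query<Invoice>.All
    .ExecutionSide(ExecutionSide.Master)
    .Where(i => !i.IsPaid)
    .ToList();
}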

3. Update propagation must be explicit.

4. Sync must be a pluggable component built over core Storage facilities. It must utilize just the open API. Certainly, this implies we must make it open enough for this.

Decisions

Let's start with the "Master-slave sync, local database, any sync level" option. To develop it, we need the following components:

1. Embedded database provider(s).
1.1. An in-memory database provider is helpful when the disconnected storage fits in RAM but must be queryable. Actually this is true in many cases - e.g. back reference search on removal needs the storage to be either queryable, or tiny enough to run such queries using LINQ to Enumerable.

As you see, we already have this part. And it will be completely perfect when our memory provider becomes transactional. That's really pretty easy with our indexing architecture, but it is postponed at least till 4.2.

2. SyncState tracker (a SessionHandler ancestor). In fact, it will be chained to the regular SessionHandler to "intercept" fetches, updates and queries. It is responsible for fetching any info from the remote storage in cases when the local database doesn't contain it, as well as caching and tracking such fetched info when it arrives from the remote database. Internally it will use SyncState(of T) entities - do you know we already support automatic registration of generic instances? It's really useful in cases where you need an associate to generally any type (so sync & full-text search are perfect candidates for this feature).

As you might assume, SyncState allows tracking:
- Presence of SyncState: indicates the object was fetched from or checked at the master
- IsNull - existence flag. Eliminates duplicate fetches for removed objects.
- Its original version & state (Tuple)
- IsModified flag.

So it allows deriving which properties are changed with ease. That's how we'll do state-level sync. Generally, we must produce a sequence of (original version, state change (a Tuple again), IsRemoved flag) entries - one for each object with IsModified==true - and send it to the master to apply all the changes (we'll use PersistentAccessor for this - probably we'll even add a method allowing the whole state change to be made at once, by passing a new Tuple).
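
Roughly, a SyncState(of T) entity would carry the following data (a conceptual sketch; persistence attributes are omitted and the member names are assumptions):

// Conceptual sketch of SyncState(of T); not the actual class.
public class SyncState<TEntity> : Entity
  where TEntity : Entity
{
  public TEntity Target { get; set; }       // presence of this record = fetched from / checked at the master
  public bool IsNull { get; set; }          // existence flag: eliminates duplicate fetches for removed objects
  public Tuple OriginalState { get; set; }  // original version & state
  public bool IsModified { get; set; }      // set when Target is changed locally
}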

3. Action-level tracker. An implementation of Atomicity's OperationLogBase writing the actions to a stream that will further be "attached" to a StoredTransaction object (again, persistent). Since we've already implemented serialization & Atomicity itself, this must be rather easy.

When such a sequence is fetched back from the StoredTransaction, it's possible to roll it back, send it to a remote party and apply it there, or re-apply it locally. Note that such an operation log contains the information needed to validate the possibility of applying it or rolling it back. So that's how action-level sync will work.

4. Client-side sync API: ~ ApplyChanges \ CancelChanges methods in Session and SessionHandler, as well as their implementations for the two sync cases above.

5. Server-side counterpart for the client-side sync handlers - a WCF service handling all requests related to sync. Most likely there will be a single one, taking SessionConfiguration as its only option. Our new Domain is ready for Session pooling, so this solution should work perfectly even under high concurrency.

Now let's look at the "Master-slave sync, local state container, any sync level" option. The primary difference here is that we don't have a local database - i.e. it's still the same database, but we must be able to:
- Temporarily protect it from any updates. We must cache them in the state container.
- The state container must also provide repeatable reads: anything we've fetched must be stored in it and further be available without database access attempts.

Let's look at how this could work:

var stateContainer = new OfflineStateContainer();
using (var offlineScope = session.TakeOffline(stateContainer)) {
  // Here we can run many transactions; updates won't be propagated
  // to the session until the end of this using block.

  offlineScope.Complete(); // indicates the changes will be applied
                           // on disposal of offlineScope
}

All the APIs we need here were already described above. The only thing we need is a SessionHandler playing nearly the same role as the SyncState tracker, but using ~ a Dictionary inside OfflineStateContainer instead of SyncState(of T) objects.

Gathering changes is also simple here: they're either extracted right from this dictionary (for state-level sync), or from StoredTransaction objects (since nothing is actually persisted while an OfflineStateContainer is bound to a Session, they aren't actually persisted either). The extracted change log will be applied to the underlying Session, and in case of success the new state will be marked as "fetched" in the state container. Otherwise nothing will happen (and, of course, you'll get an exception).

You may note that such an API suits implementing caching perfectly as well. Earlier I explained why this is so attractive for the new DO. Here I've shown it's rather easy to implement with such an architecture.

So as you see, we propose the following approaches:
1) WPF client. It must use either OfflineStateContainer or the IMDB, if it wants to query the state it caches.
2) A wizard in an ASP.NET application - it will use OfflineStateContainer (i.e. keep it in the ASP.NET session).
3) Synchronizing client (e.g. AdWords Editor) - any regular DB (syncing) as local storage + OfflineStateContainer or a syncing IMDB for its UI. As you see, you can build master-slave chains of arbitrary length ;)
4) Slave (branch) server - any regular DB as its local storage. Note that sync here will affect its performance, so it's also desirable to use caching. Action-level sync will ensure that everything is finally done fully honestly on the master.

What's left? "P2P sync: local database". Let's leave this topic for future discussions. Here I can only say that tracking is actually much simpler. We'll use the same approach as in Microsoft Sync Framework, and thus initially we'll support only state-level sync here.

Finally, there was "Case 6) Public service." Here everything is up to you. Let's list the most obvious options you have:
1) Convert a sequence of our Entities to your own POCOs via LINQ to Enumerable and send them via WCF (a sketch follows after this list). Btw, shortly you'll be able to do this right from LINQ - we fully support anonymous types there (but they can't be sent via WCF), so adding POCO support here must be really easy, since POCOs are almost exactly the same as anonymous types. The only problem you have here is how to detect & propagate the updates. There are many solutions, but all require some coding.
2) Probably the best approach here is to develop something like an Object-to-Object mapper. Simple 1-to-1, but capable of detecting and applying updates. I was surprised that Google turns up lots of such solutions for .NET - actually I thought this term was still mainly related to Java.
3) Finally, you can use ADO.NET Data Services (Astoria) to publish our Entities via a RESTful API with zero coding at all. Again, this must work, since we support LINQ, but we haven't tried it yet; shortly we definitely will. This is really important, because it allows interacting with a DO backend from Silverlight.
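
To make option 1 a bit more concrete, here is a sketch (CustomerDto is a hypothetical POCO you define yourself; Query<T>.All is assumed as the query root):

// Option 1 sketch: entities -> plain DTOs via LINQ to Enumerable.
var customerDtos = Query<Customer>.All
  .Where(c => c.IsActive)
  .AsEnumerable()                  // from here on it's LINQ to Enumerable, not a storage query
  .Select(c => new CustomerDto {   // a plain POCO, safe to return from a WCF service
    Id = c.Id,
    Name = c.Name
  })
  .ToList();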

So here is the answer why we don't care about POCO support as much as others do: POCO is required only in case 6. And:

1. I think case 6 is encountered more rarely than cases 1-5, especially in startups. Opening a public API normally implies your project is already rather famous, and in that case spending some additional money on developing a public API is fully acceptable. On the other hand, in many cases sync is what you need from scratch. It should simply be a built-in feature. No integration with Sync Framework, no code to support action-based sync. It must just work.

2. As I've shown, there are many ways of producing POCO graphs from complex business objects and getting the changes back, so this part is simple. If you have just POCOs provided by your ORM, you simply shouldn't do this at all, and that's nice. But you lose all the infrastructure we provide - persistence awareness, sync, caching, atomicity, validation and so on. And I think a custom implementation of all this stuff is incomparably more complex than implementing conversion to POCO - even if you don't use any tool, it is a very simple, pattern-based task.

3. I like the KISS approach. POCO and PI are KISS attributes for me. But I hate implementing something complex on my own, especially if I feel it should be bundled into the framework. A good example of a non-KISS approach is WPF: its base types seem rather unusual and a bit complex. Dependency properties... They are even defined in a rather strange fashion there. The guys developing it made a step aside from the standard control architecture and... developed a masterpiece!

Their DataContext property... Do you remember "our way" of data binding in WindowsForms? The first thing I added was BindingManager. You add it to a window or panel, and its DataBoundObject starts playing the same role as DataContext in WPF! You could nest them, binding the nested one to the lower one! There were Fill and Update methods! Do you see the analogy? I've always dreamed about the way of binding WPF offers, but couldn't get it implemented on WindowsForms. Why? Well, because WindowsForms was done wrong. The guys inventing it were following the KISS principle. They simply didn't want to think. They simply copied the common approach. And made others invent stuff like BindingManager.

So I think making a step aside from the common path is good, if you see that this path doesn't solve the problems well. Although going too far from it is quite risky ;)

Thus providing POCO support is good. But it isn't good to make it your god.

Btw, even WPF isn't ideal :) E.g. I really dislike that they require an object to support INotifyPropertyChanged and INotifyCollectionChanged. Why do they fight so much for POCO/PI in EF, yet don't follow the same path here? Why do they expect that model objects will support their crazy, purely UI-related interfaces? Why can't I give them a notification service responsible for tracking state changes & notifying WPF? Requiring purely UI interfaces to be implemented on model objects seems much more illogical than requiring them to inherit from an Entity-like persistence-aware base. So actually I have the answer, but you won't like it ;) I feel in the case of EF they simply follow the fashion, as do many others. But in practice, implementing a special interface is simply acceptable. Does this mean that PI is nothing more than BS?

Ok, that's enough nightly philosophy, I'm going to get some sleep now. See you tomorrow or the day after - from now on I'm going to update this blog almost daily.

P.S. I'm going to add pictures to this article tomorrow. Later it will go to our Wiki as well. Any comments are welcome ;)

Tuesday, June 09, 2009

Thinking aloud

"So, tell me, my little one-eyed one, on what poor, pitiful, defenceless planet has my monstrosity been unleashed?"

(C) Lilo & Stitch ;)

DataObjects.Net v4.0 final is out

DataObjects.Net v4.0 final is published in downloads section of our web site.

Most important changes we've made last month include:

1. Schema upgrade API. Shortly it will be fully described @ wiki; for now you can try out the Upgrade Sample. There are 3 applications representing different versions of the same application. They must be run sequentially, except the first one (it recreates the database), although you can try running them in any order.

The Upgrade API is really simple. To gracefully handle an upgrade, you must:
- Add an AssemblyInfo attribute to the AssemblyInfo.cs file: [assembly: AssemblyInfo("MyAssembly", "2.0")]
- Extend UpgradeHandler in each changed model assembly - it will assist in upgrading its structures.
- If necessary, override its AddUpgradeHints method and add instances of UpgradeHint ancestors there: RenameFieldHint, CopyFieldHint, RenameTypeHint. Note that upgrade hints are model-level hints, not schema-level.
- If necessary, apply the [Recycled] attribute to some of your types and fields and override the handler's OnUpgrade method to implement custom data migration logic from the recycled structures there.

All these steps are shown in upgrade sample.
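
To illustrate the steps, a handler could look roughly like this (a sketch; the exact method signatures may differ from the shipped API - see the upgrade sample for the real thing):

// Sketch of an UpgradeHandler for a changed model assembly.
public class Upgrader : UpgradeHandler
{
  protected override void AddUpgradeHints(ISet<UpgradeHint> hints)
  {
    // Model-level hints; they're translated to schema-level hints automatically.
    hints.Add(new RenameTypeHint("MyApp.Model.Person", typeof(Customer)));
    hints.Add(new RenameFieldHint(typeof(Customer), "FullName", "Name"));
  }

  public override void OnUpgrade()
  {
    // Custom data migration from [Recycled] types and fields goes here.
  }
}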

As you may find, UpgradeHandler offers more extension points, and there are schema-level upgrade hints as well. Normally they shouldn't be used - we translate our model-level hints to schema-level hints automatically. But some of them can be useful - for example, IgnoreHint, which instructs the schema comparison layer to skip the specified schema object.

2. Refactored LINQ translator. Almost no differences outside (it passes just a few more new tests), but a huge set of them inside. Faster, much simpler and easier to extend. Did I mention we're going to support custom LINQ extensions? Well, this version allows us to implement this gracefully.

3. Refactored mapping attributes. Since we're now shipping the final version, this became rather important. Details can be found here; the new samples reflect this. The Wiki will be updated accordingly shortly.

4. Improved installer & documentation
- Regular build installer now contains all downloadable prerequisites integrated
- Brief version of class reference is provided @ help.x-tensive.com and installed as .Chm. Full version (.HxS) is integrated into VS.NET help collection.

Other important changes:
- Improved PropertyConstraintAspect - a custom error message can now be specified right in the attribute declaration (see the sketch after this list)
- We decided to get rid of VistaDB support. Earlier it was easy to maintain because of its good compatibility with SQL Server, but now it is becoming more and more complex. There are actually many differences - normally acceptable for humans, but making machine translation for it much more complex. Our LINQ implementation and schema upgrade API provide a very high level of compatibility between different RDBMS, and this compatibility costs a lot. So we'll focus on more frequently used RDBMS (Oracle, MySQL); furthermore, the presence of our own memory provider and the upcoming file system provider makes support of embedded databases much less necessary.
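
For the constraint improvement mentioned in the first item, usage looks roughly like this (the concrete constraint attribute and its parameter names are assumptions):

// Sketch: specifying a custom error message right in the constraint declaration.
public class Customer : Entity
{
  [Field]
  [LengthConstraint(Max = 64, Message = "Customer name is too long.")]
  public string Name { get; set; }
}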

Public issue tracker is open
And finally, during the last month we've been using the issue tracker on Google Code (btw, it is really good). So everything we did is easy to expose now: check out the "Done" grid for v4.0 final. Previous changes belong to the "Pre4.0" milestone, so they aren't shown there. Quick statistics:
- 25 defects
- 14 new features
- 9 enhancements
- 9 refactorings
- 9 non-coding tasks.
Total: 66 issues.

If you're interested in the near future, take a look at the similar grid for v4.1. Some issues mentioned there (mainly fixes & tasks) will be separated into v4.0.5, which will be available in ~2 weeks.

And as I mentioned, from this point we're switching to monthly release cycle. So v4.1 can be expected in the first half of July.

Friday, June 05, 2009

DO4 vs its key competitors (EF, NHibernate, LLBLGen, Genom-e) - the very first post in upcoming long cycle

I'm going to publish a set of posts here comparing various EF and NH concepts to the corresponding parts of DO. Obviously, this isn't a short-term task, so for now I'm publishing just a starting article. I'll try to be fully objective, showing all the pros and cons of the different approaches.

I hope to hear some criticism in the comments ;)

Let's go:

Difference #1: Entity and its state: are they all the same object, or different ones?

From the point of view of many pure ORM frameworks, including EF and NHibernate, it's the same object. Let's call this scenario Case 1.

From our point of view there is a set of different objects (Case 2):
- Entity. It acts like an adapter providing a high-level (i.e. materialized, object-oriented) representation of its own EntityState. It's a lightweight object containing mainly just its State.
- EntityState. A lightweight object binding the actual state data (a DifferentialTuple exposed via its State property), the Entity it belongs to and the Transaction from which this state originates. You may find it is a descendant of TransactionalStateContainer(of DifferentialTuple) - it actually implements all the state invalidation logic.
- DifferentialTuple - the low-level state representation. Again, a lightweight object aggregating two others: Difference and Origin. Both are Tuples. Origin describes the original (fetched) state. Difference describes all the changes made to it. DifferentialTuple exposes the field updates stored in Difference as if they'd been applied to Origin - i.e. a DifferentialTuple is a Tuple as well (a conceptual sketch follows after this list).
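
The Origin/Difference idea can be illustrated by a tiny conceptual sketch (this is not the real implementation, which operates on Tuples rather than dictionaries):

using System.Collections.Generic;

// Conceptual illustration of DifferentialTuple's read-through behavior.
public sealed class DifferentialMap
{
  private readonly Dictionary<int, object> origin;       // fetched field values; never modified
  private readonly Dictionary<int, object> difference =
    new Dictionary<int, object>();                       // only the changed fields

  public DifferentialMap(Dictionary<int, object> origin)
  {
    this.origin = origin;
  }

  public object GetValue(int field)
  {
    object changed;
    // A changed value shadows the original one; otherwise we read through to Origin.
    return difference.TryGetValue(field, out changed) ? changed : origin[field];
  }

  public void SetValue(int field, object value)
  {
    difference[field] = value;   // changes accumulate here; Origin stays intact
  }

  public bool IsChanged(int field)
  {
    return difference.ContainsKey(field);
  }
}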

It looks like I should provide a short description of our Tuples framework:

Tuples are lightweight objects providing access to their typed fields by number. Conceptually they're very similar to Tuples in the .NET 4.0 BCL, but there are quite important differences (a short usage sketch follows after this list):
- Our tuples aren't structs - they're classes, and they're dynamically typed. E.g. there can be many different tuple types exposing the same .NET 4.0 Tuple structure, not just one. Moreover, we can write code working with generally any type of Tuple, not just with a specified one - the methods providing access to their internals are virtual. So from this point of view our Tuples are closer to a List with a specified type for each item than to .NET 4.0 tuples; conversely, .NET 4.0 Tuples are quite similar to our Pair and Triplet (we haven't implemented Quintet just because we didn't need it ;) ).
- There are generally 2 kinds of tuples: RegularTuples and TransformedTuples (although we don't put any restrictions here). Tuples of the first kind are actual data containers. Tuples of the second kind are lightweight data transformers: they don't store the data themselves, but transform it from their sources on demand. We use them in RSE (e.g. to join two record sets, cut something out and so on - certainly limiting the depth of such transformation chains) and during materialization.
- Our Tuples always have a TupleDescriptor and maintain nullability and availability flags for each field. So they're designed for RDBMS-specific calculations.
- Actual types for RegularTuples are generated on demand at runtime. We don't generate generics, to be able to "compress" such fields as booleans: any boolean value, including the availability and nullability flags, takes exactly one bit.
- Tuples are fast - nearly as fast as List(of T).
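
In code, working with such a tuple looks roughly like this (member names are approximate):

// Approximate usage of the described Tuples API.
var descriptor = TupleDescriptor.Create(new[] { typeof(int), typeof(string), typeof(bool) });
var tuple = Tuple.Create(descriptor);   // an actual RegularTuple type is generated on demand

tuple.SetValue(0, 42);
tuple.SetValue(1, "Alice");
// Field 2 hasn't been assigned yet, and its availability flag reflects that.
var fieldState = tuple.GetFieldState(2);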

Ok, so now we can return back to entity and its state. Let's study pros and cons of our approach closer.

Further on I'll use the term "persistent field", although normally these are exposed as properties. "Persistent field" means a property whose value must be persisted. I use "field" here mainly because it's closer to the nature of such properties.


Cons (by importance):

C1. Slower materialization. As you see, we must instantiate 4 lightweight objects (Entity, EntityState and DifferentialTuple) in comparison to a single Entity in e.g. EF. But in fact the difference isn't so dramatic: in addition to the Entity we should usually materialize its Key. This implies at least one dictionary lookup (~ equal to 5 memory allocations) and the creation of 1-2 objects. So in general, this may decrease materialization speed by roughly a factor of two.

On the other hand, Case 2 allows sessions to be re-used without post-transaction cleanups (see P3), and in this case many of these "additional" lightweight objects (Entity, EntityState and DifferentialTuple) will simply be re-used by subsequent transactions.

C2. Slower property reads - reasons are the same here. Although this is much less important, since intensive data readers must be spending more time on fetching the data from readers and materialization.

C3. Slower property writes - for the same reasons. This is even less important, since persist speed is limited by ~10-30K entities/sec. on current hardware. Moreover, this is quite arguable: any ORM with a change tracking service does nearly the same job in this case.

C4. Additional levels of abstraction - obviously, this isn't good if it turns out they aren't really necessary.


Pros:

P1. The most frequently used requirements are bundled into the framework. In particular:

P1.1. Change tracking - DifferentialTuple handles this perfectly. The original state is always available, and there is no need to capture it separately, as is required in Case 1 (note: if this is implemented in Case 1, it affects either materialization or update speed). The same is true for differences - we know exactly which fields are updated. Moreover, you don't have to implement any change tracking logic yourself: when you call a protected SetField-like method, changes are tracked automatically.

P1.2. Lazy loading of reference fields: if we have a field of an Entity type in Case 1, we have two options on materializing the entity with such a field: either materialize the referenced Entity as well, or set it to null / do nothing. The first option isn't good because we would most likely have to fetch that entity too (this is quite slow, since it requires a DB roundtrip). The second one implies we will either show null instead of the actual value, which obviously isn't good; or we should use something like Lazy behind this field and capture its key into some other field(s) on materialization. Ok, but such a Lazy costs 8 bytes (4 bytes for the boolean flag and 4 for the reference) in addition to the key field(s). Moreover, we must further ensure they're kept in sync. Finally, we can maintain just the key field(s) and use something like this.DataContext.Resolve(customerID). But what did we just do? Exactly: we converted the low-level data representation to the high-level one.

P1.3. Lazy loading of simple fields: availability flags make it easy to load non-reference fields on demand. In Case 1 this is a really ugly and difficult problem.

P1.4. Distinguishing between "assign field value" and "materialize field value" operations: in many cases it's important to know whether a property setter is invoked by the materializer. E.g. if there is some value validation logic, it must be skipped on materialization. In Case 1 you should explicitly check this (btw, is this possible in EF? If so, how?); in Case 2 you don't have this problem at all, because we can "materialize" the state itself. An Entity is materialized by invoking a special protected constructor taking its state as its only argument (such constructors are automatically provided by one of our aspects).

So in general, all the P1.X "pros" simplify the developer's life. They affect performance as well, but from my point of view that is much less important.

P2. Absence of strong references between Entities. This allows us to use Session caching policies such as weak-reference-based caching. If there are relatively long transactions processing lots of data, but their working set is relatively small, they can easily survive in Case 2, but will die with an OutOfMemoryException in Case 1. E.g. reading ~1M of 2-field objects is enough to make EF die with this exception on a PC with 2GB RAM!

Btw, this doesn't mean we can't use a Dictionary-based caching policy. Just mention you need InfiniteCache in SessionConfiguration (see the sketch below). Currently we use a chain of LruCache + WeakCache by default (but this is still subject to change).
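
For example (a sketch; the exact enum and property names may differ):

// Sketch: requesting dictionary-based (infinite) caching instead of the
// default LruCache + WeakCache chain.
var sessionConfig = new SessionConfiguration {
  CacheType = SessionCacheType.Infinite
};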

P3. Zero cost of invalidating the state on a transaction boundary. In our case such state invalidation happens on the fly, when EntityState notices that this.Transaction != this.Session.Transaction. Case 1 implies a real field-by-field cleanup with ~linear cost. This is important if you run lots of BLL on an application server and reuse Sessions and Entities.

P4. Perfect for global caching. Imagine we have a global (i.e. Domain-level) cache storing the Origins of the mentioned DifferentialTuples, i.e. fetched Entity states. We never modify the Origin - did I mention this? And the fact that Tuples support concurrent reads? Ok, this means:
- We need almost nothing to materialize the Entity in this case: we'll create EntityState and Entity without any field-by-field copying.
- If there are cached EntityState & Entity in the Session (the chances of this are also high enough), we'll just set the Origin!
- We can easily add an "originates from global cache" mark to our EntityState, whose presence will imply a later optimistic version check on update. Case 1 requires this mark to be stored externally.
- Entities from different Sessions will share the same state objects. So probably this approach will allow us to make the amount of RAM needed in Case 2 lower than in Case 1.

Why did I write about caching? Because this is really important (even a distributed cache hit is much cheaper than a database roundtrip - not to mention a local in-memory cache), and thus planned. This time we're going to provide a global cache API allowing you to decide what to cache and how to cache it:
- We'll allow caching anything (including query results), as well as fetching anything directly from the global cache without any database-level version checks (but this will lead to an optimistic version check on update of such an entity).
- The global cache API will be ready for Velocity.


Ok, that's enough for today. Obviously, from my point of view Case 2 is better. I think this is a distinguishing element of a Rich Domain Model - as you might know, that's what we like. On the other hand, Case 1 with its simple Entity design pushes you toward an Anemic Domain Model (an anti-pattern, per Martin Fowler) - everything is clear here, but you have to do even very simple things on your own. Try to implement a slightly more complex application over such a DAL (50-100 persistent types), and you're almost in our camp. You need all of P1 everywhere, hating to code it the same way - Ctrl-C, Ctrl-V, Ctrl-C, Ctrl-V, Ctrl-C, Ctrl-V, Ctrl-C, Ctrl-V...

Code duplication is probably what I hate the most. Especially if something I use pushes me to do this many many times.

Few nice links to end with:
- Anemic vs Rich Domain Models: do you know people from Java camp prefer Rich Models? In any case, this is a short post you can start the investigation from.
- Entity Framework as an OR/M: a good article from Genom-e's authors. A bit old taking into account upcoming EF4.0 (note, how fast they reached 4.0 ;) ), but anyway, very nice.

P.S. Shortly we'll publish some results of our LINQ implementation comparison. You'll laugh: full LINQ support is a myth. Even EF fails on really simple cases - not to mention the others. Obviously, except DO4 ;)

Wednesday, June 03, 2009

New Class Reference

What do you think about this structure of the Class Reference? Is it better than the old one - with its flat list of 100+ namespaces - or should we revert it back?

Monday, June 01, 2009

Our Wiki got links to class reference

Check out this feature here.

Note: our online help is running on a beta version of the new Help Server, so there are some issues that will be resolved this week.

Upcoming articles

We're going to publish a set of articles here. Please write which ones are more important for you.
- Model-first concept overview.
- Rich vs Anemic models.
- DO4 and upcoming .NET 4.0: friends or foes? Readiness of DO4 for new features.
- ORM feature comparison: LINQ implementation.

Publish your opinion in comments.

Welcome!

As you might assume, this is our new shiny DO4 related blog.

There were some other changes related to disclosure of important information during the last week:
- Issue tracker page now contains the most useful views. In fact, it looks like much more detailed version of roadmap now.
- Issues are organized by the upcoming milestones. Their priority reflects the chance of getting it implemented in the specified milestone. "Overnight" priority is reserved for fixes that must be done ASAP.

So now it's really easy to track the current progress: just open this view.

Note: changes in Wiki are reflected on recent changes page.

And finally, this blog has appeared. We're going to update it much more frequently than our company's blog, which further will be used mainly for announcements. Here we'll publish:
- Feature descriptions (mainly, as links to new articles in Wiki)
- Core examples and tips
- Feature-related polls and discussions
- Our reviews of various features of DO4 and their comparison to features of other ORM tools for .NET
- Discussion of various blog posts related to ORM for .NET
- Links to articles we've found interesting or, on the contrary, arguable.
- And so on.

So in short, this blog will aggregate our thoughts about ORM, and in particular, DO4. You may expect 3-4 posts per week here.