News, examples, tips, ideas and plans.
Thoughts around ORM, .NET and SQL databases.

Wednesday, June 10, 2009

Disconnected (offline) entities, POCO and sync in DataObjects.Net 4

First of all, when do we need disconnected objects? Generally, we need them when some data must be accessed and modified without opening a transaction. It's something like getting a cached version of a web page when you switch your browser to offline mode: you can read, but only what's cached. Moreover, in our case we want to be able to modify such disconnected objects, and to persist the changes made to them back to the storage when we get back online (connected).

Now let's look at some particular use cases:
1) WPF client. To show data in the UI quickly, it must keep a cached version of it in RAM. Normally it must flush the changes back to the database only when the Apply button is clicked.
2) A wizard in an ASP.NET application that must make a set of changes through a series of postbacks, but the changes must actually be flushed to the storage only on the final wizard page.
3) Synchronizing client. It maintains its own version of the database and periodically syncs with the server (not with the database, but with the middle-tier server). Example: AdWords Editor. But probably the most well-known example is practically any IMAP e-mail client.
4) Slave (branch) server. It maintains its own version of the database and periodically syncs it with the master server. Such an architecture is used by applications working in corporations with many distant branches. Branch servers provide their own clients with the data and periodically sync with the master to decrease the load on it.
5) Peer-to-peer sync. Skype is probably the most well-known example of such an application. It syncs chats between its different instances, including different instances using the same Skype account.
6) Public service. We're going to publish our entities for public access to allow other developers to create their own programs using our service. In the cases above it was implied that we develop all parts of the distributed application and can choose any technology we want, so we could use DO on any interacting side. Here we can't: a public API must be based on public standards. So we need to support them here, e.g. provide a RESTful API, publish POCO objects via a WCF service, etc.

I hope that's all. Any additions? Feel free to add them in comments.

Can these scenarios be implemented in DO? Yes and no.
Good news: the new DO will support all of the above scenarios. Most of them will be supported after the release of v4.1, and some (case 5: P2P sync) after 4.2.
Bad news: consequently, v4.0 supports none of the above except case 6, and that only because everything there depends on you. If you're interested in the details, the complete description of this case is at the end of this article.

So below I'll describe what we're going to provide to implement the above cases, and what is already done. First of all, let's classify the above cases by 5 properties:
1) Size of the disconnected storage. Will it fit in RAM?
2) Queryability of disconnected storage. Will we query it, or just traverse it (get related object, get an object by key)?
3) Concurrency on disconnected storage. Will we access it concurrently?
4) Sync type. Do we need master-slave or P2P sync?
4.1) Sync level for master-slave sync. If we've chosen master-slave sync, do we need action-level or state-level sync?

The classification for the above 6 cases is:
1) Fits in RAM, not queryable (although in some cases it could be), non-concurrent, master-slave sync, any sync level is possible.
2) Fits in RAM, not queryable (although in some cases it could be), non-concurrent, master-slave sync, any sync level is possible.
3) Doesn't fit in RAM, queryable, likely concurrent (different instances of the application will share the same database), master-slave sync, any sync level is possible.
4) Doesn't fit in RAM, queryable, concurrent, master-slave sync, any sync level is possible.
5) Doesn't fit in RAM, queryable, likely concurrent, P2P sync.
6) Any case is possible. Let's forget about this group for now.

As you see, cases 1-5 form 3 groups with the same properties:
1) Fits in RAM, not queryable (although in some cases it could be), non-concurrent, master-slave sync, any sync level is possible.
2) Doesn't fit in RAM, queryable, likely concurrent, master-slave sync, any sync level is possible.
3) Doesn't fit in RAM, queryable, likely concurrent, P2P sync.

Let's group them once more, taking into account the following facts:
- Doesn't fit in RAM = needs a local database (to store the state)
- Queryable = needs a local database (to run local queries)
- Concurrent = needs a local database (to handle isolation and transactions)
- Otherwise = needs a local state container that fits in RAM and can resolve entities by their keys quickly.
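The rule above boils down to a single boolean expression. Here is a minimal sketch of it; all type and member names (DisconnectedScenario, StorageChooser, LocalStorageKind) are invented for illustration and are not part of DO:

```csharp
using System;

// Hypothetical names throughout; none of these types exist in DataObjects.Net.
public enum LocalStorageKind { InMemoryStateContainer, LocalDatabase }

public sealed class DisconnectedScenario
{
    public bool FitsInRam { get; set; }
    public bool Queryable { get; set; }
    public bool Concurrent { get; set; }
}

public static class StorageChooser
{
    // A local database is needed as soon as any one of the three conditions
    // holds; otherwise a key-addressable in-memory container is enough.
    public static LocalStorageKind Choose(DisconnectedScenario scenario)
    {
        bool needsDatabase =
            !scenario.FitsInRam || scenario.Queryable || scenario.Concurrent;
        return needsDatabase
            ? LocalStorageKind.LocalDatabase
            : LocalStorageKind.InMemoryStateContainer;
    }
}
```

With this, the WPF client and the ASP.NET wizard fall into the state container branch, while the synchronizing client and the branch server fall into the local database branch.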

So in the end we have just 2 groups:
1) Master-slave sync: local state container or database, any sync level.
2) P2P sync: local database.

Requirements
1. We must fully support the initial cases 1-5.

1. We want to work with the same persistent types on any side (ASP.NET application, WPF client, branch server, master server, etc.) as in the case without any sync, although sync may affect the behavior of persistent types (e.g. fetches and lazy loads can be propagated to the master).

1.1. It must be easy to detect the current sync behavior for persistent types ("sync awareness").

2. We must be able to control automatic fetches from the master on the slave, as well as query propagation. We want to:
- Define regions (with using (...)) with the desired fetch and query modes.
- Specify right in the query whether it must be executed on the master or locally, e.g. with an .ExecutionSide(...) extension method.
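To make this requirement a bit more tangible, here is a rough sketch of how a using-based region and a per-query override could interact. Nothing here is the real DO API: SyncModeScope, SyncQuery and QuerySide are hypothetical, and only the ExecutionSide method name comes from the requirement itself.

```csharp
using System;

public enum QuerySide { Local, Master }

// Hypothetical ambient scope: sets the default side for the enclosed region.
public sealed class SyncModeScope : IDisposable
{
    [ThreadStatic] private static SyncModeScope current;
    private readonly SyncModeScope previous;

    public QuerySide Side { get; }

    public SyncModeScope(QuerySide side)
    {
        Side = side;
        previous = current;   // remember the outer scope so scopes can nest
        current = this;
    }

    // Side in effect on the current thread; Local when no scope is open.
    public static QuerySide CurrentSide =>
        current == null ? QuerySide.Local : current.Side;

    public void Dispose() { current = previous; }
}

// Hypothetical query wrapper carrying an optional per-query override.
public sealed class SyncQuery<T>
{
    public QuerySide? ExplicitSide { get; }

    public SyncQuery(QuerySide? explicitSide = null) { ExplicitSide = explicitSide; }

    // An explicit per-query choice wins over the surrounding region.
    public QuerySide ResolveSide() => ExplicitSide ?? SyncModeScope.CurrentSide;
}

public static class SyncQueryExtensions
{
    public static SyncQuery<T> ExecutionSide<T>(this SyncQuery<T> query, QuerySide side)
        => new SyncQuery<T>(side);
}
```

The idea is that a query inside `using (new SyncModeScope(QuerySide.Master)) { ... }` goes to the master by default, unless it says `.ExecutionSide(QuerySide.Local)` explicitly.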

3. Update propagation must be explicit.

4. Sync must be a pluggable component built over core Storage facilities. It must use only the open API. Certainly, this implies we must make that API open enough for this.

Decisions

Let's start with the "Master-slave sync, local database, any sync level" option. To develop it, we need the following components:

1. Embedded database provider(s).
1.1. An in-memory database provider is helpful when the disconnected storage fits in RAM but must be queryable. Actually this is true in many cases: e.g. back reference search on removal needs the storage to be either queryable, or small enough to run such queries using LINQ to Enumerable.

As you see, we already have this part. And it will be perfect once our memory provider becomes transactional. That's really pretty easy with our indexing architecture, but it is postponed at least till 4.2.

2. SyncState tracker (SessionHandler ancestor). In fact, it will be chained to the regular SessionHandler to "intercept" fetches, updates and queries. It is responsible for fetching any info from the remote storage when the local database doesn't contain it, as well as for caching and tracking such fetched info when it arrives from the remote database. Internally it will use SyncState(of T) entities. Did you know we already support automatic registration of generic instances? It's really useful when you need to associate something with practically any type (so sync & full-text search are perfect candidates for this feature).

As you might assume, SyncState allows us to track:
- Presence of SyncState indicates the object is fetched from or checked at the master
- IsNull - existence flag. Eliminates duplicate fetches for removed objects.
- Its original version & state (Tuple)
- IsModified flag.

So it allows us to easily derive which properties have changed. That's how we'll do state-level sync. Generally, we must produce a sequence of (original version, state change (Tuple again), IsRemoved flag) triples, one for each object with IsModified==true, and send it to the master to apply all the changes (we'll use PersistentAccessor for this; probably we'll even add a method allowing the whole state change to be made at once by passing a new Tuple).
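Here is a simplified sketch of the idea, just to show how little is needed to compute a state change. It is not the planned implementation: SyncStateSketch is a made-up class, and a plain dictionary of field values stands in for our Tuple.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SyncState(of T); field values are kept in a
// dictionary here instead of a DataObjects.Net Tuple.
public sealed class SyncStateSketch
{
    public object Key { get; }
    public int OriginalVersion { get; }
    public bool IsNull { get; }               // object does not exist on the master
    public bool IsRemoved { get; private set; }
    public bool IsModified { get; private set; }

    private readonly Dictionary<string, object> original;
    private readonly Dictionary<string, object> current;

    // Pass fetched == null to record that the master has no such object.
    public SyncStateSketch(object key, int version, IDictionary<string, object> fetched)
    {
        Key = key;
        OriginalVersion = version;
        IsNull = fetched == null;
        original = fetched == null
            ? new Dictionary<string, object>()
            : new Dictionary<string, object>(fetched);
        current = new Dictionary<string, object>(original);
    }

    public void Set(string field, object value)
    {
        current[field] = value;
        IsModified = true;
    }

    public void Remove()
    {
        IsRemoved = true;
        IsModified = true;
    }

    // State change: only the fields whose value differs from the original.
    public Dictionary<string, object> GetStateChange()
    {
        var change = new Dictionary<string, object>();
        foreach (var pair in current)
        {
            object old;
            if (!original.TryGetValue(pair.Key, out old) || !Equals(old, pair.Value))
                change[pair.Key] = pair.Value;
        }
        return change;
    }
}
```

The triple sent to the master would then be (OriginalVersion, GetStateChange(), IsRemoved) for every instance with IsModified set.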

3. Action-level tracker. An implementation of the Atomicity OperationLogBase that writes the actions to a stream, which will then be "attached" to a StoredTransaction object (again, persistent). Since we've already implemented serialization & Atomicity itself, this must be rather easy.

When such a sequence is fetched back from the StoredTransaction, it's possible to roll it back, send it to a remote part and apply it there or re-apply it locally. Note that such an operation log contains the information needed to validate the possibility of applying it, or rolling it back. So that's how
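A toy illustration of such a log, under heavy assumptions: SetFieldOperation and OperationLogSketch are invented names, a dictionary plays the role of the storage, and the real Atomicity operation log is certainly richer than this. What it shows is that each logged action carries enough information to check whether it can still be applied or rolled back.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical operation record: sets one field and remembers the value it
// expects to overwrite. A missing key is treated as null.
public sealed class SetFieldOperation
{
    public string Key { get; }
    public object ExpectedOldValue { get; }
    public object NewValue { get; }

    public SetFieldOperation(string key, object expectedOldValue, object newValue)
    {
        Key = key;
        ExpectedOldValue = expectedOldValue;
        NewValue = newValue;
    }
}

public sealed class OperationLogSketch
{
    private readonly List<SetFieldOperation> operations = new List<SetFieldOperation>();

    public void Append(SetFieldOperation operation) { operations.Add(operation); }

    // Replays the log against a target store; throws if the target has diverged.
    // Not atomic: operations applied before the failing one stay applied.
    public void Apply(IDictionary<string, object> target)
    {
        foreach (var op in operations)
        {
            object actual;
            target.TryGetValue(op.Key, out actual);
            if (!Equals(actual, op.ExpectedOldValue))
                throw new InvalidOperationException("Cannot apply: conflict on '" + op.Key + "'.");
            target[op.Key] = op.NewValue;
        }
    }

    // Undoes the log in reverse order, validating each step the same way.
    public void Rollback(IDictionary<string, object> target)
    {
        for (int i = operations.Count - 1; i >= 0; i--)
        {
            var op = operations[i];
            object actual;
            target.TryGetValue(op.Key, out actual);
            if (!Equals(actual, op.NewValue))
                throw new InvalidOperationException("Cannot roll back: conflict on '" + op.Key + "'.");
            target[op.Key] = op.ExpectedOldValue;
        }
    }
}
```

The same log instance can be applied to a remote store, re-applied locally or rolled back, as long as the validation passes.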

4. Client-side sync API. ~ ApplyChanges \ CancelChanges methods in Session and SessionHandler, as well as their implementations for the two sync cases above.

5. Server-side counterpart for the client-side sync handlers: a WCF service handling all requests related to sync. Most likely there will be a single one, taking SessionConfiguration as its only option. Our new Domain is ready for Session pooling, so this solution should work well even under high concurrency.

Now let's look at the "Master-slave sync, local state container, any sync level" option. The primary difference here is that we don't have a local database, i.e. it's still the same database, but we must be able to:
- Temporarily protect it from any updates. We must cache them in the state container.
- Provide repeatable reads from the state container: anything we've fetched must be stored in it and remain available later without any database access attempts.

Let's look at how this could work:

var stateContainer = new OfflineStateContainer();
using (var offlineScope = session.TakeOffline(stateContainer)) {
  // Here we can run many transactions; updates won't be propagated
  // to the session until the end of this using block.

  offlineScope.Complete(); // Indicates the changes will be applied
                           // on disposal of offlineScope.
}

All the APIs we need here are already described above. The only thing we need is a SessionHandler playing nearly the same role as the SyncState tracker, but using ~ a Dictionary inside OfflineStateContainer instead of SyncState(of T) objects.

Gathering changes is also simple here: they're extracted either right from this dictionary (for state-level sync), or from StoredTransaction objects (since nothing is actually persisted while OfflineStateContainer is bound to a Session, they aren't actually persisted either). The extracted change log will be applied to the underlying Session, and in case of success the new state will be marked as "fetched" in the state container. Otherwise nothing will happen (and, of course, you'll get an exception).

You may note that such an API is also perfectly suited to implementing caching. Earlier I explained why this is so attractive for the new DO. Here I've shown it's rather easy to implement with such an architecture.
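The dictionary-based container described above can be sketched in a few lines. This is only an illustration of the two guarantees (repeatable reads and buffered updates), not the future OfflineStateContainer: the class name, the string keys and the delegate-based storage access are all assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container: caches everything fetched and buffers every update.
public sealed class OfflineStateContainerSketch
{
    private readonly Dictionary<string, object> fetched = new Dictionary<string, object>();
    private readonly Dictionary<string, object> pending = new Dictionary<string, object>();
    private readonly Func<string, object> fetchFromStorage;

    public OfflineStateContainerSketch(Func<string, object> fetchFromStorage)
    {
        this.fetchFromStorage = fetchFromStorage;
    }

    // Repeatable read: the storage is asked at most once per key,
    // and buffered local changes are visible to later reads.
    public object Get(string key)
    {
        object value;
        if (pending.TryGetValue(key, out value))
            return value;
        if (!fetched.TryGetValue(key, out value))
        {
            value = fetchFromStorage(key);
            fetched[key] = value;
        }
        return value;
    }

    // Updates stay in the container until ApplyChanges is called.
    public void Set(string key, object value) { pending[key] = value; }

    // Hands a copy of the buffered changes to 'persist'. If it throws, the
    // container is left untouched; on success the changes become "fetched".
    public void ApplyChanges(Action<IDictionary<string, object>> persist)
    {
        persist(new Dictionary<string, object>(pending));
        foreach (var pair in pending)
            fetched[pair.Key] = pair.Value;
        pending.Clear();
    }
}
```

Note that the pending dictionary is exactly the state-level change set, so gathering changes requires no extra tracking.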

So as you see, we propose the following approaches:
1) WPF client. It must use either OfflineStateContainer or IMDB, if it wants to query the state it caches.
2) A wizard in an ASP.NET application - it will use OfflineStateContainer (i.e. keep it in the ASP.NET session).
3) Synchronizing client (e.g. AdWords Editor) - any regular DB (syncing) as local storage + OfflineStateContainer or syncing IMDB for its UI. As you see, you can build master-slave chains of arbitrary length ;)
4) Slave (branch) server - any regular DB as its local storage. Note that sync here will affect its performance, so it's also desirable to use caching. Action-level sync will ensure that in the end everything is done fully honestly on the master.

What's left? "P2P sync: local database". Let's leave this topic for future discussions. Here I can only say that tracking here is actually much simpler. We'll use the same approach as in Microsoft Sync Framework, and thus initially we'll support only state-level sync here.

Finally, there was "Case 6) Public service." Here everything is up to you. Let's list the most obvious options you have:
1) Convert the sequence of our Entities to your own POCOs with LINQ to Enumerable and send them via WCF. Btw, soon you'll be able to do this right from LINQ: we fully support anonymous types there (but they can't be sent via WCF), so adding POCO support must be really easy, since POCOs are almost exactly the same as anonymous types. The only problem you have here is how to detect & propagate the updates. There are many solutions, but all require some coding.
2) Probably the best approach here is to develop something like an Object-to-Object mapper. Simple 1-to-1, but capable of detecting and applying updates. I was surprised that Google finds lots of such solutions for .NET; I actually thought this term was still mainly related to Java.
3) Finally, you can use ADO.NET Data Services (Astoria) to publish our Entities via a RESTful API with zero coding at all. Again, this should work, since we support LINQ, but we haven't tried it yet. We'll definitely try it soon. It's really important, because it allows interacting with a DO backend from Silverlight.
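To illustrate options 1 and 2, here is a minimal sketch of a hand-written 1-to-1 mapping that produces POCOs with LINQ to Enumerable and applies changes back by key. CustomerEntity is a plain class standing in for a persistent Entity, and CustomerDto and CustomerMapper are likewise invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a persistent entity.
public class CustomerEntity
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string InternalNotes { get; set; }   // deliberately not published
}

// Plain object exposed through the public service.
public class CustomerDto
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerMapper
{
    // Entities -> POCOs, using LINQ to Enumerable.
    public static List<CustomerDto> ToDtos(IEnumerable<CustomerEntity> entities)
    {
        return entities
            .Select(e => new CustomerDto { Id = e.Id, Name = e.Name })
            .ToList();
    }

    // POCOs -> entities: applies changes by key and returns the number of
    // entities actually modified. Unknown keys are skipped in this sketch.
    public static int ApplyChanges(
        IEnumerable<CustomerDto> dtos,
        IDictionary<int, CustomerEntity> entitiesById)
    {
        int modified = 0;
        foreach (var dto in dtos)
        {
            CustomerEntity entity;
            if (!entitiesById.TryGetValue(dto.Id, out entity))
                continue;
            if (entity.Name == dto.Name)
                continue;   // nothing changed, nothing to propagate
            entity.Name = dto.Name;
            modified++;
        }
        return modified;
    }
}
```

Detecting updates here is a plain value comparison on the way back, which is the "some coding" mentioned above; a generic Object-to-Object mapper would do the same thing driven by conventions instead of hand-written code.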

So here is the answer why we don't care about POCO support as much as others do: POCO is required only in case 6. And:

1. I think case 6 is faced more rarely than cases 1-5, especially in startups. Opening a public API normally implies your project is already rather famous. And in this case spending some additional money on developing a public API is fully acceptable. On the other hand, in many cases sync is what you need from scratch. It should simply be a built-in feature. No integration with Sync Framework, no code to support action-based sync. It must just work.

2. As I've shown, there are many ways of producing POCO graphs from complex business objects and getting the changes back, so this part is simple. If you have just POCOs provided by your ORM, you simply don't need to do this at all, and that's nice. But you lose all the infrastructure we provide: persistence awareness, sync, caching, atomicity, validation and so on. And I think a custom implementation of all this stuff is incomparably more complex than implementing conversion to POCO. Even if you don't use any tool, it is a very simple, pattern-based task.

3. I like the KISS approach. POCO and PI are KISS attributes for me. But I hate implementing something complex on my own, especially if I feel it must be bundled into the framework. A good example of a non-KISS approach is WPF: its base types seem rather unusual and a bit complex. Dependency properties... They are even defined in a rather strange fashion there. The guys developing it have made a step aside from the standard control architecture and... developed a masterpiece!

Their DataContext property... Do you remember "our way" of data binding in WindowsForms? The first thing I added was BindingManager. You add it to a window or panel, and its DataBoundObject starts playing the same role as DataContext in WPF! You could nest them, binding the nested one to the lower one! There were Fill and Update methods! Do you see the analogy? I've always dreamed about the way of binding WPF offers, but couldn't get it implemented on WindowsForms. Why? Well, because WindowsForms was done wrong. The guys inventing it were following the KISS principle. They simply didn't want to think. They simply copied the common approach. And made others invent stuff like BindingManager.

So I think making a step aside from the common path is good, if you see this path doesn't solve the problems well. Although going too far from it is quite risky ;)

Thus providing POCO support is good. But it isn't good to make it your god.

Btw, even WPF isn't ideal :) E.g. I really dislike that they require an object to support INotifyPropertyChanged and INotifyCollectionChanged. Why do they fight so much for POCO/PI in EF, and not follow the same path here? Why do they expect that model objects will support their crazy, purely UI-related interfaces? Why can't I give them a notification service responsible for tracking state changes & notifying WPF? Requiring purely UI interfaces to be implemented on model objects seems much more illogical than requiring them to inherit from an Entity-like persistence-aware base. So actually I have the answer, but you won't like it ;) I feel that in the case of EF they simply follow the fashion, as many others do. But in practice implementing a special interface is perfectly acceptable. Does this mean that PI is nothing more than BS?

Ok, that's enough late-night philosophy; I'm going to get some sleep now. See you tomorrow or the day after. From now on I'm going to update this blog almost daily.

P.S. I'm going to add pictures to this article tomorrow. Later it will go to our Wiki as well. Any comments are welcome ;)

5 comments:

  1. The issues related to this post:

    Sync:
    112, 111

    Caching:
    133, 132

    Impl. details:
    186, 185, 184, 183, 182, 181, 180, 134, 131, 130, 129

  2. Fantastic post. Thank you, Alex.

    So we'll have to wait till 4.2 to convert our 3.9 WinForms (Offlines) projects to the new DO?

  3. Err... Why 4.2? I think 4.1.

    If everything you need falls under high or critical priority on the v4.1 feature map, and nothing special you need is related to v4.2, you should wait for v4.1.

    Or you can wait for 2-3 weeks, check the progress of our team and start. We're going to show v4.1 in ~ mid July.

    Btw, quite likely you'll need a migration helper from v3.9. See issue 165. Currently it isn't assigned to any release, but... we'll need this anyway. Maybe someone else will vote for it here or there?

  4. Ah, forgot - most likely we'll postpone P2P sync to 4.2 (the issue tracker doesn't reflect this yet, but I already wrote about this here). So do you need P2P sync?

  5. Btw... Have you noticed we could use an approach with OfflineStateContainer in v3.X and avoid making separate hierarchy for offlines there? I.e. there was a huge architectural mistake.
