News, examples, tips, ideas and plans.
Thoughts around ORM, .NET and SQL databases.

Friday, July 31, 2009

Validation & property constraints in DO4

There is an interesting topic related to validation, property constraints and compatibility with third-party validation frameworks in our support forum. I'm explaining some background concepts and internals of our validation framework (Xtensive.Integrity.Validation) there.

I'll be glad to answer any other questions related to validation, if any appear ;)

Wednesday, July 29, 2009

Preliminary ORM performance comparison: DataObjects.Net 4 vs NHibernate

We've just adapted our CrudTest for NHibernate. First results:

DO4 (LINQ):
Insert: 28,617 K/s.
Update: 34,111 K/s.
Fetch & GetField: 8,682 K/s.
Query: 1,486 K/s.
CachedQuery: 8,176 K/s.
Materialize: 358,671 K/s.
Remove: 41,108 K/s.

NHibernate (LINQ):
Insert: 12,936 K/s.
Update: 12,939 K/s.
Fetch & GetField: 7,152 K/s.
Query: 95,7/s.
CachedQuery: Cached queries are not supported in NH yet.
Materialize: 37,892 K/s.
Remove: 13,012 K/s

The highlighted numbers show a ~10x difference, although DO wins in all the other cases as well. Results of this test for several other ORMs are upcoming; the project will be shared @ Google Code.

In addition, we're developing a general LINQ test as well. For now, LINQ for NHibernate passes ~ 25 tests out of 100. This means only very basic LINQ features really work in this release. DO4 passes ~ 98 tests there (it still doesn't support passing arrays/collections as query parameters).

P.S. The optimization work we've done during the last 2 weeks has already paid off quite well (although we're still working on materialization performance). Pre-optimization results can be found here.

Tuesday, July 28, 2009

What models do we maintain

Since there are at least 3 visible models, it's necessary to explain why we maintain so many of them ;)

Our model stack consists of the following models:

Xtensive.Storage.Model (+ Xtensive.Storage.Building.Definitions)

That's the top-level model used by the storage. In fact, it consists of the following parts:
- Definitions model: XxxDef types, e.g. TypeDef
- Runtime model: everything else. E.g. TypeInfo. Exposed via Domain.Model property.
- Serializable version of runtime model: see Xtensive.Storage.Model.Stored namespace.

Definitions are used in the first step of the Domain build process. We reflect all the registered types and build the "model definition". The definitions can be freely added, removed or modified by your own IModule implementations - you just implement the OnDefinitionsBuilt method there. This allows modules to dynamically build or change anything they want - e.g. they can add a property to any of the registered types, or register a companion (associated) type for each of them. Note that nothing special must be done to make this happen - you just ensure the module is added to the set of registered types.
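
To make this more concrete, here is a minimal sketch of such a module that injects an extra field into every registered entity definition during this step. The member names used here (IModule, OnDefinitionsBuilt, DomainModelDef, TypeDef, DefineField) follow the concepts above, but treat the exact signatures as assumptions rather than a copy of the real API:

using System;
using Xtensive.Storage;
using Xtensive.Storage.Building;
using Xtensive.Storage.Building.Definitions;

// Hypothetical module adding an "UpdatedOn" field to every entity definition.
// Registering this type together with your entities should be enough to activate it.
public class AuditFieldModule : IModule
{
  public void OnDefinitionsBuilt(BuildingContext context, DomainModelDef model)
  {
    foreach (TypeDef type in model.Types) {
      if (!type.IsEntity)
        continue;
      // Extend the definition before the runtime model is produced from it.
      type.DefineField("UpdatedOn", typeof (DateTime));
    }
  }

  public void OnBuilt(Domain domain)
  {
    // Nothing to do once the runtime model is built.
  }
}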

So definitions describe a crude model - just the minimal information needed to build the runtime model.

And as you might assume, the runtime model is built from the definitions gathered on the previous step. It is much more complex - e.g. it fully describes every association and mapping.

In general, it is built to immediately answer any question arising at Domain runtime.

Finally, there is an XML-serializable version of the model. It is loaded from and serialized to the database during each schema upgrade. Our schema upgrade layer uses it to properly translate type-level hints related to the old model into schema-level hints, which makes the upgrade process more intelligent. You can find it serialized in one of the rows of the Metadata.Extension table.

Xtensive.Storage.Indexing.Model

Let's call it the schema model. This is a low-level model of the storage we use during schema comparison and the upgrade process. It differs from Storage.Model because:
- The storage model maintains two-way relationships between type-level and storage-level objects, e.g. between types and tables, properties and columns. Here we need only the part of this model related to storage-level objects.
- The storage model is built to quickly answer common questions. This model is designed to handle schema changes and comparison well.
- The storage model is cruder. E.g. DO isn't much interested in foreign keys - it just needs to know there must be a foreign key. The schema model knows all the details about it.

Schema model is used to:
- Compare the extracted and required schemas. This process is actually more complex than you might expect - to generate the upgrade actions well, we split the comparison into a set of steps and compare the models related to each of them. In fact, we do something like: ExtractedModel -> Step1Model -> Step2Model -> ... -> RequiredModel. That's why we must be able to clone and change it, nearly the way any SQL server does. Step1Model here may refer to a model with dropped foreign key constraints, Step2Model can be e.g. a model containing temporarily renamed schema objects and so on. Yes, we can safely handle rename loops like A->B', B->C', C->A' - we detect & break such loops by renaming one of the objects in the loop to a temporary name on an intermediate step (a tiny sketch of this idea follows the list) ;)
- Index engines use this schema as their native schema format. It is actually quite fast as well, when locked ;)
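
Here is a rough, self-contained illustration of the loop-breaking idea mentioned above (not our actual upgrade code): a set of renames forms a cycle when none of them has a free target name, and the cycle is broken by splitting one rename into two steps through a temporary name.

using System.Collections.Generic;
using System.Linq;

// Simplified illustration: each dictionary entry means "rename Key to Value".
public static class RenameCycleBreaker
{
  public static List<KeyValuePair<string, string>> Order(IDictionary<string, string> renames)
  {
    var map = new Dictionary<string, string>(renames);
    var result = new List<KeyValuePair<string, string>>();
    while (map.Count > 0) {
      // A rename is safe when its target name is not still occupied by another source.
      var safe = map.FirstOrDefault(r => !map.ContainsKey(r.Value));
      if (safe.Key != null) {
        result.Add(safe);
        map.Remove(safe.Key);
        continue;
      }
      // No safe rename => we're inside a cycle. Break it via a temporary name
      // (collision handling for the temporary name is omitted in this sketch).
      var any = map.First();
      var temp = any.Key + "_TMP";
      result.Add(new KeyValuePair<string, string>(any.Key, temp)); // intermediate step
      map.Remove(any.Key);
      map.Add(temp, any.Value);                                    // finish the rename later
    }
    return result;
  }
}

// For { A->B, B->C, C->A } this yields a valid order such as: A->A_TMP, C->A, B->C, A_TMP->B.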

Schema models are available via two Domain properties:
- ExtractedSchema
- Schema.

Xtensive.Sql.Model

This is the SQL schema model - a schema model of a SQL database in its native form. It differs from the above one - it describes all SQL schema terms instead of just the part of them we need. For example, you can find View and Partition objects there, although for now we don't have their analogues in the schema model.

This model is used to:
- Produce the extracted schema model. SQL DOM provides an Extractor allowing it to be extracted for any supported database; the result of its work is passed to SqlModelConverter (a part of any SQL storage provider) to produce the schema model from it. So this converter is responsible for decisions such as ignoring unsupported SQL schema objects and so on.
- Produce SQL statements (commands). SQL DOM statement model objects, such as SqlAlterTable, refer to the objects of this model.

You can't access this model at runtime, but it is available to any SQL storage provider via its DomainHandler.Mappings member (see the Xtensive.Storage.Providers.Sql namespace - it looks like I excluded this part from the brief version of the API reference). These mappings are used to produce essentially any SQL command sent by the provider.

Xtensive.Sql

This is our model of the SQL language - the so-called SQL DOM. You might think objects from this model (except SQL schema model objects) normally have a rather short lifetime, since they represent parts of particular SQL commands. But actually this isn't true:
- We cache almost any SQL request model we build. Cached request models are bound to particular CRUD operations, LINQ and RSE queries. So in general we almost never build a request model twice.
- We even cache pre-translated versions of request parts - relatively long strings from which we combine the final version of a request. This allows us to produce a version of a request with differently named parameters almost instantly.
- Moreover, we support branching in the SQL DOM request model. It is used to produce different versions of a request containing external boolean parameters. Earlier I wrote this can be quite important: imagine we compiled a request with all the branches inlined. One of the branches may require a table scan in the query plan, and thus the plan for the whole SQL request will rely on a table scan. This "slow" branch could be a rarely used one (i.e. the condition turning its logic "on" rarely evaluates to true). But the plan will always use a table scan, since the RDBMS produces the most generic query plan version. An example of such a query is "SELECT * FROM A WHERE @All = 1 OR @Id = A.Id". Check out its plan on SQL Server, then imagine that normally @All is 0. Branching & pre-translated query parts allow us to handle such cases perfectly (a tiny illustration follows this list).
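
A tiny illustration of why this matters (this is not the SQL DOM API, just the effect of branching): instead of a single generic statement whose plan must handle both cases, two pre-translated variants are cached and the right one is picked by the value of @All at execution time.

using System.Collections.Generic;

// Two pre-translated variants of the same logical request, selected by the external boolean parameter.
public static class BranchingQueryExample
{
  // The generic form "SELECT * FROM A WHERE @All = 1 OR @Id = A.Id" is always planned with a table scan.
  private const string AllRowsSql = "SELECT * FROM A";                   // branch for @All = true
  private const string ByIdSql    = "SELECT * FROM A WHERE A.Id = @Id";  // branch for @All = false

  public static string BuildSql(bool all, int id, IDictionary<string, object> parameters)
  {
    if (all)
      return AllRowsSql;              // no OR in the plan at all
    parameters["@Id"] = id;
    return ByIdSql;                   // can use an index seek on A.Id
  }
}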

Monday, July 27, 2009

What are we busy with?

Since it's vacation season, in recent weeks we've been working mainly on performance (I promised we'd spend some time on this some day). And I'm glad to announce we've reached an almost invincible level:
- We beat plain SqlClient on the insertion test by about 15%. Sounds almost impossible, right? Well, this is the effect of our batching implementation. Later I'll uncover all the details; a rough sketch of the idea follows this list.
- The update test is also quite close to the SqlClient mark.
- Materialization is one more area where we've made really good progress. No exact numbers here yet, since we're still working on it.
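
Until that post appears, here is only a rough sketch of what such batching generally looks like on top of SqlClient (an assumption about the technique, not our actual implementation; the Simplest table is hypothetical): several parameterized INSERT statements are concatenated into one command text, so a single round-trip carries a whole batch.

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text;

public static class BatchingSketch
{
  // Inserts rows in batches of 25 statements; the connection is assumed to be open.
  public static void InsertBatched(SqlConnection connection, IEnumerable<KeyValuePair<int, string>> rows)
  {
    var sql = new StringBuilder();
    var command = connection.CreateCommand();
    int i = 0;
    foreach (var row in rows) {
      sql.AppendFormat("INSERT INTO [Simplest] ([Id], [Name]) VALUES (@p{0}_0, @p{0}_1);", i);
      command.Parameters.AddWithValue("@p" + i + "_0", row.Key);
      command.Parameters.AddWithValue("@p" + i + "_1", row.Value);
      if (++i < 25)
        continue;
      command.CommandText = sql.ToString();
      command.ExecuteNonQuery();               // one round-trip for the whole batch
      command = connection.CreateCommand();
      sql = new StringBuilder();
      i = 0;
    }
    if (i > 0) {
      command.CommandText = sql.ToString();
      command.ExecuteNonQuery();               // flush the remainder
    }
  }
}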

The new, ultra high-speed DO bolide will be shown by the end of this week.

Wednesday, July 22, 2009

Query transformation pipeline inside out: APPLY rewriter

DO4 is designed to support various RDBMS with different capabilities. One such capability is the possibility to use values from the left side of a join operation on its right side. Microsoft SQL Server supports this kind of join via the APPLY operator. But, for example, PostgreSQL does not provide any similar feature. However, an operator like APPLY is required to translate many LINQ queries - such back-references are very natural in LINQ.

As you might know, our LINQ translation layer translates LINQ queries to RSE - in fact, query plans. These plans are then sent to our transformation & optimization pipeline, which differs between RDBMS, although it is assembled from a common set of optimizers (transformers). So if some RDBMS does not support a certain feature, we add a transform rewriting the query to make it compatible with that RDBMS. At the very end we translate the final query plan (RSE query) to a native query for the current RDBMS. Note that if we encounter something that can't be translated to a native query at this step, an exception is thrown.

Let's return to the subject of this article. There is an ApplyProvider in RSE, which does the same job as the APPLY operator. When a query is compiled from LINQ to RSE, we freely use ApplyProvider everywhere it is necessary. But as I've mentioned, some RDBMS do not support it, and we must take care of this. That's the story behind the APPLY rewriter.

To achieve the same behavior on different RDBMS, we try to rewrite queries containing an ApplyProvider (i.e. requiring the APPLY operator to be translated "as is") so that the right side no longer references the left side; during the rewriting process we modify the source query accordingly.
Obviously, such rewriting is not always possible. Below is the list of cases when we can rewrite the source query:
  • The right part returns a single row: in this case rewriting is not required, since we can translate such an ApplyProvider to SQL as "SELECT [left columns], [right columns] FROM ...", which is supported by any RDBMS.
  • Columns used in the expressions of the involved FilterProviders or CalculateProviders are not removed by other providers (e.g. by an AggregateProvider).
  • A reference to the ApplyParameter is contained in only one of the sources of a BinaryProvider, if such a provider exists in the right part of the original ApplyProvider.
  • The right part of the ApplyProvider contains references only to its own ApplyParameter.
That's it. This is enough to get rid of APPLY in most queries that initially look as if they'd require it. Obviously, we can't eliminate APPLY from every query.
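
For instance, in the following query (a typical shape, not taken from our test suite) the right part only filters products by an equality on the left-side category, so the ApplyProvider it produces can be rewritten into an ordinary join - no APPLY is needed even on PostgreSQL:

from category in Categories
from product in Products.Where(p => p.Category == category)
select new {category, product}

Here the correlation is just an equality filter on the left-side key, so it maps to a plain inner join.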

An example LINQ query leading to CROSS APPLY on LINQ to SQL:

from category in Categories
from productGroup in (
  from product in Products
  where product.Category==category
  group product by product.UnitPrice
)
select new {category, productGroup}

Its SQL:

SELECT [t0].[CategoryID], [t0].[CategoryName], [t0].[Description], [t0].[Picture], [t2].[UnitPrice] AS [Key]
FROM [Categories] AS [t0]
CROSS APPLY (
  SELECT [t1].[UnitPrice]
  FROM [Products] AS [t1]
  WHERE [t1].[CategoryID] = [t0].[CategoryID]
  GROUP BY [t1].[UnitPrice]
  ) AS [t2]

We can't get rid of CROSS APPLY in such a query either. On the other hand, the good thing is that we can translate it properly if APPLY is supported - in contrast to almost any other ORM we've looked at (except EF and LINQ to SQL). The same is true for APPLY-related query transformations - as far as we know, only EF and LINQ to SQL are aware of them.

Monday, July 20, 2009

DO4: continuous integration and testing

I've been planning to write this for a long time. We use TeamCity to continuously build & test DataObjects.Net 4 assemblies. Currently there are:

Post-commit tests. Run after every commit for all dependent projects.

Pre-commit project for Xtensive.Storage. There are 6 test configurations:
- Memory,
- PostgreSQL 8.2, 8.3, 8.4
- SQL Server 2005, 2008.

These tests run when TeamCity's pre-tested commit feature is used.

Nightly tests. There are 6 projects (one per RDBMS version we support: Memory, PostgreSQL 8.2, 8.3, 8.4, SQL Server 2005, 2008), and each of them is tested in 6 different configurations to check that everything works with all the mapping strategies we support. To achieve this, we use two special IModule implementations in our tests:
- InheritanceSchemaModifier: sets InheritanceSchema to the specified one for all hierarchies of the Domain it is used in. Using this module multiplies the possible test configurations by 3 (ClassTable, SingleTable and ConcreteTable).
- TypeIdModifier: if specified, injects the TypeId column into every primary key. This is important, because injection of TypeId may significantly affect fetch performance for hierarchies with deep inheritance; moreover, since TypeId is handled specially in many cases, this allows checking that all this logic works properly when TypeId is injected into the key. Using this module multiplies the possible test configurations by 2 (with and without TypeId in keys).

As you can see, this gives 6 test configurations per project, so in total we have 36 nightly test configurations.

All these tests run on 3 primary test agents, although there are a few additional ones - e.g. we have a special agent running on a virtual machine dedicated to building DataObjects.Net v3.9, since it requires an outdated version of Sandcastle Help File Builder and some other tools.

A few screenshots:



Thursday, July 16, 2009

Huge June discounts are back in July!

Hi everyone! We've decided to bring the huge June discounts back. They'll stay in effect till the end of July, and if there is demand, we'll consider doing the same in August.

So it's still the perfect time to join DO4 camp ;)

Wednesday, July 15, 2009

Index-based query optimization - Part 2

In this post, I describe how the query execution engine selects the best index to use. When the engine finds an IndexProvider (for a primary index) in the source query, it searches for all the secondary indexes associated with the given primary index. Then the engine transforms the filter predicate into a RangeSet (actually, into an expression returning a RangeSet) for each index found. A RangeSet for the primary index is created as well. As I wrote earlier, we convert the original predicate to Conjunctive Normal Form (CNF) to do this.

During the transformation of a predicate, only comparison operations accessing key fields of an index can be used to restrict the set of index entries which need to be loaded. Therefore, using different indexes for the transformation produces different RangeSets.

At the next step, the engine calculates the cost of loading the data for each index. To do this, we compile the corresponding RangeSet expression and evaluate it into the actual RangeSet. If the source predicate contains an instance of the Parameter class, the engine uses the expected value of this parameter during the evaluation of the compiled RangeSet expression. So finally we get a RangeSet object identifying the index ranges that must be extracted from a particular index to evaluate the query using that index.

The cost calculation is based on index statistics, which exist for each of our indexes. Roughly, statistics is a function returning the approximate amount of data lying in a particular index range. In fact, it's a histogram of the data distribution, where the amount of data is bound to the Y axis, and the index key value is bound to the X axis.

After the cost calculation completes for all indexes, the engine selects the index and the corresponding RangeSet associated with the minimal cost. Currently, we use a pretty simple selection algorithm which picks the cheapest index for each part of the source predicate independently. In the future we plan to implement a more complex and effective algorithm here, but for now it's ok.
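
In C#-like pseudocode, this selection step looks roughly as follows (RangeSet, the compiled expression and the statistics call are illustrative stand-ins, not the real types):

using System;
using System.Collections.Generic;

// Illustrative stand-ins, just to make the sketch self-contained.
public sealed class RangeSet { /* set of index key ranges */ }

public sealed class IndexCandidate
{
  public string IndexName;
  // Compiled RangeSet expression: given expected parameter values, yields the actual RangeSet.
  public Func<IDictionary<string, object>, RangeSet> CompiledRangeSetExpression;
  // Statistics-based estimate of the amount of data lying in the given ranges.
  public Func<RangeSet, double> EstimateCost;
}

public static class IndexSelector
{
  // Picks the candidate (primary or secondary index + its RangeSet) with the minimal estimated cost.
  public static IndexCandidate SelectCheapest(
    IEnumerable<IndexCandidate> candidates,
    IDictionary<string, object> expectedParameterValues,
    out RangeSet selectedRanges)
  {
    IndexCandidate bestIndex = null;
    selectedRanges = null;
    var bestCost = double.MaxValue;
    foreach (var candidate in candidates) {
      var ranges = candidate.CompiledRangeSetExpression(expectedParameterValues);
      var cost = candidate.EstimateCost(ranges);
      if (cost < bestCost) {
        bestIndex = candidate;
        selectedRanges = ranges;
        bestCost = cost;
      }
    }
    return bestIndex;                 // the query is then rewritten around this index
  }
}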

When the index is selected, we perform the actual query transformation:
  • If the primary index is selected, the engine does not modify the source query.
  • If one of the secondary indexes is selected, the engine inserts IndexProviders corresponding to the selected indexes into the query, adds RangeProviders after them (extracting the RangeSets associated with them) and finally joins the primary index. The original filtering criterion (FilterProvider) follows this whole chain.
  • We don't eliminate unused columns at this stage - this is done by an additional column-based optimization step running later.
P.S. The article was actually written by Alexander Nickolaev. I just re-posted it after fixing some mistakes.

New benchmark results

Check out this article.

Wednesday, July 08, 2009

Index-based query optimization - Part 1

You probably know that DO4 includes our own RDBMS implementation. Currently we support an in-memory DB only, but the development of a file-based DB is scheduled. Like other RDBMS, we try to optimize query execution to achieve better performance. There are several ways to perform such optimization. In this post, I describe the optimization based on indexes.

The aim of this optimization is to reduce the amount of data retrieved from an index. This reduction is achieved by loading only those index entries whose keys belong to specified ranges. The query execution engine tries to create these ranges by transforming the predicates of FilterProviders found in the query. Currently, the engine can process only those filters which are placed immediately after an IndexProvider, but this part of the algorithm will be improved.

The engine tries to transform predicates to Conjunctive Normal Form (CNF) before extracting index key ranges. If this transformation is successful, the engine analyzes the terms of the CNF. There are two kinds of CNF terms:
  • Comparison operation;
  • Stand-alone boolean expression.
Some comparison operations can be transformed into ranges of index keys. This is possible when only one side of the comparison operation contains an expression accessing a Tuple.

The list of comparison operations recognized by the query execution engine:
  • >
  • <
  • ==
  • !=
  • >=
  • <=
  • Compare methods
  • CompareTo methods
  • StartsWith methods
Examples:
person.Age > 10
person.Name.StartsWith("A")


If a term is a stand-alone boolean expression (e.g. a is SomeType), then the engine creates a range representing either all index keys or none.

We also support multi-column indexes. For example, the expression person.FirstName == "Alex" && person.Age > 20 can be transformed into a range of keys of an index built over the FirstName and Age columns.
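
To make this a bit more concrete, here is an illustrative sketch of the key range produced for that expression on a composite (FirstName, Age) index (PersonKey and the bounds shown here are stand-ins, not the real RangeSet API):

using System;

// Stand-in for a composite (FirstName, Age) index key, ordered lexicographically.
public struct PersonKey : IComparable<PersonKey>
{
  public string FirstName;
  public int Age;

  public int CompareTo(PersonKey other)
  {
    var result = string.CompareOrdinal(FirstName, other.FirstName);
    return result != 0 ? result : Age.CompareTo(other.Age);
  }
}

public static class RangeExtractionExample
{
  // person.FirstName == "Alex" && person.Age > 20 becomes a single contiguous key range:
  // from ("Alex", 20) exclusive up to the last ("Alex", *) key.
  public static void GetRange(out PersonKey from, out PersonKey to)
  {
    from = new PersonKey { FirstName = "Alex", Age = 20 };           // exclusive lower bound (Age > 20)
    to   = new PersonKey { FirstName = "Alex", Age = int.MaxValue }; // upper bound of the "Alex" group
  }
}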

If a predicate cannot be normalized, the engine extracts index key ranges from it by walking recursively through its expression tree.

The following expressions are always transformed to the range representing all index keys:
  • A comparison operation containing access to a Tuple on both of its sides;
  • Expressions which are not recognized as a comparison operation.
In the next posts, I will describe further details of our index-based query optimization algorithm - e.g. how the engine selects the best index to use based on statistics.

Monday, July 06, 2009

Property constraints

Let's suppose that a Person class has an Age property of type int whose value cannot be negative. We can implement this check in two different ways: check the value in the property setter, or in the OnValidate method.

In the first case we have to expand the auto-property and write code like this:

[Field]
public int Age
{
  get { return GetFieldValue<int>("Age"); }
  set {
    if (value < 0)
      throw new Exception(string.Format(
        "Incorrect age ({0}), age can't be less than {1}.",
        value, 0));
    SetFieldValue<int>("Age", value);
  }
}


The second way is:

[Field]
public int Age { get; set; }

public override void OnValidate()
{
  if (Age < 0)
    throw new Exception(string.Format(
      "Incorrect age ({0}), age can't be less than {1}.",
      Age, 0));
}


Validation behavior in these two approaches is not the same - exceptions are thrown at different stages: when setting the property value, or when validating the object. There is no single point of view on which way is preferable.

Property constraints are property-level aspects integrated with the validation system that simplify implementing value checks of these kinds. A property constraint aspect is automatically applied to properties marked with the appropriate attributes. In our example, the constraint declaration will look like this:

[Field]
[RangeConstraint(Min = 0,
  Message = "Incorrect age ({value}), age can not be less than {Min}.",
  Mode = ValidationMode.Immediate)]
public int Age { get; set; }


or

[Field]
[RangeConstraint(Min = 0,
  Message = "Incorrect age ({value}), age can not be less than {Min}.")]
public int Age { get; set; }


Each constraint attribute has two general properties: Message and Mode. The Message property value is used as the exception message when the check fails. We also plan to add the ability to get messages from string resources; this feature will be useful for application localization.

The Mode property value determines whether the immediate (check in the setter) or delayed (check on object validation) mode should be used. All property constraints on a particular instance can also be checked with the CheckConstraints() extension method.
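
For example, with the Age constraint declared above in delayed mode, a usage sketch might look like this (the exact exception type thrown on failure is not shown here):

var person = new Person();
person.Age = -1;            // no exception yet - the check is delayed
person.CheckConstraints();  // the [RangeConstraint] on Age fails here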

Constraints are designed to work not only with our entities, but with any classes implementing the IValidationAware interface.

The following property constraints are currently available:

- [EmailConstraint] - Ensures that an email address is in a correct format
- [FutureConstraint] - Ensures that a date value is in the future
- [LengthConstraint] - Ensures a string or collection length fits in the specified range
- [NotEmptyConstraint] - Ensures that a string value is not empty
- [NotNullConstraint] - Ensures the property value is not null
- [NotNullOrEmptyConstraint] - Ensures the property value is not null or empty
- [PastConstraint] - Ensures that a date value is in the past
- [RangeConstraint] - Ensures that a numeric value fits in the specified range
- [RegexConstraint] - Ensures the property value matches the specified regular expression


Other constraints can be easily implemented as PropertyConstraintAspect descendants.
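
For illustration, a custom constraint might look roughly like this; PropertyConstraintAspect comes from the text above, but the overridden member names (CheckValue, IsSupported) are assumptions about its shape rather than the documented API:

using System;

// A hypothetical constraint ensuring a string property contains no whitespace.
[Serializable]
public class NoWhitespaceConstraint : PropertyConstraintAspect
{
  // True when the value satisfies the constraint.
  public override bool CheckValue(object value)
  {
    var text = value as string;
    return string.IsNullOrEmpty(text) || !text.Contains(" ");
  }

  // Restricts the constraint to string properties.
  public override bool IsSupported(Type valueType)
  {
    return valueType == typeof (string);
  }
}

Once implemented, [NoWhitespaceConstraint] would be applied to properties just like the built-in attributes above.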

The following example illustrates the variety of property constraints on a Person class:

[NotNullOrEmptyConstraint]
[LengthConstraint(Max = 20,
  Mode = ValidationMode.Immediate)]
public string Name { get; set;}

[RangeConstraint(Min = 0,
  Message = "Incorrect age ({value}), age can not be less than {Min}.")]
public int Age { get; set;}

[PastConstraint]
public DateTime RegistrationDate { get; set;}

[RegexConstraint(Pattern = @"^(\(\d+\))?[-\d ]+$",
  Message = "Incorrect phone format '{value}'")]
public string Phone { get; set;}

[EmailConstraint]
public string Email { get; set;}

[RangeConstraint(Min = 1, Max = 2.13)]
public double Height { get; set; }

Friday, July 03, 2009

People from Microsoft are linking to our Tips blog

Here is the link. So I feel it was a good idea to create our Tips blog ;)

Who's Brad Wilson? "I'm a software developer at Microsoft, working on the ASP.NET team. I've previously worked on the CodePlex and patterns & practices teams." - he says. Here is an interview with him.

Thoughts: being first vs being best

I just read a nice post from DevExpress CTO about this.

We were among the first with DO1.X, 2.X and 3.X - they had tons of unique features at the time.

Some examples:
- We used runtime proxies initially - starting from v1.0 in 2003. NHibernate started to use them only in 2005. Now we're using PostSharp-based aspects, and I suspect many others have already started looking into this ;)
- We were the first to integrate full-text search & indexing into an ORM. The initial version supporting the Microsoft Search service appeared in 2003; Lucene.Net support was added in 2005. Now this feature is supported by many other ORM frameworks - e.g. NHibernate and Lightspeed.
- The same goes for translatable properties - we've been supporting them since 2003. Now they're implemented e.g. in Genom-e.
- We implemented a versioning extension in 2005. The Genom-e team added a similar historization feature just recently.
- And finally, schema upgrade - it was a part of DO starting from v1.0. Now it exists in many frameworks, but is always implemented as a design-time feature. I understand this brings some benefits (btw, we're going to provide a similar feature shortly), but what about runtime? As application users, we're used to installing new versions of applications without caring about running upgrade scripts. Moreover, as a developer, I'd prefer all the necessary upgrade logic to be executed automatically. So why are all these frameworks capable only of generating such scripts at design time? Who'll combine them? Who will execute them? Who will produce their versions for other RDBMS? Who will check that the existing database version isn't too old? I think this approach leaves quite a few TODOs for very common tasks to developers.

And, AFAIK, the following features are still unique:
- Paired (inverse) properties - at least the way this is done in DO, i.e. in a fully commutative way.
- Persistent interfaces - again, I'm speaking about our own version of this feature.
- Access control \ security system.
- Action-level instead of state-level change logging (used for disconnected state change replication).

So why did DO3.X die? Because of architectural shortcomings that initially allowed us to develop it fast. E.g. our mapping support was quite limited. Really, being first usually != being best.

Did you know that:
- I was 23 years old when the first version of DO was released. I'm 10 years younger than e.g. Frans Bouma ;)
- I had good experience with RDBMS at that moment, but as you may guess, my experience was mainly limited to the scope of SMB web applications, and this finally led to some architectural shortcomings in v1.X.
- I had a really good programming background - my CV was very good at that point. But I hadn't developed a framework comparable in scale to DO before. And framework development guidelines are quite different from application development guidelines. Initial architectural shortcomings are much more painful here: in some cases you simply can't overcome them by adding an N-th module.

On the other hand, we gained huge ORM and database experience with v1.X-3.X. We've become real experts. We've studied & fixed lots of cases and issues - ranging from very frequent ones to those that seem quite unlikely to hit in a particular application (but this doesn't mean you shouldn't keep them in mind). We know much more about what is important, and how it must work.

And what's more important, we've ultimately been the first to explore many new paths and features - take a look at the lists above ;) Most of the features there were later adopted by others - this proves they were good enough, and, more importantly, shows we can generate and implement such ideas earlier than others do. Maybe that's because I don't like merely repeating others. Take NHibernate as an example: isn't it really boring to follow a path passed by others a few years ago (I mean Hibernate)? Ok, it is already successful, and such a path offers an attractive way to start. But is that a good enough reason to plainly repeat it, instead of making something better?

Ok, being first was really a kind of rush for us, until we reached our own limits. But as you know, we didn't stop there! We just started a new rush - I hope this shows well what kind of characters stand behind our team ;) Moreover, all these years we've been growing, using every opportunity we had, and this is what allowed us to accomplish our almost unimaginably complex task at all. Yes, now we're the only ones not just supporting third-party databases, but having our own, shiny-new, real RDBMS integrated with the ORM. And we're almost ready to show the full power of this competitive advantage (wait for sync, and later - Mono Silverlight support). We're going to eliminate the necessity to study and use the whole spectrum of technologies you need in most cases, including Entity Framework, Sync Framework, ADO.NET Data Services, .NET RIA Services, SQL Server Compact \ SQLite - and this isn't the full list ;)

Btw, we're probably the only ORM vendors that won't suffer much from EF's appearance: first of all, because our approach to the problem is probably the most distant from theirs. And secondly, because we've already suffered enough from a 2-year pause in releases ;) I feel it will be a hard time for many, many others - especially the ones offering similar features. LLBLGen Pro, Genom-e, Subsonic, Lightspeed, etc., and even NHibernate (still no LINQ!) - guys, are you well prepared for this fight? At least we are - you know, Russians are used to winning wars during long and cold winters ;)

So what about being first vs being best?

It's simple. We've been carefully taking our time for 2 years. If DO1.X-3.X were mainly the first, we're making DO4.0 mainly the best - although from many points of view it is the first one as well. I'm quite happy that the most complex part of our development path has already been passed. The wheel is spinning now, and its speed is growing. We delivered v4.0 + 2 updates in June alone. July promises to be even more attractive in terms of the features we're going to deliver.

So I'm repeating myself once more: join DO4 camp ;)

Thursday, July 02, 2009

ADO.NET Data Services (Astoria) sample for DO4

Please refer to this post.

What's new in v4.0.2

1. Improved installer

I hope we've fixed the last "big bugs" there. The most annoying ones are:
- 228: "Add\remove programs" issue on installing both DO4 and Xtensive.MSBuildTasks
- 232: DO 4.0.1 installer doesn't update assemblies located in PostSharp directory
- 233: Projects created by project template are bound to specific installation path of DO4

Because of 228 & 232, the recommended upgrade path to v4.0.2 is:
- Uninstall DO4. If the item is absent in "Add\Remove programs", just remove its folder C:\Program Files\X-tensive.com (or the installation path you've chosen).
- Uninstall Xtensive.MSBuildTasks, if you have it installed. If the item is absent in "Add\Remove programs", just remove its folders from C:\Program Files\MSBuild and C:\Program Files\X-tensive.com (or the installation path you've chosen).
- Uninstall all other components previously required by DO4, including Unity, Parallel Extensions, MSBuild Community Tasks and PostSharp.
- Install the new DO4. It will suggest installing just PostSharp. Everything else is optional now; all Unity and Parallel Extensions assemblies are installed into the GAC automatically.

Other changes include:
- The installer automatically detects & requires uninstalling the old version of DO4.
- All required assemblies are now installed into the GAC. If you're worried about this, there are .bat files allowing you to get rid of them with ease.
- There are new project templates (Console, Model, UnitTests, WebApplication, WPF). But they're C#-only for now.
- New Build.bat files build the new DO, automatically performing all "before first build" steps. So it's really easy now to make a custom build.

Useful links:
- Full list of installer-related issues
- New installation instruction
- New "Building DataObjects.Net" instruction

2. LINQ

As you might remember, two weeks ago we didn't support 2 LINQ features:
- Group joins
- First\Single(OrDefault) in subqueries (btw, as far as I remember, Single in subqueries isn't supported in EF at all)

Both features are supported now. So we're now fully ready to compare our LINQ implementation with others - a set of articles about this will appear here soon.
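
For illustration, these are the kinds of query shapes that now translate (the Customers/Orders sources here are hypothetical, not taken from our test model):

// Group join:
var ordersPerCustomer =
  from customer in Customers
  join order in Orders on customer.Id equals order.Customer.Id into customerOrders
  select new { customer, OrderCount = customerOrders.Count() };

// First/Single in a subquery:
var lastOrders =
  from customer in Customers
  select new {
    customer.Name,
    LastOrder = customer.Orders.OrderByDescending(o => o.Date).FirstOrDefault()
  };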

Useful links:
- LINQ-related issues.

3. Breaking changes in attributes

We've refactored our mapping attributes once more. Now there are:
- Separate [Association] attribute for associations
- Separate [Mapping] attribute allowing to specify mapping names.
- No more [Entity] attribute - it was necessary just to specify mapping name, but now this is handled by a separate attribute.

Earlier their functions were distributed between [Field] and the old abstract MappingAttribute.

We think the new version is better: specific (and actually more rarely needed) features require specific attributes.
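
A rough example of how a field declaration looks with the new attributes (the exact parameters used here, e.g. PairTo and the mapping name argument, are assumptions for illustration):

[Field]
[Association(PairTo = "Author")]   // association details now live in [Association]
[Mapping("BookList")]              // custom mapping name now lives in [Mapping]
public EntitySet<Book> Books { get; private set; }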

4. Schema upgrade

We've added ChangeFieldTypeHint. So the schema upgrade hint set is ideal now ;)

5. Documentation

We're slowly updating it. As you may have noticed, we restructured our wiki. The manual is now organized in a step-by-step fashion. Among other new articles, there is a new Schema upgrade article - check it out.

6. ADO.NET Data Services (Astoria) sample

We've implemented an ADO.NET Data Services (Astoria) sample on DO4. We decided to publish it separately because:
- It isn't really polished yet.
- It depends on Silverlight Tools, so we must decide if this additional dependency is acceptable.

It shows an Astoria service exposing entities via a RESTful API, as well as Windows Forms and Silverlight clients consuming this service, displaying the entities they get and allowing them to be changed.

What does this mean? You can expose DO4 entities using ADO.NET Data Services, query the service from the client using LINQ, update the entities on the client and send back the changes. Since the Astoria client runs on Silverlight as well, you can implement a Silverlight client utilizing DO4 on the server.

Btw... we're disappointed in the Astoria client features. You have to do lots of tasks manually there, including registering new entities, changed associations and so on. From the usability point of view it's much worse than what DO4 offers. So in general, the upcoming sync will be a much more attractive option for DO4 users. On the other hand, Astoria allows implementing a really simple RESTful integration API with almost zero coding.

The sample will be available @ our downloads section today.

7. Bugfixes

We've got really good results here. Earlier I wrote that there were just a few failing tests out of about 1000 tests for Storage. Imagine:
- About 600 tests are related to our LINQ implementation, and, indirectly, the RSE implementation.
- All the tests produce the same results - even on Memory storage. This means our RSE execution & optimization engine works as expected.

So the version we have now seems really stable. Good luck trying it ;)

Wednesday, July 01, 2009

DataObjects.Net v4.0.2 is out

What's new? Check it out. You can download it right now.

All the details will follow up shortly.

DataObjects.Net v4.0.2 is on the way to you ;)

I'm working on its publication right now. What's done? Check it out.

We've implemented 41 issues during the last 2 weeks. And finally got nearly perfect test results:
- Auto Memory: Tests failed: 3, passed: 954, ignored: 43
- Auto PostgreSql: Tests failed: 3, passed: 956, ignored: 40
- Auto SqlServer: Tests failed: 3, passed: 955, ignored: 41

AFAIK, 1 of the 3 failing tests actually fails because of specific restrictions on the build agents. The others are related to rounding issues on different servers, and it seems there is no ideal way to resolve them. So most likely we'll just document this. So we can almost honestly say that all our tests are passing now.

There is a difference in ignored tests as well - that's because some of them are provider-specific. E.g., schema upgrade tests don't run on the Memory storage.

The complete test sequence includes about 30 different configurations (with various domain configuration options, etc.): ~ 10 for each provider type. They're running now, and there may be a few more failures. The above results are for the Auto configurations - they combine the most common options.

P.S. I'll briefly highlight the most important changes in the next post.