News, examples, tips, ideas and plans.
Thoughts around ORM, .NET and SQL databases.

Friday, November 18, 2011

Precise control over indexes

This is a detailed description of the index-related features from the upcoming release mentioned in the previous post.

Database indexes were invented to improve the speed of data retrieval operations on a database table. But as always, they have the cost: slower writes and increased storage space. So database architects have to keep the balance between faster data retrieval and slower writes. Ideally, indexes should be applied only on columns that are used in filtering, ordering, etc.

1. Control over indexes on reference fields

Versions of DataObjects.Net prior to the upcoming one were pretty straightforward and applied indexes on every reference property, for example, in this model index would be placed on Owner property automatically.

public class Penguin : Entity {

  [Field]
  public Person Owner { get; set; }
}

DataObjects.Net behaved so in assumption that such indexes would boost performance in queries like:
var PoppersPenguins = session.Query.All<Penguin>().Where(a => a.Owner == MrPopper);

But what if the application doesn't have such queries? And the index that is created and maintained by the database is absolutely useless, moreover, it makes the performance worse? The upcoming version of DataObjects.Net has the answers. Here is what you can do to prevent automatic index creation on a reference field:

public class Penguin : Entity {

  [Field(Indexed = false)]
  public Person Owner { get; set; }
}

2. Control over index clustering

There are 2 main approaches in physical organization of data tables: unordered and ordered. The first one means that the records are stored in the order they are inserted, the second implies that the records are pre-sorted before storing. Both of them have advantages and disadvantages, depending on structure of primary key, usage scenarios, etc. Some database servers supports both cases, others - only one. Microsoft SQL Server supports both: unordered approach is a "heap table (file)" and ordered one is based on "clustered index" when data rows are physically sorted according to position of primary key in internal index structures (B+ tree).

For Microsoft SQL Server DataObjects.Net always chose "clustered index" as an organization of physical storage because this is quite efficient in most cases with only few exceptions: when your primary key is not integer auto-incremented data type but uniqueidentifier or char/varchar, clustered indexes become less efficient on data inserts. Until now there was no way to say that there is no necessity to make primary index clustered, but that has changed: DataObjects.Net provides easy way to control clustering of indexes. Here is how:

// Penguin table should not be clustered
[HierarchyRoot(Clustered = false)]
public class Penguin : Entity {

  [Field]
  public Person Owner { get; set; }
}

Note, removing clustering from hierarchy root you remove it for all its descendants.

Additional feature is the ability to define an index that is not primary and then use it for clustering, like this:

// Person table should be clustered by Name column
[HierarchyRoot]
[Index("Name", Clustered = true)]
public class Person : Entity {

  [Field]
  public string Name { get; set; }
}

This can be done for each class separately except the case when SingleTable inheritance scheme is used. Another obvious restriction is that there can be only one clustered index per table.

3. Support for partial (filtered) indexes

A partial index, also known as filtered index is an index which has some condition applied to it so that it includes a subset of rows in the table. This allows the index to remain small, even though the table may be rather large, and have extreme selectivity.

Our goal was not only to add the support, but to make it easy to use and prevent users from the necessity to write database server-dependent expressions in index definitions. Therefore, we decided to implement the following technique that gives us compile-time validation, type-safety and the power of standard .NET expressions:

[HierarchyRoot
[Index("Owner", Filter = "OwnerIndex")]
public class Penguin : Entity {

    public static Expression<Func<Penguin, bool>> OwnerIndex()
    {
      return p => p.Owner != null;
    }

    [Field]
    public Person Owner { get; set; } 
}

Note that depending on inheritance scheme applied, partial indexes can or can't contain expressions with columns defined in ancestors/descendants.

In case you want to move filter expression to a separate class, you should use the following construct:

[HierarchyRoot]
[Index("Owner", Filter = "PenguinOwnerIndex", FilterType = typeof(FilterExpressions))]
public class Penguin : Entity {

    [Field]
    public Person Owner { get; set; } 
}

public static class FilterExpressions {

    public static Expression<Func<Penguin, bool>> PenguinOwnerIndex()
    {
      return p => p.Owner != null;
    }
}

Partial indexes are supported in Microsoft SQL Server and PostgreSQL providers.

In the next post I'll describe other features of the upcoming release, stay tuned. In the meantime, we are preparing the new builds, installers and stuff.

3 comments:

  1. This is very beautiful.

    IMO, The partial indexes are pure genius type of work. ;)

    ReplyDelete
  2. Thanks, Malisa.

    I'd say, they are a state of art. =)

    ReplyDelete
  3. I really like your approach of blogging. I bookmarked it to my bookmark website record and shall be checking back soon. Pls check out my site as properly and let me know what you think. Eminem is an American songwriter, rapper, record producer and an actor. Hе hаѕ also bееn dеѕсrіbеd аѕ оnе оf thе grеаtеѕt hір-hор аrtіѕtѕ оf аll tіmе, оnе оf thе grеаtеѕt rарреrѕ оf аll tіmе аnd was lіѕtеd іn Rоllіng Ѕtоnе lіѕt оf 100 grеаtеѕt аrtіѕtѕ оf аll tіmе аt number 83. Here are a few more interesting facts and Eminem net worth you might not have known.

    ReplyDelete