News, examples, tips, ideas and plans.
Thoughts around ORM, .NET and SQL databases.

Wednesday, July 08, 2009

Index-based query optimization - Part 1

Probably you know, that DO4 includes our own implementation of RDBMS. Currently, we support in-memory DB only, but the development of file-based DB is scheduled. As well as other RDBMS we try to optimize a query execution to achieve the better performance. There are several ways to perform such optimization. In this post, I describe the optimization based on indexes.

The aim of this optimization is to reduce the amount of data to be retrieved from an index. This reduction can be achieved by loading only those index entries with keys belonging to specified ranges. The query execution engine tries to create these ranges by transforming predicates of FilterProviders found in the query. Currently, the engine can process only those filters which are placed immediately after IndexProvider, but this part of the algorithm will be improved.

The engine tries to transform predicates to Conjunctive Normal Form (CNF) before the extraction of index's keys ranges. If this transformation was successful then the engine analyzes terms of this CNF. There are two kinds of CNF's terms:
  • Comparison operation;
  • Stand-alone boolean expression.
Some of comparison operations can be transformed to ranges of index's keys. It is possible when only the one part of comparison operation contains an expression with Tuple.

The list of comparison operation which are recognized by the query execution engine:
  • >
  • <
  • ==
  • !=
  • >=
  • <=
  • Compare methods
  • CompareTo methods
  • StartsWith methods
person.Age > 10

If a term is stand-alone boolean expression (e.g. a is SomeType), then the engine creates range which will represents all index's keys or none.

We also support multi-column indexes. For example, expression person.FirstName == "Alex" && person.Age > 20 can be transformed to range of keys of an index built for FirstName and Age columns.

If a predicate can not be normalized, then the engine extracts index's keys ranges from it by walking recursively through its expression tree.

The following expressions are always transformed to the range representing all index's keys:
  • Comparison operation containing access to a Tuple in both of its parts;
  • Expressions which are not recognized as comparison operation.
In the next posts, I will describe further details of our index-based query optimization algorithm - e.g. how the engine selects the best index to use based on statistics.

No comments:

Post a Comment