During the transformation of a predicate, only comparison operations having an access to key fields of an index can be used to restrict the set of index's entries which need to be loaded. Therefore, the usage of different indexes for the transformation produces different RangeSets.
At the next step, the engine calculates the cost of loading the data for each index. To do this, we compile the corrensponding RangeSet expression and evaluate it into the actual RangeSet. If the source predicate contains an instance of Parameter class, the engine uses the expected value of this parameter during evaluation of compiled RangeSet expression. So finally we get a RangeSet object identifying index ranges that must be extracted from a particular index to evaluate the query using this index.
The cost calculation is based on index statistics, which exists for each of our indexes. Approximately, statistics is a function returning approximate amount of data laying in particular index range. In fact, it's a histogram of data distribution, where the amount of data is bound to Y axes, and the index key value is bound to X axes.
After the completion of costs' calculation for all indexes, the engine selects the index and corresponding RangeSet associated with the minimal cost. Currently, we use pretty simple selection algorithm which selects the cheapest index for each part of the source predicate independently. In future we plan to implement the more complex and effective algoritm here, but for now it's ok.
When the index is selected, we perform actual query transformation:
- If primary index is selected, the engine does not modify the source query.
- If one of secondary indexes is selected, the engine inserts IndexProviders corresponding to selected indexes into the query, adds RangeProviders after them (extracting RangeSets associated with them) and finally joins primary index. The original filtering criteria (FilterProvider) follows all this chain.
- We don't eliminate unused columns on this stage - this is done by additional column-based optimization step running further.
No comments:
Post a Comment