Databases Reference
In-Depth Information
One such heuristic is to consider the largest table of the star join query as the fact table
(which, in addition, must have a specified minimum size, currently defined as 100 pages).
The second heuristic requires that all the joins in a star join query need to be inner joins,
and use equality predicates on a single column. It should also be noticed that even in the
rare case where a dimension table is incorrectly chosen as a fact table through the use of
these heuristics, the Query Optimizer will still select a valid plan which will return the
correct data, although it may not be an efficient one.
Regarding optimizations for star join queries, it is interesting to consider the use of
Cartesian (or Cross) products of the dimension tables with multi-column index lookups
on a fact table. Although Cross products are avoided during the regular optimization
process because they can generate huge intermediate results, they can be used for data
warehouse queries involving small dimension tables. As the rows of the Cross product are
being generated, they are immediately used to look up on a multi-column index without
requiring a lot of memory for the intermediate results.
In Optimizing Star Join Queries for Data Warehousing in Microsoft SQL Server , 1 Cesar
Galindo-Legaria et al. define three different approaches to optimizing star join queries
based on the selectivity of the fact table, as shown next. As mentioned in Chapter 3,
Statistics and Cost Estimation , selectivity is a measure of the number of records that
are estimated to be returned by a query and, slightly anti-intuitively, smaller numbers
represent higher selectivity (i.e. fewer rows).
For highly selective queries which return up to 10% of the rows in the fact table, the
Query Optimizer may produce a plan with Nested Loops Joins, Index Seeks and
bookmark lookups. For medium selectivity queries, which return anywhere from 10 to
75% of the records in the fact table, SQL Server may recommend Hash Joins with bitmap
filters in combination with fact table scans or fact table range scans. Finally, for the least
selective queries, processing more than 75% of the fact table, the Query Optimizer mostly
will recommend regular Hash Joins with fact table scans. The choice of these operators
1 Published in the Proceedings of the 2008 IEEE 24th International Conference on Data Engineering.
 
Search WWH ::




Custom Search