Databases Reference
In-Depth Information
You need to be very concerned about table storage expansion due to partially
filled blocks. This is not a concern with index selection.
You can improve your selection by looking at the query execution plan for each
query (all relational database products allow you to view the query execution plan). 2
The query execution plan can show you how much I/O is projected to happen at each
stage of the plan execution. You can focus your dimension selection on the columns that
are used in operators incuring high I/O cost. This will focus your design efforts where
they matter most allowing you to gain a better sense of which clustering dimensions are
more likely than others to help reduce disk I/O.
Additionally, because MDC is very helpful for roll-out, if your system has roll-out
requirements, consider creating an MDC dimension on the roll-out predicate. This is
almost always a DATE field. In most cases it is advisable to coarsify the date to WEEK
or MONTH before using it as an MDC dimension.
Finally, consider using MDC even when there is only a single interesting dimension
for clustering within a given table. MDC has the advantage over most other single-
dimensional clustering indexes that:
The clustering is guaranteed to be enforced.
The block indexes will be orders of magnitude smaller than RID-based indexes.
Once you have a set of candidate dimensions you must test the number of cells to
see if the table expansion for each table on which you wish to apply MDC is going to be
reasonable (~10% or less, ideally less than 5%). If not, then you have a choice of reduc-
ing the number of dimensions or coarsifying one or more dimensions in order to reduce
the number of cells.
TIPS AND INSIGHTS FOR DATABASE PROFESSIONALS
Tip 1. Start your selection for MDC candidates using the same initial set of col-
umns you would use for index selection, by looking for columns that are used in
your SQL workload as predicates for equality, inequality, range, and sorting.
Tip 2. Focus on keeping the number of cells reasonably low. Keeping the table
expansion low is critical so that storage requirements don't expand unreasonably
beyond roughly 5-10% for any single table. You can use the MDC storage approxima-
tion equation to estimate the expansion.
2 In DB2 this is done through the EXPLAIN facility.
Search WWH ::




Custom Search