Databases Reference
In-Depth Information
benefits of data clustering. Clustering reduces the I/O required to resolve a query while
table expansion increases the amount of pages and therefore the I/O associated with a
table. Of course, aside from the performance concerns, the extra storage requirements
require additional disk, which means increased capital expenditure. Therefore, a key
design goal when designing an MDC table is to select the clustering dimensions such
that the ratio of the number of unique cells to the number of required storage blocks is
kept relatively low. No exact formula currently exists to model an “ideal” storage expan-
sion, though most of the empirical data suggests that keeping the storage expansion to a
minimum is an important design goal. In the expert MDC designs that have been pub-
lished, in most cases the MDC dimensions were selected so as to deliberately limit table
expansion to 3-10%.
Estimates of the space waste can be made by assuming each cell contains a single
partially filled block at the end of its block list. The space waste is then:
8.1
η cells is the number of unique cells in the resulting MDC table, P % is the average
percentage of each storage block left empty per cell, and is the blocking size. On average,
the last block in each cell will be 50% filled, except in cases of largely empty cells (very few
tuples in the cell). In the presence of either data skew, or very high cell cardinality, the
number of cells with very few rows may increase, resulting in a high vacancy rate in the
final block of some cells. In fact, from a practical point of view when designing MDC
tables, a design should always be chosen that is very conservative on storage expansion,
constraining the expansion within the range of 5-10%. With that goal in mind the choice
of P % is not critical, provided it is larger than 50%, since the goal is to observe gross
expansion of space rather than to estimate space use accurately. We recommend using
65% as a fairly conservative value for P % .
To complete the design examination, it is necessary to know the number of cells in
the candidate table design under consideration, namely
where
η cells . There are two ways that
η cells can be determined:
1.
Estimating the value using available table statistics.
2.
Counting the number of cells using an SQL query.
For example, consider the candidate MDC design in Figure 8.3, using COLOR,
YEAR, and REGION. Using table statistics available in the database catalogs, we can
find the number of unique values in each of these three columns. The number of cells in
the MDC design will be, at most, the product of these numbers.
 
Search WWH ::




Custom Search