10
Counting and Data Sampling in Physical Design Exploration
Get your facts first, and then you can distort them as
much as you please.
Facts are stubborn, but statistics are more pliable.
—Mark Twain (1835-1910)
There are three kinds of lies: lies, damned lies, and statistics.
—Benjamin Disraeli (1804-1881)
Counting and sampling are critical to effective database design. There is perhaps
nothing more natural than wanting to explore something to see what it is really
like before plunging in and committing. Most of us use counting and sampling
strategies every day. We nibble on food that we are cooking to see how it tastes,
or we read a few lines from a book before we buy it. Sampling is one of the
oldest and most common strategies for getting a good sense of a large system quickly. It
has marvelous applications to physical database design, particularly through the SQL
capabilities for counting and sampling that database vendors are adding.
Some of the most important design problems that sampling can help with
include materialized view size estimation, index size and key duplication (or cardinality)
projections, multidimensional clustering storage, and shared-nothing partitioning skew.
Data sampling is so important that it can be hard to develop a truly top-notch
database design without it. In all of these cases, understanding the data is the goal, and
sampling is a tool that can speed the analysis.
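
To make the partitioning-skew case concrete, here is a minimal sketch rather than a method taken from the text. It assumes hypothetical synthetic keys (30% of rows share one hot key), a simple modulo hash over four partitions, and a 1% simple random sample; all of these names and numbers are illustrative assumptions. It compares the share of rows each partition would receive when computed from the full data against the same share estimated from the sample.

```python
import random
from collections import Counter


def partition_shares(keys, num_partitions):
    """Fraction of rows landing in each partition under key % num_partitions."""
    counts = Counter(k % num_partitions for k in keys)
    total = len(keys)
    return [counts.get(p, 0) / total for p in range(num_partitions)]


def build_skewed_keys(n_rows, rng):
    """Hypothetical data: about 30% of rows carry hot key 7, the rest are uniform."""
    return [7 if rng.random() < 0.30 else rng.randrange(1_000_000)
            for _ in range(n_rows)]


rng = random.Random(42)                 # fixed seed so the run is repeatable
keys = build_skewed_keys(200_000, rng)  # stands in for a partitioning column
sample = rng.sample(keys, 2_000)        # 1% simple random sample, no replacement

exact = partition_shares(keys, 4)       # full count: touches every row
estimate = partition_shares(sample, 4)  # sampled: touches 1% of the rows

for p in range(4):
    print(f"partition {p}: exact {exact[p]:.3f}, sampled {estimate[p]:.3f}")
```

The sample reads only a small fraction of the rows yet should still point to the same overloaded partition, which is the kind of quick signal a designer wants before committing to a partitioning key. Proportions like these scale well from a sample; distinct-value counts do not scale so simply, so this sketch should not be read as a cardinality estimator.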