Counting and Data Sampling in Physical Design Exploration - Physical Database Design


10

Counting and Data Sampling in Physical Design Exploration

Get your facts first, and then you can distort them as much as you please.

Facts are stubborn, but statistics are more pliable.

—Mark Twain (1835-1910)

There are three kinds of lies: lies, damned lies, and statistics.

—Benjamin Disraeli (1804-1881)

Counting and sampling are critical to effective database design. There is perhaps nothing more natural than wanting to explore something to see what it is really like before plunging in and committing. Most of us use counting and sampling strategies every day for various things we do. We nibble on food that we are cooking to see how it tastes, or we read a few lines from a book before we buy it. Sampling is one of the oldest and most common strategies for getting a good sense of a large system quickly. It has marvelous applications to physical database design, particularly using SQL capabilities for counting and sampling that are being added by database vendors.

Some of the most important design problems that can be helped through sampling include materialized view size estimation, index size and key duplication (or cardinality) projections, multidimensional clustering storage, and shared-nothing partitioning skew. In some ways, data sampling is so important that it can be hard to develop a truly top-notch database design without it. In all of these cases, understanding the data is the goal, and sampling is a tool that can help speed the analysis.
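As a rough illustration of one of these problems, the sketch below estimates shared-nothing partitioning skew from a row sample rather than a full scan. It is only a sketch under stated assumptions: the function name, the Bernoulli (per-row coin flip) sampling, the CRC32 hash standing in for a database's partitioning function, and the definition of skew as the fullest partition's row count divided by the mean row count are all choices made here for illustration, not taken from any particular database product.

```python
import random
import zlib
from collections import Counter


def estimate_partition_skew(keys, num_partitions, sample_fraction, seed=0):
    """Estimate hash-partitioning skew from a Bernoulli sample of key values.

    Hypothetical helper: skew is defined here as
    (rows in the fullest partition) / (mean rows per partition),
    so 1.0 means perfectly even and num_partitions means all rows
    landed on a single partition.
    """
    rng = random.Random(seed)

    # Bernoulli sampling: keep each row independently with the given probability.
    sample = [k for k in keys if rng.random() < sample_fraction]
    if not sample:
        raise ValueError("sample is empty; increase sample_fraction")

    # Assumption: CRC32 modulo the partition count stands in for the
    # database's real partitioning hash function.
    counts = Counter(
        zlib.crc32(str(k).encode("utf-8")) % num_partitions for k in sample
    )

    rows_per_partition = [counts.get(p, 0) for p in range(num_partitions)]
    mean_rows = len(sample) / num_partitions
    return {
        "sample_size": len(sample),
        "rows_per_partition": rows_per_partition,
        "skew": max(rows_per_partition) / mean_rows,
    }


if __name__ == "__main__":
    # Made-up data: a mostly unique key with one heavily duplicated value.
    keys = list(range(50_000)) + [42] * 50_000
    print(estimate_partition_skew(keys, num_partitions=8, sample_fraction=0.05))
```

A heavily duplicated key value pushes all of its rows to the same partition, so the sampled skew rises well above 1.0. That is the kind of signal a designer would want before committing to a partitioning key.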
