Figure 10.5. Random sampling versus stratified sampling.
full complement of data is often not available and the database may be populated (if at
all) with synthetic data that does not have the same skew and sparseness qualities as the
real user data that will eventually populate the production database. Despite this serious
limitation, these methods are broadly applicable where data is available.
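As a rough illustration of the distinction named in the Figure 10.5 caption, the sketch below draws a simple random sample and a stratified sample from a skewed data set. The function names, the toy rows, and the "at least one row per stratum" policy are assumptions made for this example, not details taken from the text.

```python
import random
from collections import defaultdict

def random_sample(rows, k, seed=0):
    """Simple random sample: every row is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(rows, k)

def stratified_sample(rows, key, fraction, seed=0):
    """Sample each stratum (group of rows sharing key(row)) separately.

    Assumption for this sketch: every stratum contributes at least one
    row, so rare values are never lost from the sample.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical skewed data: 990 rows of one value, 10 rows of another.
rows = [("common", i) for i in range(990)] + [("rare", i) for i in range(10)]

rand = random_sample(rows, 10)
strat = stratified_sample(rows, key=lambda r: r[0], fraction=0.01)

print("random sample strata:    ", sorted({r[0] for r in rand}))
print("stratified sample strata:", sorted({r[0] for r in strat}))
```

With data this skewed, a small random sample can easily miss the rare value entirely, while the stratified sample is guaranteed to contain it.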
TIPS AND INSIGHTS FOR DATABASE PROFESSIONALS
Tip 1. If you have data, count it. Counting will significantly improve your
database design decisions for index selection, materialized views,
multidimensional clustering, and hash partitioning. It is one of the most basic
things to do, and physical database design decisions made without counting are
unreasonably risky.
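A minimal sketch of what "counting" can look like in practice: gather the table cardinality and the number of distinct values per column. It uses Python's built-in sqlite3 module with an in-memory database; the table name, column names, and the helper column_counts are hypothetical and exist only for this illustration.

```python
import sqlite3

def column_counts(conn, table, columns):
    """Return (row_count, {column: distinct_count}) for one table.

    Identifiers are interpolated directly, so pass only trusted names.
    """
    cur = conn.cursor()
    row_count = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    distinct = {}
    for col in columns:
        distinct[col] = cur.execute(
            f'SELECT COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()[0]
    return row_count, distinct

# Hypothetical sample data: 1,000 orders, 50 customers, 2 status values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 50, "open" if i % 4 else "closed") for i in range(1000)],
)

row_count, distinct = column_counts(
    conn, "orders", ["order_id", "customer_id", "status"]
)
print(row_count, distinct)
```

Counts like these are the raw input for the design decisions listed in Tip 1, for example judging how selective an index on each column would be.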
Tip 2. Remember the rules of thumb when counting for database design:
a. Indexes: The number of distinct values in the index key should be at least
30% of the table cardinality.
b. Materialized views: A good database design should never spend more than 10
to 20% of total storage on materialized views.
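The two rules of thumb above can be sketched as simple checks on counted values. The thresholds (30% and the upper bound of 20%) come from the tips themselves; the function names and the reading of rule (a) as a test applied to a candidate index are assumptions made for this illustration.

```python
def index_is_selective(distinct_values, table_cardinality, min_ratio=0.30):
    """Rule (a): distinct index values should be at least ~30% of the rows."""
    if table_cardinality <= 0:
        return False
    return distinct_values / table_cardinality >= min_ratio

def mv_storage_within_budget(mv_bytes, total_bytes, max_fraction=0.20):
    """Rule (b): materialized views should use no more than ~10-20% of storage."""
    if total_bytes <= 0:
        return False
    return mv_bytes / total_bytes <= max_fraction

# Hypothetical counts for a 1,000-row table.
unique_key_ok = index_is_selective(1000, 1000)   # every value distinct
low_card_ok = index_is_selective(50, 1000)       # only 5% distinct

# Hypothetical storage figures, in gigabytes.
small_mv_ok = mv_storage_within_budget(15, 100)  # 15% of storage
large_mv_ok = mv_storage_within_budget(35, 100)  # 35% of storage

print(unique_key_ok, low_card_ok, small_mv_ok, large_mv_ok)
```

These checks are only screening aids; a column or view that fails one may still be worth keeping for a specific workload.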