Database Reference
In-Depth Information
Table 13-1. Conditions That Lead to Partition Pruning*
Condition
Range
List
Hash
Equality ( = )
IN
BETWEEN , > , >= , < , or <=
IS NULL
*Inequality (!= or <>), NOT IN, IS NOT NULL conditions, and restrictions
based on expressions and functions don't lead to partition pruning.
Choosing the partition key and the partitioning method is probably the most important decision you have to
make while working on the design of a partitioned table. The objective is to take advantage of partition pruning to
efficiently process as many SQL statements as possible. In addition, one partition should ideally contain only the data
you expect to be processed by a single SQL statement. For example, if there are frequent SQL statements processing
data by the day, you should partition by the day. Or, if there are frequent SQL statements that process data by country,
you should partition the data by country. If you do this incorrectly, you'll never be able to take advantage of partition
pruning. The following four characteristics of your application have to be considered very carefully, because they're
the ones that impact the partitioning strategy the most:
1.
What columns the restrictions are expected to be applied on and with what frequency
2.
What kind of data will be stored in those columns
3.
What the SQL conditions used in those restrictions will be
4.
Whether data must be regularly compressed or purged and what criterion to base the
processing on
The first and the fourth are essential for choosing the partition key. The second and the third are for choosing the
partitioning method. Let's discuss them in detail.
Knowing which columns the restrictions are applied on is essential because no partition pruning is possible if
no restriction is applied to the partition key. In other words, based on this criterion, you restrict the choice between
a limited number of columns. In practice, it's quite common for several, possibly very different, restrictions to be
applied by different SQL statements. Therefore, you must know the frequency of utilization of the different SQL
statements as well. In this way, you can decide which restrictions are the most important to optimize. In any case, you
should consider only those restrictions that have weak selectivity. In fact, restrictions with strong selectivity can be
optimized with other access structures (for example, indexes).
Once the columns to be potentially used in the partition key are known, it's time to take a look at the data
they store. The idea is to find out which partitioning method would be applicable to them. For that, it's essential to
recognize two things. First, only range and list partitioning allow the grouping together of “related” data (for example,
all of July's sales or all European countries) or an exact mapping of specific values with specific partitions. Second,
each partitioning method is suitable only for a very specific kind of data:
Range is suitable for values that are sequential by nature. Typical examples are timestamps
and numbers generated by a sequence.
List is suitable when the number of distinct values is both well known and limited in number.
Typical examples are all kinds of status information (for example, enabled, disabled,
processed) and attributes that describe people (for example, sex, marital status) or things
(for example, country and postal code, currency, category, format).
 
 
Search WWH ::




Custom Search