Database Reference
In-Depth Information
such scenarios, the good properties of various indexes need to be quantitatively
balanced, as it becomes challenging to understand how indexes interact to
answer a given query. In general, some form of quantitative and systematic
analysis becomes crucial for an effective physical design, as illustrated next
by using examples.
3.1.2 Systematic Physical Database Design
Consider a query that returns tuples satisfying a range predicate from a single
table R :
Q1 = SELECT A, B, C
FROM R
WHERE 10<A<20
Suppose that we want to obtain the best configuration of indexes for such
a query. In this case, we can show that an index I 1 =
is optimal.
In fact, we can seek I 1 to obtain all tuples that satisfy the range predicate
on column A . Additionally, index I 1 includes columns B and C , which together
cover the query and avoid additional RID lookups. With the exception of the
initial traversal of index I 1 to obtain the first tuple that satisfies the predicate,
the plan that uses I 1 accesses exactly the tuples that constitute the answer
for the query and therefore cannot be improved.
Now consider a slight variation of Q1 , with an additional range predicate
on column B :
R
(
A
|
B
,
C
)
Q2 = SELECT A, B, C
FROM R
WHERE 10<A<20
AND20<B<100
A plan using the previous index I 1 would seek the index and obtain all tuples
that satisfy predicate 10<A<20 . From this intermediate result, all tuples
that do not satisfy predicate 20<B<100 would be eliminated. Note that
in this case we do not scan just the query result. If most of the tuples satisfy
the predicate on A but just a handful do so for the predicate on B , we would
read many unnecessary tuples from disk, thus wasting several inputs/outputs
(I/Os). Analogously, a strategy using index I 2 =
would first obtain
all tuples satisfying predicate 20<B<100 and then filter those not satisfy-
ing the predicate on A . It can be shown that for any data distribution, either
I 1 or I 2 results in the better plan, but it is not immediate which one is better
for any specific instance. If we have access to statistics on base tables (e.g.,
histograms as discussed in Chapter 2) and a rough understanding of the cost
model of the optimizer, we can calculate, for any data instance, which alter-
native between I 1 and I 2 is better. Specifically, we should choose the index
whose key column corresponds to the more selective predicate.
R
(
B
|
A
,
C
)
Search WWH ::




Custom Search