Database Reference
In-Depth Information
associated with equality predicates, scanning such index still satisfies the
order in O ). The included columns in this case are obtained as in the
previous scenarios. We finally compare the cost of the two alternatives
(i.e., the one derived from scenario 3 with an additional sort operator
and the one that natively returns tuples in the right order) and return
the one with the minimal expected cost.
Consider an index request over table T with E
={ (
a
, 100 ) } , R
={ (
b
, 10 ) } ,
P
=∅ , O
={
c
} , and A
={
d
} . The best index ignoring the sort column is
) . This index does not return tuples in the right order, so the
resulting plan would have a sort operator on top. The alternative index that
returns tuples in the right order is I 2 =
I 1 =
T
(
a
,
b
|
c
,
d
) . The best index is the one
that results in the cheaper execution plan for the request.
T
(
a
,
c
|
b
,
d
4.1.4 Reducing the Candidate Set
The approaches discussed earlier obtain a candidate set of indexes that likely
contains the optimal configuration. Although the resulting candidate set is
usually small, there might be opportunities to further reduce its size. Specifi-
cally, indexes in the candidate set can be helpful for the query but nevertheless
conflicting among each other. For instance, an index on table S that is part
of an index-based join between an outer table R and S conflicts with another
index that seeks a single-table predicate in S and performs a hash join with
R (because both indexes cannot be used in the same execution plan).
A common mechanism to reduce the set of indexes is to rely on the query
optimizer itself. Suppose that we create all candidate indexes and optimize
the query. The optimizer can now leverage the good properties of each such
index and obtain the overall optimal plan. In such a case, we can inspect this
plan and identify which of the candidate indexes are used and can refine our
candidate set so that it contains only these. A well-behaved optimizer, faced
with this subset of indexes, would result in the same optimal plan.
Some clarifications are needed to further understand this idea. First, it
seems that we need to create a large set of indexes just to optimize a query
and obtain a reduced set of candidate indexes. Creating indexes is very expen-
sive and thus can be seen as a crucial drawback of this approach. However, as
we explain in Chapter 5, we can simulate the optimization of the input query
without actually creating the indexes. Second, there is a problem if the initial
candidate set contains multiple clustered indexes for the same table, because
the resulting configuration is invalid and cannot be materialized. In general,
to guarantee optimality, we need to perform the optimization step for each
combination of clustered indexes. Some approaches avoid this time-consuming
step by heuristically choosing a small set of alternatives (e.g., coordinate clus-
tered indexes in different tables so that join predicates can be answered di-
rectly) and performing a few optimization calls, retaining the indexes of the
Search WWH ::




Custom Search