Enumerating the Search Space - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

the query must also be updated as a side effect. Therefore, it is not true

anymore that adding more indexes would always reduce the expected cost

of a workload. We next briefly explain how the different components in the

top-down approach change when updates are present in the workload.

Choosing the initial configuration: When updates are present, the

initial configuration is not optimal anymore because indexes also need

to be updated, raising the overall workload cost. However, the resulting

configuration is still optimal for the SELECT component of each update

query, and we can use this fact to obtain a lower bound. Specifically, the

expected cost for the SELECT portion of the queries in the workload is

added to the cost of all the UPDATE shells under the base configuration

(which contains only indexes that must be present in any configura-

tion). We then obtain a cost that cannot be improved by any configura-

tion. This bound is not tight (i.e., there might be no configuration that

meets the lower bound) but offers valuable additional information to a

database administrator.

Evaluating the current configuration: Updates can be handled by the

traditional approach of separating an update query into an inner SELECT

and an UPDATE shell. Specifically, we calculate, for each index I and up-

date query q in the workload, the cost of updating I for each update

shell in the workload. Later, when evaluating a configuration, we first

consider the inner SELECT portion of an update query q and then add,

to the resulting partial cost, the update cost of all indexes in the con-

figuration over the table referenced by q .

Ranking candidate configurations: For workloads with updates, the

upper-bound on the cost of a transformed configuration can be negative

(sometimes the cost of removing an index can be outweighed by the

benefit of not having to update it). In those cases the penalty function

correctly chooses transformations with negative over positive cost upper

bounds but sometimes makes poor decisions comparing two transforma-

tions with negative costs. If

T t 1 =− 10,

S t 1 = 10,

T t 2 =− 20, and

S t 2 = 30, then penalty ( C t 1 ) =− 1 is smaller than penalty ( C t 2 ) =− 2 / 3.

However, configuration C t 2 is clearly better than C t 1 in terms of both

space and cost (we say that t 2 dominates t 1 ). To remedy those situa-

tions, we first obtain the skyline of candidate configurations (i.e., we

consider only candidate configurations that are not dominated by any

other transformation) and then use the original penalty definition over

this restricted subset. Also, note that the denominator in the definition

of penalty is given by min

. Since now we can relax

a configuration that requires less than B storage (see above), the de-

nominator might become negative, which is undesirable. However, in

those situations space is not relevant since the configuration already

fits in B . Therefore, when the current configuration requires less than

(

size

(

C

) −

B

,

S

)

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home