depends on properties of both data (e.g., size and order) and the environment
(e.g., available memory). If we were to design a generic and robust approach
to calculate our answer, we would need to implement a high-level decision
procedure to pick which variant to use for each call. This procedure should
be able to estimate the expected running time of each alternative and avoid
using inapplicable algorithms (e.g., the hash-based approach countMatches2
when there is no available memory). The implementation of countMatches
would therefore be as follows:
int countMatches(int[] R, int[] S,
                 bool isRsorted, bool isSsorted, bool hasEnoughMemory) {
    // 1- estimate the cost of each alternative, discarding inapplicable ones
    // 2- invoke the alternative that is expected to be the most efficient
}
We have now designed a very efficient, generic, and robust framework to
compute the number of matches among elements of two arrays. While it is not
straightforward, the generic algorithm can potentially take advantage of the
physical layout of data, the availability of additional memory, and the relative
sizes of the input arrays. We can even claim that it introduces some (very
limited) “declarative” flavor into the language. If we repeat this exercise for
several other common operations, we would obtain a library that allows us to
manipulate and query arrays in different ways, by specifying what we want
rather than exactly how to obtain the answer.
A key problem with this approach, however, is the lack of flexibility and
composability, which appears even for the simple scenarios discussed next.
Suppose that we want to change the definition of countMatches so that it
considers that r in R and s in S match whenever |r − s| ≤ 100. In this case,
we would need to revise all algorithms and perhaps even to discard those
that make assumptions that no longer hold (e.g., the hash-based alternative
in countMatches2 can handle only equality matches). We could, in principle,
keep extending countMatches so that it progressively (1) considers arbitrary
types for R and S elements by using templates or generics, (2) abstracts the
matching function so that it can be specified as a parameter from the outside,
and (3) implements other alternatives that leverage special cases for
performance. As we see next, we will quickly run into severe limitations even in this
relatively simple scenario.
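Steps (1) and (2) above can be sketched with templates and a caller-supplied predicate. This is only one possible shape for the extension (the name countMatchesGeneric is hypothetical), and note that it falls back to the nested-loop strategy, since the hash-based variant cannot evaluate arbitrary predicates:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Generic nested-loop count: the element type T and the matching
// predicate are both supplied by the caller.
template <typename T, typename Match>
int countMatchesGeneric(const std::vector<T>& R, const std::vector<T>& S,
                        Match matches) {
    int count = 0;
    for (const T& r : R)
        for (const T& s : S)
            if (matches(r, s)) ++count;  // predicate is opaque to the algorithm
    return count;
}
```

A caller can now express both equality matches and the band predicate |r − s| ≤ 100, e.g. `countMatchesGeneric(R, S, [](int r, int s) { return std::abs(r - s) <= 100; })`, but the algorithm can no longer exploit hashing or sortedness without knowing more about the predicate.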
Suppose now that we want to compute matching pairs with a value greater
than, say, 500. In this case, we could preprocess the input arrays eliminating
all values smaller than or equal to 500 and use the resulting arrays as inputs
to the countMatches algorithm. However, we would already be specifying one
concrete way of obtaining results (first filter, then count matches). Alterna-
tively, if just a few elements are greater than 500, we could “push” the filter