depends on properties of both data (e.g., size and order) and the environment
(e.g., available memory). If we were to design a generic and robust approach
to calculate our answer, we would need to implement a high-level decision
procedure to pick which variant to use for each call. This procedure should
be able to estimate the expected running time of each alternative and avoid
using inapplicable algorithms (e.g., the hash-based approach countMatches2
when there is no available memory). The implementation of countMatches
would therefore be as follows:
int countMatches(int[] R, int[] S,
                 bool isRsorted, bool isSsorted, bool hasEnoughMemory) {
    // 1- estimate the cost of each alternative, discarding inapplicable ones
    // 2- invoke the alternative that is expected to be the most efficient
}
We have now designed a very efficient, generic, and robust framework to
compute the number of matches among elements of two arrays. While it is not
straightforward, the generic algorithm can potentially take advantage of the
physical layout of data, the availability of additional memory, and the relative
sizes of the input arrays. We can even claim that it introduces some (very
limited) “declarative” flavor into the language. If we repeat this exercise for
several other common operations, we would obtain a library that allows us to
manipulate and query arrays in different ways, by specifying what we want
rather than exactly how to obtain the answer.
A key problem with this approach, however, is the lack of flexibility and
composability, which appears even for the simple scenarios discussed next.
Suppose that we want to change the definition of countMatches so that it
considers that r in R and s in S match whenever |r − s| ≤ 100. In this case,
we would need to revise all algorithms and perhaps even to discard those
that make assumptions that no longer hold (e.g., the hash-based alternative
in countMatches2 can handle only equality matches). We could, in principle,
keep extending countMatches so that it progressively (1) considers arbitrary
types for R and S elements by using templates or generics, (2) abstracts the
matching function so that it can be specified as a parameter from the outside,
and (3) implements other alternatives that leverage special cases for
performance. As we see next, we will quickly run into severe limitations even in this
relatively simple scenario.
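Steps (1) and (2) above can be sketched with templates and a caller-supplied predicate. This is only one possible shape for the extension (the name countMatchesGeneric is hypothetical), and note that it falls back to the nested-loop strategy, since the hash-based variant cannot evaluate arbitrary predicates:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Generic nested-loop count: the element type T and the matching
// predicate are both supplied by the caller.
template <typename T, typename Match>
int countMatchesGeneric(const std::vector<T>& R, const std::vector<T>& S,
                        Match matches) {
    int count = 0;
    for (const T& r : R)
        for (const T& s : S)
            if (matches(r, s)) ++count;  // predicate is opaque to the algorithm
    return count;
}
```

A caller can now express both equality matches and the band predicate |r − s| ≤ 100, e.g. `countMatchesGeneric(R, S, [](int r, int s) { return std::abs(r - s) <= 100; })`, but the algorithm can no longer exploit hashing or sortedness without knowing more about the predicate.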
Suppose now that we want to compute matching pairs with a value greater
than, say, 500. In this case, we could preprocess the input arrays eliminating
all values smaller than or equal to 500 and use the resulting arrays as inputs
to the countMatches algorithm. However, we would already be specifying one
concrete way of obtaining results (first filter, then count matches). Alterna-
tively, if just a few elements are greater than 500, we could “push” the filter