Analyzing Data Streams in Scientific Applications - Scientific Data Management

Database Reference

In-Depth Information

1. The regions that are stored in the index are not fixed, but generated

by the Add and Grow operations. Due to the streaming nature of the

application, past decisions about growing and adding affect the set of

regions stored in the index and hence future performance.

2. Traditionally, the performance of index structures has been measured

in terms of the cost of search and in some cases update. Since the goal

of function approximation is to minimize total simulation cost, func-

tion evaluations and region operations must also be accounted for when

evaluating the performance of an index.

In the light of these observations a principled analysis of the various costs

in the function approximation algorithm leads to the discovery of novel trade-

offs. These tradeoffs produce significant and varying effects on different in-

dex structures. The remainder of this section briefly introduces the tradeoffs

and the tuning parameters that have been proposed to exploit them. The in-

dexing problem is studied here using the concrete instantiation of the ISAT

Algorithm using ellipsoidal regions with linear models. Therefore, regions are

often referred to as ellipsoids in the rest of the section. It is important to note,

however, that the ideas are applicable to other types of regions and associated

functions.

Tuning Retrieves. In most high-dimensional index structures the ellipsoid

containing a query point is usually not the first ellipsoid examined since an

index structure often performs fast approximations during the search before

performing an expensive check at each leaf level. The index ends up looking

at a number of ellipsoids before finding “the right one.” The additional ellip-

soids that are examined by the index are called false positives . For each false

positive, the algorithm pays to search and retrieve the ellipsoid from the index

and to check if the ellipsoid contains the query point. In traditional indexing

problems, if an object that satisfies the query condition exists in the index,

then finding this object during search is mandatory. Therefore, the number of

false positives is a fixed property of the index.

However, the function approximation problem provides the flexibility to

tune the number of false positives, because we can fall back to evaluating

the expensive function if the index search was not successful. The number of

false positives can be tuned by limiting the number of ellipsoids examined

during the retrieve step. This parameter is denoted by Ellr. Ellr places an

upper bound on the number of false positives for a query. Tuning Ellr controls

several effects.

Effect 1: Decreasing Ellr reduces the cost of the retrieve operation as

fewer ellipsoids are retrieved and examined.

Effect 2: Decreasing Ellr decreases the chance of finding an ellip-

soid containing the query point, thereby resulting in more function

evaluations.

Search WWH ::

Custom Search

Home