Database Reference
In-Depth Information
Other Query Processing Considerations: There are numerous
query processing conditions that influence the design and choice
of an index scheme besides the preceding ones. For example, the
orthogonal range query is very straightforward to specify by a set
of closed intervals. The range coverage may be circular, spherical,
or some arbitrary polygonal shape. There is a considerable body of
literature that covers these query classes. 79
Attribute Types: The data type of the attribute keys plays a significant
role in the selection of an index scheme used in an access method. For
example, when the data type is a long alphabetic string or bit-strings,
the selection of an index scheme may be different from that when the
data type of the index is an integer value.
Index Size Constraint: The general use of an index structure is to sup-
port fast access of data items from stored datasets. One desirable fea-
ture of an index is that its size be relatively small compared with the
base data. An index may contain relatively small data structures that
can fit entirely in memory during a running session of the application
that uses the index. The memory resident information is then used to
derive the relative positions or block addresses of the desired records.
Examples of these indexing schemes include inverted indexes, 47 bitmap
indexing, 21 , 103 and direct access index (or hashing 47 , 79 ). However, when
the index structure is suciently large, this will require it to be stored
on a disk storage system and then page-in the relevant buckets of the
index into memory on demand in a manner similar to the use of B-Tree
index and its variants. 24
Developing an ecient access method in scientific applications is one of the
first steps in implementing an ecient system for a data-intensive applica-
tion. The question that immediately arises then is what measures constitute a
good metric for evaluating an ecient index scheme. Let an index scheme G
be built to access N data items. For a query Q , the most important metrics
are the query response time and the memory requirement—how long and how
much memory does it take to answer the query. Often, the query response
time is dominated by I/O operations, and the number of bytes (or disk sec-
tors) read and written is sometimes used as a proxy for the query response
time.
Additional metrics for measuring an indexing scheme include recall denoted
by
ρ
, precision denoted by
π
, and storage utilization denoted by
μ(
G
)
. Let
r
(
Q
)
denote number of correct results retrieved for a query, and let R
(
Q
)
denote the actual number of results returned by a query in using G . Let c
(
Q
)
denote the actual number of correct results desired where c
(
Q
)
r
(
Q
)
. The precision is defined as the ratio of desired correct results to the
number of correct results, that is,
R
(
Q
)
; and the recall is the ratio
of the number of correct results to the number of retrieved results, that is,
π =
c
(
Q
)/
r
(
Q
)
ρ =
r
(
Q
)/
R
(
Q
)
. Let S
(
G
)
denote the actual storage used by an index scheme for
Search WWH ::




Custom Search