Database Reference
In-Depth Information
rx
12
E
10
G B
8
6
F
D
A
C
4
2
lx
2468 0
12
Figure 6.6
Shaded region representing an intersection query.
to a range query of finding all line segments whose lx
5as
depicted in Figure 6.5. Similarly, a query that asks for all lines that overlap
line F
4
.
5 and rx
4
.
=
2,6
also translates to a range query of finding all lines whose lx
6
and rx
2. The shaded region of Figure 6.6 is the area where the points
of the response to query lie. In the illustrative example given, all lines are
mapped onto points that lie above the 45-degree line. It is possible to choose
a mapping that distributes the points uniformly in the embedding space. The
basic idea is easily extended to objects of arbitrary shapes and dimensions
such as rectangles, circles, spheres, and so forth. For example, a collection of
rectangles may be represented either explicitly as rectangles with the appro-
priate index structure, such as R-trees 94 layered over it to process queries on
them, or mapped as points in a four-dimensional data space. In the subse-
quent discussion, we focus mostly on datasets that are k -dimensional and are
perceived as points in k -dimensional space.
A sample of a two-dimensional dataset is given in the table shown in
Figure 6.7. The first three columns give the Y-values, the X-values, and the
labels, respectively. The mapping of the dataset as points in a two-dimensional
space is shown in Figure 6.8 and is based on the Y- and X-values. The indexing
mechanism used to access the data is highly dependent on the physical orga-
nization of the datasets. In high-dimensional, very large, scientific datasets,
the datasets are typically vertically partitioned and stored by columns. Such
physical storage of datasets is amenable to ecient access by bitmaps and
compressed bitmap indexes. 4 , 21 , 38 , 48 , 102 , 103 Another popular physical represen-
tation of datasets, particularly when the datasets are perceived as points in a
k-dimensional data space, is by tessellating the space into rectangular regions
and then representing each region as a chunk of data so that all the points
in a region are clustered in the same chunk . A chunk typically corresponds
to the physical page size of disk storage but may span more than one page.
The different methods of tessellating and indexing the corresponding chunks
Search WWH ::




Custom Search