Database Reference
In-Depth Information
rx
12
E
10
G
B
8
6
F
D
A
C
4
2
lx
2468 0
12
Figure 6.6
Shaded region representing an intersection query.
to a range query of finding all line segments whose
lx
5as
depicted in Figure 6.5. Similarly, a query that asks for all lines that overlap
line
F
≤
4
.
5 and
rx
≥
4
.
=
2,6
also translates to a range query of finding all lines whose
lx
≤
6
and
rx
2. The shaded region of Figure 6.6 is the area where the points
of the response to query lie. In the illustrative example given, all lines are
mapped onto points that lie above the 45-degree line. It is possible to choose
a mapping that distributes the points uniformly in the embedding space. The
basic idea is easily extended to objects of arbitrary shapes and dimensions
such as rectangles, circles, spheres, and so forth. For example, a collection of
rectangles may be represented either explicitly as rectangles with the appro-
priate index structure, such as R-trees
94
layered over it to process queries on
them, or mapped as points in a four-dimensional data space. In the subse-
quent discussion, we focus mostly on datasets that are
k
-dimensional and are
perceived as points in
k
-dimensional space.
A sample of a two-dimensional dataset is given in the table shown in
Figure 6.7. The first three columns give the Y-values, the X-values, and the
labels, respectively. The mapping of the dataset as points in a two-dimensional
space is shown in Figure 6.8 and is based on the Y- and X-values. The indexing
mechanism used to access the data is highly dependent on the physical orga-
nization of the datasets. In high-dimensional, very large, scientific datasets,
the datasets are typically vertically partitioned and stored by columns. Such
physical storage of datasets is amenable to ecient access by bitmaps and
compressed bitmap indexes.
4
,
21
,
38
,
48
,
102
,
103
Another popular physical represen-
tation of datasets, particularly when the datasets are perceived as points in a
k-dimensional data space, is by tessellating the space into rectangular regions
and then representing each region as a
chunk
of data so that all the points
in a region are clustered in the same
chunk
. A chunk typically corresponds
to the physical page size of disk storage but may span more than one page.
The different methods of tessellating and indexing the corresponding chunks
≥