Database Reference
In-Depth Information
These chunking methods give rise to typical formats for array files . Examples
of array files include the file formats known as NetCDF (network common data
format) 61 and HDF4/HDF5 (hierarchical data format). 42 We briefly explore
some of the widely used indexing methods that are based on tessellation of
the data space.
6.5.1 Multidimensional Tree-Structured Indexing Methods
From the general approach of designing multidimensional index schemes, we
examine now some special cases that lead to the class of tree-structured mul-
tidimensional indexing.
6.5.1.1 K-D-Tree and LSD-Tree
The K-D-Tree 11 , 25 , 33 was designed as a memory-resident data structure for
ecient processing of range, partial-match, and partial-range queries. Given
the k-dimensional data space of a collection of points, the K-D-Tree is derived
by tessellating the space with (k-1)-dimensional hyperplanes that are parallel
to all but the axis being split. The split plane occurs at the position that splits
the number of points as evenly as possible within the subspace being split.
The choice of which axis to split is done cyclically. The subspace tessellation
proceeds recursively until the number of points in a region forms the desired
chunk size of the data space. Figure 6.9a illustrates the idea for a 2D-space
tessellation that is represented explicitly as a tree structure in Figure 6.9b,
for a chunk size of 2. Such a K-D-Tree is referred to as a region K-D-Tree as
opposed to a point K-D-Tree. Rather than maintaining the leaf nodes, that
is, the data chunks, in memory, these can be maintained on disks while the
internal nodes of comparator values remain in memory. This variant of the
K-D-Tree is referred to as the LSD-Tree (or local split-decision tree) and is
more suitable for large disk-resident datasets. The idea is that when insertion
causes a data chunk to exceed its capacity, a decision can be made to split
the data chunk and to insert the split value as an internal node with pointers
to the newly created leaves of data chunks. The memory-resident tree nodes
can always be maintained persistently by ooading onto disk after a session
and reloading before a session.
The complexity of building a K-D-Tree of N points takes O
))
for a chunk size of B . Insertion and deletion of a new point into a K-
D-Tree takes O
(
N
/
B log 2 (
N
/
B
time. One key feature of the K-D-Tree is that
partial-match and partial-range queries involving s of k dimensions take
O
(
log
(
N
/
B
))
1
s
/
k
time to answer, where r is the number of the reported
points, and k the dimension of the K-D-Tree.
((
N
/
B
)
+
r
)
6.5.1.2 R-Tree and Its Variants
The R-Tree, first proposed by Guttman, 40 is an adaptation of the B + -Tree
to eciently index objects contained in k -dimensional bounding boxes. The
Search WWH ::




Custom Search