column-oriented organization is also known as the vertical data organization.
There are many variations based on these two basic organizations. For exam-
ple, a large table is often horizontally split into partitions, where each partition
is then further organized horizontally or vertically. Since the organization of
a partition typically has more impact on query processing, our discussion will
center around how the partitions are organized. The data organization of a
system is typically fixed; therefore, to discuss data organization we cannot
avoid touching on different systems even though they have been discussed
elsewhere already. Most notably, Chapter 7 has extensive information about
systems with vertical data organizations.
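As an illustration of the partitioned organizations described above, the following Python sketch splits a small table horizontally into partitions and then stores each partition either row by row or column by column. It is only a minimal sketch: the sample records, the column names, and the helper functions `split_horizontally` and `to_vertical` are hypothetical and are not taken from any system discussed in this chapter.

```python
# Hypothetical sample table: each record is (id, temperature, pressure).
COLUMNS = ("id", "temperature", "pressure")
records = [
    (1, 300.5, 1.01),
    (2, 310.2, 0.98),
    (3, 295.0, 1.05),
    (4, 305.7, 1.00),
]


def split_horizontally(rows, partition_size):
    """Split a table into partitions of consecutive rows."""
    return [rows[i:i + partition_size]
            for i in range(0, len(rows), partition_size)]


def to_vertical(partition):
    """Reorganize one partition so that each column is stored contiguously."""
    return {name: [row[i] for row in partition]
            for i, name in enumerate(COLUMNS)}


partitions = split_horizontally(records, partition_size=2)

# Horizontal organization inside each partition: a list of whole records.
row_partitions = partitions

# Vertical organization inside each partition: one list per column.
column_partitions = [to_vertical(p) for p in partitions]

# A query on a single attribute must walk whole records in the row layout...
max_temp_rows = max(row[1] for p in row_partitions for row in p)

# ...but touches only the one relevant column in the vertical layout.
max_temp_cols = max(max(p["temperature"]) for p in column_partitions)
```

Both layouts return the same answer; the difference lies in how much data a query must read, which is why the organization inside a partition matters for query processing.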
This chapter focuses on access methods, and primarily on indexing techniques
that speed up data accesses in query processing. Because these methods can be
implemented in software and have great potential to improve query performance,
they have been the subject of extensive research.
To motivate our discussion, we review key characteristics of scientific
data and queries in the next section. In Section 6.3, we present a taxonomy of
index methods. In the following two sections, we review some well-known index
methods, with Section 6.4 on single-column indexing and Section 6.5 on
multidimensional indexing. Because scientific data are often high-dimensional,
we devote Section 6.6 to the bitmap index, a type of index that has been
demonstrated to work well with such data, and to recent advances in it.
In Section 6.7
we revisit the data organization issue by examining a number of emerging
data processing systems with unusual data organizations. None of these systems
yet uses any indexing method. We present a small test to demonstrate
that even such systems could benefit from an efficient indexing method.
6.2 Characteristics of Scientific Data
Scientific databases are massive datasets accumulated through scientific
experiments, observations, and computations. New and improved instruments
not only provide better data precision but also capture data at a
much faster rate, resulting in large volumes of data. Ever-increasing
computing power is leading to ever-larger and more realistic computational
simulations,
which also produce large volumes of data. Analysis of these massive datasets
by domain scientists often involves finding some specific data items that have
some characteristics of particular interest. Unlike traditional information
management systems (IMS), such as the management of bank records in the 1970s
and 1980s, where the database consisted of a few megabytes of records with
a small number of attributes, scientific databases typically consist of
terabytes of data (or billions of records) with hundreds of attributes.
Scientific databases are generally organized as datasets. Often these datasets are