6.7.1 Data Processing Systems
To access data efficiently, the underlying data must be organized in a suitable
manner, since the speed of query processing depends on the data organization.
In most cases, the data organization of a data processing system is inextricably
linked to the system design. Therefore, we cannot easily separate the issue of
data organization from the systems that support it. Next, we review a few
example systems to see how their data organization affects query processing
speed. Since most of the preceding discussion applies to traditional DBMSs,
we will not discuss them any further.
6.7.1.1 Column-Based Systems
Column-based systems are discussed extensively in Chapter 7. Here, we only
mention some names and briefly argue for their effectiveness.
A number of commercial database systems organize their data in a
column-oriented fashion, for example, Sybase IQ, Vertica, and Kx
Systems.[98] Among them, Kx Systems can be regarded as an array database
because it treats an array as a first-class citizen, just like an integer. A
number of research systems use vertical data organization as well, for
example, C-Store,[90, 91] MonetDB,[16, 17] and FastBit. One common feature of
all these systems is that they logically store the values of a column together.
This offers a number of advantages. First, a typical query involves only a
small number of columns; the column-oriented data organization allows the
system to access only the columns involved, which minimizes I/O time.
Second, since the values in a column are all of the same type, it is easier to
determine the location of each value and to avoid accessing irrelevant rows.
Third, values in the same column are more likely to resemble one another than
values from different columns in a row-oriented data organization, which makes
compression more effective.[1]
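The three advantages above can be illustrated with a small Python sketch. The toy table, its column names (`run`, `detector`, `energy`), and the helper functions are hypothetical, and run-length encoding is used only as one simple stand-in for the compression schemes that real column stores apply; the systems named above are far more elaborate.

```python
import struct
from itertools import groupby

# Hypothetical toy table (not from the text): 6 rows, 3 fixed-width columns.
COLUMNS = ["run", "detector", "energy"]
FMT = {"run": "i", "detector": "i", "energy": "d"}  # int32, int32, float64
ROWS = [
    (1, 10, 0.5), (1, 10, 0.7), (1, 20, 0.1),
    (2, 20, 0.9), (2, 20, 0.3), (2, 30, 0.2),
]


def row_layout(rows):
    """Row-oriented: all attributes of a row are stored next to each other."""
    rec = "<" + "".join(FMT[c] for c in COLUMNS)  # 4 + 4 + 8 = 16 bytes/row
    return b"".join(struct.pack(rec, *r) for r in rows)


def column_layout(rows):
    """Column-oriented: each column is packed into its own contiguous buffer."""
    return {
        c: struct.pack("<%d%s" % (len(rows), FMT[c]), *(r[j] for r in rows))
        for j, c in enumerate(COLUMNS)
    }


def read_value(cols, name, i):
    """Fixed-width values let us compute the byte offset of row i directly."""
    size = struct.calcsize("<" + FMT[name])
    return struct.unpack_from("<" + FMT[name], cols[name], i * size)[0]


def run_lengths(values):
    """Run-length encode a sequence as (value, count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


rows_buf = row_layout(ROWS)
cols = column_layout(ROWS)

# A query touching only "detector" must scan the whole row buffer,
# but only one small buffer in the column layout.
print(len(rows_buf), len(cols["detector"]))  # 96 24

# Jump straight to row 3 of "energy" without touching other rows.
print(read_value(cols, "energy", 3))  # 0.9

# Adjacent values within a column repeat more often than values
# interleaved across columns, so run-length encoding yields fewer runs.
interleaved = [v for r in ROWS for v in r]
per_column = [run_lengths([r[j] for r in ROWS]) for j in range(len(COLUMNS))]
print(len(run_lengths(interleaved)), sum(len(p) for p in per_column))  # 18 11
```

In this toy case the interleaved row-wise sequence has no adjacent repeats at all, while the `run` and `detector` columns collapse into a few runs each; with larger, sorted, or low-cardinality columns the gap is typically much wider.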
6.7.1.2 Special-Purpose Data Analysis Systems
Most scientific data formats, such as FITS, NetCDF, and HDF5, come
with their own data access and analysis libraries and can be considered
special-purpose data analysis systems. By far the most developed of these
systems is ROOT.[18, 19, 70] ROOT is a data management system originally
developed by physicists for high-energy physics data. It currently manages many
petabytes of data around the world, more than is managed by many well-known
commercial DBMS products. ROOT uses an object-oriented metaphor for its data:
a unit of data is called an object or an event (of a high-energy collision),
which corresponds to a row in a relational table. The events are grouped into
files, and the primary way to access the events in a file is to iterate through
them with an iterator. Once an event is available to the user, all of its
attributes are available. This is essentially row-oriented data access. In
recent versions of ROOT, it is possible to split some attributes of an event to store