memory when the iterator points to the key/value pair. On structured data, for
a typical query that requires only a small number of attributes, a MapReduce
system is likely to deliver poorer performance than a parallel column-oriented
system such as MonetDB, C-Store, or Vertica. MapReduce systems have, however,
proven effective for unstructured data.
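The gap between the two layouts can be illustrated with a back-of-the-envelope estimate of the bytes read. This is only a sketch: the table dimensions are hypothetical, and it assumes fixed-width values, no compression, and that a record-at-a-time reader must fetch every attribute of each record it visits.

```python
# Hypothetical table dimensions, chosen only for illustration.
NUM_RECORDS = 1000
NUM_ATTRIBUTES = 20
BYTES_PER_VALUE = 8  # assumes fixed-width values, no compression


def bytes_read_row_layout(num_needed):
    """Record-at-a-time access: every attribute of every record is read,
    no matter how few attributes the query actually needs."""
    return NUM_RECORDS * NUM_ATTRIBUTES * BYTES_PER_VALUE


def bytes_read_column_layout(num_needed):
    """Column-oriented access: only the columns the query needs are read."""
    return NUM_RECORDS * num_needed * BYTES_PER_VALUE


# A query that touches 2 of the 20 attributes.
row_bytes = bytes_read_row_layout(2)
col_bytes = bytes_read_column_layout(2)
print(row_bytes, col_bytes)
```

Under these assumptions the column layout reads one-tenth of the data. Real systems also differ in seek patterns, block sizes, and compression, which this estimate ignores.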
6.7.1.4 Custom Data Processing Hardware
Over the past few decades, the speed of accessing secondary storage has
remained practically unchanged compared with the increases in the speed of main
memory and CPU. For this reason, the primary bottleneck for efficient data
processing is often the disk. There have been a number of commercial efforts to
build data processing systems that use custom hardware to answer queries more
efficiently. Here we very briefly discuss two such systems: Netezza [26] and
Teradata [6, 29].
Netezza attempts to improve query processing speed by having smart disk
controllers that can filter data records as they are read off the physical media.
In a Netezza server, there is a front-end system that accepts the usual SQL
commands, so the user can continue to use the existing SQL code developed
for other DBMSs. Inside the server, an SQL query is processed on a
number of different snippet processing units (SPUs), where each SPU has its
own disk and processing logic. The results from different SPUs are gathered by
the front-end host and presented to the user. In general, the idea of offloading
some data processing to the disk controllers to make an active storage system
could benefit many different applications [56, 73].
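The division of labor described above can be sketched as follows. This is a toy model, not Netezza's actual interface: the class `SnippetProcessingUnit`, the function `run_query`, and the sample data are all hypothetical names introduced for illustration. Each unit filters its own partition where the data lives, and the host only gathers the survivors.

```python
from concurrent.futures import ThreadPoolExecutor


class SnippetProcessingUnit:
    """Hypothetical stand-in for an SPU: owns one partition of the data
    and applies the filter locally, as records are read."""

    def __init__(self, records):
        self.records = records  # stands in for the SPU's own disk

    def scan(self, predicate, columns):
        # Only matching records, and only the requested columns, leave the unit.
        return [tuple(r[c] for c in columns) for r in self.records if predicate(r)]


def run_query(spus, predicate, columns):
    """Hypothetical front-end host: sends the same snippet to every SPU
    in parallel and concatenates the partial results."""
    with ThreadPoolExecutor(max_workers=len(spus)) as pool:
        partials = list(pool.map(lambda spu: spu.scan(predicate, columns), spus))
    results = []
    for partial in partials:
        results.extend(partial)
    return results


# Made-up data, spread round-robin over three units.
records = [{"id": i, "amount": i * 10} for i in range(12)]
spus = [SnippetProcessingUnit(records[i::3]) for i in range(3)]

# Roughly: SELECT id, amount FROM t WHERE amount >= 80
result = run_query(spus, lambda r: r["amount"] >= 80, ["id", "amount"])
print(sorted(result))
```

The point of the design is that records failing the predicate never travel to the host, so the volume of data moved shrinks with the selectivity of the query.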
The most distinctive feature of Teradata's warehousing system is the BYNET
interconnect, which connects the main data access modules, called AMPs (access
module processors). The design of BYNET allows the bandwidth among the AMPs to
scale linearly with the number of AMPs (up to 4096 processors). It is also
fault tolerant and performs automatic load balancing. Early versions of AMPs
were similar to Netezza's “smart disk controllers”; however, current versions
of AMPs are software entities that use commodity disk systems.
To the user, both Netezza and Teradata behave like a typical DBMS, which is a
convenience. On disk, both systems appear to follow traditional DBMSs in
storing their data in a row-oriented organization. Using a column-oriented
organization could potentially improve their performance. Teradata has hash
and B-Tree indexes, while Netezza does not use any index method.
6.7.2 Indexes Still Useful
Many of the specialized data management systems mentioned above do not
employ any index method. When the analysis task calls for all or a large
fraction, say one-tenth, of the records in a dataset, then having an index may
not accelerate the overall data processing. However, there are plenty of cases