ware interconnect to implement its shared-nothing architecture. Netezza is another
emerging company that exploits a shared-nothing architecture to produce a powerful
business intelligence offering that includes specialized hardware. Product discussion in
this chapter will focus on DB2 and Teradata, which are the dominant players today.
Chapter 14 includes additional discussion of Netezza.
Shared-nothing databases continue to dominate the industry for large, complex data
sets, particularly where complex analysis of the data is required. Accordingly, work on
database management system (DBMS) parallelization has concentrated on specific application
domains such as data mining, decision support systems, geographical information systems (GIS),
and life sciences genomic/proteomic systems.
Using a shared-nothing architecture, several servers collaborate to solve a single
problem. Each server owns a fragment of the data and has no access to the on-disk data
of any other server (even database catalogs must be transferred over the communication
interconnect). As a result, each server manages and operates on a distinct subset of
the database, using its own system resources to analyze that fragment. After
each server has processed its fragment, the results from all of the servers are shipped
across a high-speed network or switch and merged by a designated server on the system.
Each server in this model is called a “node” or a “partition.”² Throughout this chapter,
to avoid confusion and bias, we will use the term “node” to represent a single server
within an MPP. The model is called “shared nothing” because the data is private to each
node, as are the caches, control blocks, and locks. The node performing the final coalescing
of results from the other nodes is sometimes called the “coordinator” node.³ The
coordinator may itself have a fragment of data that it operates on, or it may simply be
dedicated to the task of merging results and reporting them back to the client.
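The division of labor described above (each node owns a fragment, works on it locally, and a coordinator merges the partial results) can be sketched in a few lines. This is a toy model under stated assumptions, not how DB2 or Teradata are implemented: each "node" is just a private Python list, rows are assigned to nodes by hashing a partitioning key with `zlib.crc32` (an arbitrary choice of hash), and the query is a SUM ... GROUP BY. The function names `hash_partition`, `local_aggregate`, and `coordinator_merge` are hypothetical.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_nodes):
    """Assign every row to exactly one node by hashing its partitioning key.

    Each returned list is one node's private fragment; no node reads another's.
    """
    fragments = [[] for _ in range(num_nodes)]
    for row in rows:
        node_id = zlib.crc32(str(row[key]).encode("utf-8")) % num_nodes
        fragments[node_id].append(row)
    return fragments


def local_aggregate(fragment, group_key, value_key):
    """Work done independently on one node: SUM(value) GROUP BY group over its fragment."""
    partial = defaultdict(int)
    for row in fragment:
        partial[row[group_key]] += row[value_key]
    return dict(partial)


def coordinator_merge(partials):
    """Coordinator step: combine per-node partial sums into the final result."""
    result = defaultdict(int)
    for partial in partials:
        for group, subtotal in partial.items():
            result[group] += subtotal
    return dict(result)


# Usage: rows are distributed by order_id but grouped by region, so one
# region's rows can land on several nodes and must be merged at the end.
rows = [
    {"order_id": 1, "region": "east", "amount": 100},
    {"order_id": 2, "region": "west", "amount": 250},
    {"order_id": 3, "region": "east", "amount": 75},
    {"order_id": 4, "region": "north", "amount": 40},
    {"order_id": 5, "region": "west", "amount": 10},
]
fragments = hash_partition(rows, "order_id", num_nodes=3)
partials = [local_aggregate(f, "region", "amount") for f in fragments]
totals = coordinator_merge(partials)
print(sorted(totals.items()))  # [('east', 175), ('north', 40), ('west', 260)]
```

Partitioning on one column and grouping on another is deliberate here: it shows why the coordinator's merge step is needed even though every node computed a correct answer for its own fragment.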
Each node operates as though its own fragment were the entire database; only the
coordinator node has special processing that understands the deeper, distributed view. The
coordinator node, while aware of the other nodes, has no view of their activity or data
except through network communications.
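The isolation described above can be made concrete with a minimal sketch, assuming that Python objects stand in for nodes and that JSON strings stand in for messages on the interconnect. `Node`, `Coordinator`, and `handle_request` are hypothetical names, not DB2 or Teradata APIs. The coordinator holds references to the nodes but never touches their rows; it only sends a request and reads the serialized replies. For an average, each node ships its partial state (sum and count) rather than a local average, because averaging per-node averages gives the wrong answer when fragments differ in size.

```python
import json


class Node:
    """Hypothetical node: its fragment is private and is never exposed directly."""

    def __init__(self, fragment):
        self._fragment = list(fragment)  # private to this node

    def handle_request(self, request: str) -> str:
        """Answer a serialized request with a serialized partial result."""
        column = json.loads(request)["column"]
        values = [row[column] for row in self._fragment]
        # Ship (sum, count) rather than a local average so the coordinator
        # can compute an exact global average.
        return json.dumps({"sum": sum(values), "count": len(values)})


class Coordinator:
    """Knows which nodes exist, but sees their data only through messages."""

    def __init__(self, nodes):
        self._nodes = nodes

    def avg(self, column: str) -> float:
        request = json.dumps({"column": column})
        replies = [json.loads(node.handle_request(request)) for node in self._nodes]
        total = sum(r["sum"] for r in replies)
        count = sum(r["count"] for r in replies)
        return total / count if count else float("nan")


# Usage: two nodes with fragments of different sizes.
nodes = [
    Node([{"amount": 100}, {"amount": 75}]),
    Node([{"amount": 250}, {"amount": 10}, {"amount": 40}]),
]
print(Coordinator(nodes).avg("amount"))  # 95.0
```

Averaging the two local averages (87.5 and 100.0) would instead yield 93.75, which is why the sketch ships partial state rather than finished per-node answers.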
To see how this improves the performance and scalability of database processing,
consider the following simple aggregation query:
1 Informix was bought by IBM in 2001. Informix products are still actively sold and supported
by IBM.
2 IBM's DB2 uses the term “partition,” and NCR Teradata uses the term “access module processor
(AMP).”
3 DB2 uses the term “coordinator partition,” which is defined as the node that the
application connects to in order to perform a query/transaction. Any partition on the MPP can be
a coordinator at any moment in time if an application is connected to it. Teradata uses the term
“parsing engine (PE).” Parsing engines are dedicated nodes on the MPP. These can be logical nodes, as
described later.