Third, there is the "N²" effect, which will be discussed in more detail later. This
effect is a natural consequence of how joins are processed on a shared-nothing system,
and it leads mathematically to a quadratic gain in the efficiency of join execution.
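A short arithmetic sketch may make the quadratic gain concrete. Assuming a naive nested-loop join (an illustrative worst case, not any vendor's actual join algorithm): joining R rows against S rows on one node costs R × S comparisons, but when both tables are hash-partitioned on the join key across n nodes, each node joins only its own 1/n slice against the matching 1/n slice.

```python
# Illustrative arithmetic for the "N^2 effect" on a co-located join.
# A naive nested-loop join of R rows against S rows does R * S comparisons;
# with both tables partitioned on the join key across n nodes, each node
# compares only (R/n) * (S/n) pairs -- 1/n^2 of the single-node count.
def comparisons_per_node(r_rows, s_rows, n_nodes):
    return (r_rows / n_nodes) * (s_rows / n_nodes)

r, s = 1_000_000, 1_000_000
single = comparisons_per_node(r, s, 1)      # 1e12 comparisons on one node
per_node = comparisons_per_node(r, s, 10)   # 1e10 per node on ten nodes
print(single / per_node)                    # 100.0 -> the quadratic (N^2) gain
```

Because all ten nodes work in parallel, each node carries one hundredth of the single-node comparison load, even though the cluster is only ten times larger.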
6.2 More Key Concepts and Terms
Shared-nothing architecture: The ability of several processors to parallelize a sin-
gle task, each processing a subset of the data, where the data is partitioned and
assigned to nonoverlapping subsets of processors.
Massively parallel processing (MPP): A set of servers/nodes, communicating
over a high-speed network, on which a shared-nothing solution runs.
Massively parallel processor (MPP): A generic term for a shared-nothing system,
most often used when the number of nodes exceeds eight, though also used
casually as shorthand for any shared-nothing system.
Cluster: A multicomputer configuration in which nodes share a common disk
subsystem. When a node fails, another node with physical access to the failed
node's disk can take over its processing responsibilities.
Scalability: A measure of an architecture's ability to grow while still achieving
positive processing gains:
- Scale-up: grow by adding components to a single node.
- Scale-out: grow by adding more nodes.
Linearity: Linear scaling is the measure of efficiency: are two nodes twice as
effective as one? 1.9 times as effective as one?
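The linearity question above can be reduced to two simple ratios. The function names and timings below are illustrative, not part of any product's tooling: speedup is the single-node elapsed time divided by the N-node elapsed time, and efficiency is that speedup divided by N (1.0 means perfectly linear scaling).

```python
# Linearity check: is an N-node system N times as effective as one node?
# Hypothetical elapsed times (seconds) for the same workload -- illustrative only.
def speedup(t_one_node, t_n_nodes):
    """Speedup relative to a single node."""
    return t_one_node / t_n_nodes

def efficiency(t_one_node, t_n_nodes, n_nodes):
    """Fraction of perfect linear scaling achieved (1.0 = perfectly linear)."""
    return speedup(t_one_node, t_n_nodes) / n_nodes

t1 = 100.0   # one node takes 100 s
t2 = 52.6    # two nodes take 52.6 s
print(speedup(t1, t2))        # ~1.9x: the "1.9 times as effective" case
print(efficiency(t1, t2, 2))  # ~0.95, i.e., 95% of linear
```

In this hypothetical, two nodes deliver a 1.9x speedup rather than 2x, so the system scales at about 95 percent of linear.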
6.3 Hash Partitioning
Both of the major products that support shared-nothing distribute records to the data-
base nodes using a hashing function. The hashing function performs a mathematical
transform on one or more columns in each record, hashing them to a numeric value.
Because a shared-nothing system can have different numbers of nodes, depending on
the particular configuration, the hash value can't be converted directly to a node num-
ber. Instead, a mapping is required, usually based on a table lookup (not a "rela-
tional table" lookup, but more likely a lookup into an array). The lookup table is usually
referred to as a hash map or a partition map. It maps every possible hash value of a record
to a destination node. As a result, each time a new node is added to the MPP the hash
map will need to be recomputed. Vendors generally use a format for the hash map that
4. Chapter 13 has a brief introduction to the topics "cache coherence" and "false sharing."
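The hash-then-lookup scheme just described can be sketched in a few lines. Everything here is an assumption for illustration (the bucket count, the map layout, and Python's built-in `hash` standing in for a vendor's hashing function); it is not any product's actual implementation.

```python
# Sketch of hash partitioning with a partition (hash) map.
# HASH_BUCKETS and the round-robin map layout are illustrative assumptions.
HASH_BUCKETS = 4096                      # fixed range of possible hash values

def build_hash_map(num_nodes):
    """Map every possible hash value to a destination node (round-robin)."""
    return [h % num_nodes for h in range(HASH_BUCKETS)]

def node_for(key, hash_map):
    """Destination node for a record, given its distribution-key value."""
    h = hash(key) % HASH_BUCKETS         # mathematical transform -> hash value
    return hash_map[h]                   # array lookup, not a relational join

hash_map = build_hash_map(num_nodes=8)
print(node_for("customer-42", hash_map))   # some node in 0..7

# Adding a node means recomputing the map (and redistributing the rows
# whose hash values now map to a different node):
hash_map = build_hash_map(num_nodes=9)
```

Note that because the hash value is taken modulo a fixed bucket count and only then mapped through the array, the number of nodes can change without changing the hashing function itself; only the map must be rebuilt.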