Third, there is the "N²" effect, which will be discussed in more detail later. This
effect is a natural consequence of how joins are processed on a shared-nothing system,
and it leads mathematically to a quadratic gain in the efficiency of join execution.
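A short arithmetic sketch may make the quadratic gain concrete. Assuming a naive nested-loop join (an illustrative worst case, not any vendor's actual join algorithm): joining R rows against S rows on one node costs R × S comparisons, but when both tables are hash-partitioned on the join key across n nodes, each node joins only its own 1/n slice against the matching 1/n slice.

```python
# Illustrative arithmetic for the "N^2 effect" on a co-located join.
# A naive nested-loop join of R rows against S rows does R * S comparisons;
# with both tables partitioned on the join key across n nodes, each node
# compares only (R/n) * (S/n) pairs -- 1/n^2 of the single-node count.
def comparisons_per_node(r_rows, s_rows, n_nodes):
    return (r_rows / n_nodes) * (s_rows / n_nodes)

r, s = 1_000_000, 1_000_000
single = comparisons_per_node(r, s, 1)      # 1e12 comparisons on one node
per_node = comparisons_per_node(r, s, 10)   # 1e10 per node on ten nodes
print(single / per_node)                    # 100.0 -> the quadratic (N^2) gain
```

Because all ten nodes work in parallel, each node carries one hundredth of the single-node comparison load, even though the cluster is only ten times larger.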
6.2 More Key Concepts and Terms
Shared-nothing architecture: The ability of several processors to parallelize a sin-
gle task, each processing a subset of the data, where the data is partitioned and
assigned to nonoverlapping subsets of processors.
Massively parallel processing (MPP): A set of servers/nodes, communicating
over a high-speed network, on which a shared-nothing solution runs.
Massively parallel processor (MPP): A generic term for a shared-nothing system,
most often used when the number of nodes exceeds eight, though also used
casually as shorthand for any shared-nothing system.
Cluster: A multicomputer configuration in which nodes share a common disk
subsystem. When a node fails, another node with physical access to the failed
node's disk can take over its processing responsibilities.
Scalability: A measure of an architecture's ability to grow while still achieving
positive processing gains:
- Scale-up: grow by adding components to a single node.
- Scale-out: grow by adding more nodes.
Linearity: Linear scaling is the measure of efficiency: are two nodes twice as
effective as one? 1.9 times as effective as one?
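The linearity question above can be reduced to two simple ratios. The function names and timings below are illustrative, not part of any product's tooling: speedup is the single-node elapsed time divided by the N-node elapsed time, and efficiency is that speedup divided by N (1.0 means perfectly linear scaling).

```python
# Linearity check: is an N-node system N times as effective as one node?
# Hypothetical elapsed times (seconds) for the same workload -- illustrative only.
def speedup(t_one_node, t_n_nodes):
    """Speedup relative to a single node."""
    return t_one_node / t_n_nodes

def efficiency(t_one_node, t_n_nodes, n_nodes):
    """Fraction of perfect linear scaling achieved (1.0 = perfectly linear)."""
    return speedup(t_one_node, t_n_nodes) / n_nodes

t1 = 100.0   # one node takes 100 s
t2 = 52.6    # two nodes take 52.6 s
print(speedup(t1, t2))        # ~1.9x: the "1.9 times as effective" case
print(efficiency(t1, t2, 2))  # ~0.95, i.e., 95% of linear
```

In this hypothetical, two nodes deliver a 1.9x speedup rather than 2x, so the system scales at about 95 percent of linear.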
6.3 Hash Partitioning
Both of the major products that support shared-nothing distribute records to the data-
base nodes using a hashing function. The hashing function performs a mathematical
transform on one or more columns in each record, hashing them to a numeric value.
Because a shared-nothing system can have different numbers of nodes, depending on
the particular configuration, the hash value can't be converted directly to a node num-
ber. Instead, a mapping is required, usually based on a table lookup (not a "rela-
tional table" lookup, but more likely a lookup into an array). The lookup table is usually
referred to as a hash map or a partition map. It maps every possible hash value of a record
to a destination node. As a result, each time a new node is added to the MPP the hash
map will need to be recomputed. Vendors generally use a format for the hash map that
4. Chapter 13 has a brief introduction to the topics "cache coherence" and "false sharing."
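The hash-then-lookup scheme just described can be sketched in a few lines. Everything here is an assumption for illustration (the bucket count, the map layout, and Python's built-in `hash` standing in for a vendor's hashing function); it is not any product's actual implementation.

```python
# Sketch of hash partitioning with a partition (hash) map.
# HASH_BUCKETS and the round-robin map layout are illustrative assumptions.
HASH_BUCKETS = 4096                      # fixed range of possible hash values

def build_hash_map(num_nodes):
    """Map every possible hash value to a destination node (round-robin)."""
    return [h % num_nodes for h in range(HASH_BUCKETS)]

def node_for(key, hash_map):
    """Destination node for a record, given its distribution-key value."""
    h = hash(key) % HASH_BUCKETS         # mathematical transform -> hash value
    return hash_map[h]                   # array lookup, not a relational join

hash_map = build_hash_map(num_nodes=8)
print(node_for("customer-42", hash_map))   # some node in 0..7

# Adding a node means recomputing the map (and redistributing the rows
# whose hash values now map to a different node):
hash_map = build_hash_map(num_nodes=9)
```

Note that because the hash value is taken modulo a fixed bucket count and only then mapped through the array, the number of nodes can change without changing the hashing function itself; only the map must be rebuilt.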