required for the SQL statement. Fortunately, the majority of OLTP applications do not
need cluster scale-out: a single multiprocessor server (symmetric multiprocessor or
NUMA) can handle the vast majority of OLTP workloads. Larger SMPs come out every
year, with ever more RAM and CPUs (64- and 128-way systems), making it feasible in
most cases to scale an OLTP application without multiserver support.
6.6 Design Challenges: Skew and Join Collocation
At first glance the shared-nothing model looks very impressive. Each node operates on its
own fragment of the data, and the results are merged in a trivial way, promising excellent
performance scalability. However, there are challenges in getting the design right, and in
practice the scalability is rarely perfectly linear with the number of nodes.
Both commercial shared-nothing systems (DB2 and Teradata) partition data across the
nodes using hashing techniques. The database designer selects one or more columns as the
“partitioning key” or “partitioning columns.” With hash partitioning, each record is
assigned to a node by applying a hash function to the values of its partitioning columns.
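As a rough sketch of the idea, the example below hashes a record's partitioning-column values to pick a node. The node count, the use of MD5, and the sample key are assumptions made for illustration; they are not how DB2 or Teradata implement partitioning.

```python
# Minimal sketch of hash partitioning (illustrative only).
import hashlib

NUM_NODES = 4  # assumed cluster size for the example

def node_for(partition_key_values):
    """Map the partitioning-column values of one record to a node number."""
    key = "|".join(str(v) for v in partition_key_values)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

# Example: a sales record partitioned on (customer_id,)
print(node_for((10432,)))  # deterministic node in the range 0..3
```

Because the hash function is deterministic, every record with the same partitioning-key values lands on the same node, which is what later makes collocated joins possible.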
With a shared-nothing architecture, the ultimate goal is to keep all the nodes busy
working in parallel on the larger database problems, and also to minimize the degree to
which data must be shared between the nodes. Two serious problems pose a major
challenge for the database designer: data skew and poor join collocation. Each is
discussed below, along with why it is a problem and what to do about it.
6.6.1 Data Skew
In order for shared-nothing partitioning to be effective, the data needs to be distrib-
uted evenly across the nodes. If one of the nodes has significantly more data than the
others, it will certainly take the longest to complete the processing, and the entire
computation time becomes limited by the slowest node in the system. Selecting the
partitioning columns such that the data is distributed evenly across all nodes is very
difficult. In the case of range partitioning, the process is nearly impossible, since each
data range will have a different number of records associated with it. For example,
consider a sales table that is range partitioned by date. You can expect December to
have considerably more sales data for retail products than most other months. Even
with a very fair hash function, a high density of data at a few data values can seriously
skew the distribution. For example, hash partitioning retail sales data by “week-of-year”
would likely result in a large volume of data for the week before December 25, as people
rush to finish last-minute holiday shopping. The node that this week hashes to would
therefore hold a disproportionate volume of data.
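The effect is easy to see in a small simulation. The sketch below is purely illustrative: it assumes a 4-node cluster, a simple stand-in hash function, and a synthetic sales distribution in which a large share of the year's rows fall in the pre-holiday week. The per-node row counts it prints show one node carrying far more data than the others.

```python
# Illustrative simulation of data skew (assumed node count, hash function, and
# data distribution; not tied to any specific database product).
import random
from collections import Counter

NUM_NODES = 4
random.seed(7)  # deterministic output for the example

def node_for_week(week):
    # Simple multiplicative hash as a stand-in for the partitioning hash function.
    return (week * 2654435761) % NUM_NODES

weeks = []
for _ in range(100_000):
    # Assumption: 40% of the annual sales rows land in week 51 (pre-holiday rush);
    # the rest are spread uniformly over the year.
    week = 51 if random.random() < 0.40 else random.randint(1, 52)
    weeks.append(week)

rows_per_node = Counter(node_for_week(w) for w in weeks)
print(sorted(rows_per_node.items()))
# The node that week 51 hashes to holds roughly 40% of all rows plus its uniform
# share, so it finishes last and gates the whole parallel query.
```

Under these assumptions the overloaded node determines the elapsed time of the query, which is exactly the skew problem described above.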