required for the SQL statement. Fortunately, the majority of OLTP applications do not
need cluster scale-out: a single multiprocessor server (symmetric multiprocessor or
NUMA) can handle the vast majority of OLTP workloads. Larger SMPs come out every
year, with ever more RAM and CPUs (64- and 128-way systems), making it feasible in
most cases to scale an OLTP application without multiserver support.
6.6 Design Challenges: Skew and Join Collocation
At first glance the shared-nothing model looks very impressive. Each node operates on its
own fragment of the data, and the results are merged in a trivial way, promising excellent
performance scalability. However, there are challenges in getting the design right, and in
practice the scalability is rarely perfectly linear with the number of nodes.
Both commercial shared-nothing systems (DB2 and Teradata) partition data across the
nodes using hashing techniques. The database designer selects one or more columns as the
“partitioning key” or “partitioning columns.” With hash partitioning, each record is
assigned to a node by applying a hash function to the values of its partitioning columns.
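As a rough sketch of the idea, the example below hashes a record's partitioning-column values to pick a node. The node count, the use of MD5, and the sample key are assumptions made for illustration; they are not how DB2 or Teradata implement partitioning.

```python
# Minimal sketch of hash partitioning (illustrative only).
import hashlib

NUM_NODES = 4  # assumed cluster size for the example

def node_for(partition_key_values):
    """Map the partitioning-column values of one record to a node number."""
    key = "|".join(str(v) for v in partition_key_values)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

# Example: a sales record partitioned on (customer_id,)
print(node_for((10432,)))  # deterministic node in the range 0..3
```

Because the hash function is deterministic, every record with the same partitioning-key values lands on the same node, which is what later makes collocated joins possible.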
With a shared-nothing architecture, the ultimate goal is to keep all the nodes busy
working in parallel on the larger database problems, and also to minimize the degree to
which data must be shared between the nodes. Two serious problems pose a major
challenge for the database designer: data skew and poor join collocation. Each is
discussed below, along with why it is a problem and what to do about it.
6.6.1 Data Skew
In order for shared-nothing partitioning to be effective, the data needs to be distrib-
uted evenly across the nodes. If one of the nodes has significantly more data than the
others, it will certainly take the longest to complete the processing, and the entire
computation time becomes limited by the slowest node in the system. Selecting the
partitioning columns such that the data is distributed evenly across all nodes is very
difficult. In the case of range partitioning, the process is nearly impossible, since each
data range will have a different number of records associated with it. For example,
consider a sales table that is range partitioned by date. You can expect December to
have considerably more sales data for retail products than most other months. Even
with a very fair hash function, a high density of data at a few data values can seriously
skew the distribution. For example, hash partitioning retail sales data by “week-of-year”
would likely result in a large volume of data for the week before December 25, as people
rush to finish last-minute holiday shopping. The node that this week hashes to would
therefore hold a disproportionate volume of data.
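The effect is easy to see in a small simulation. The sketch below is purely illustrative: it assumes a 4-node cluster, a simple stand-in hash function, and a synthetic sales distribution in which a large share of the year's rows fall in the pre-holiday week. The per-node row counts it prints show one node carrying far more data than the others.

```python
# Illustrative simulation of data skew (assumed node count, hash function, and
# data distribution; not tied to any specific database product).
import random
from collections import Counter

NUM_NODES = 4
random.seed(7)  # deterministic output for the example

def node_for_week(week):
    # Simple multiplicative hash as a stand-in for the partitioning hash function.
    return (week * 2654435761) % NUM_NODES

weeks = []
for _ in range(100_000):
    # Assumption: 40% of the annual sales rows land in week 51 (pre-holiday rush);
    # the rest are spread uniformly over the year.
    week = 51 if random.random() < 0.40 else random.randint(1, 52)
    weeks.append(week)

rows_per_node = Counter(node_for_week(w) for w in weeks)
print(sorted(rows_per_node.items()))
# The node that week 51 hashes to holds roughly 40% of all rows plus its uniform
# share, so it finishes last and gates the whole parallel query.
```

Under these assumptions the overloaded node determines the elapsed time of the query, which is exactly the skew problem described above.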