Databases Reference
In-Depth Information
10.1.4 Counting for Shared-nothing Partitioning Design
As a reminder, shared-nothing partitioning is usually based on a hashing scheme that
hashes records to data nodes (see Figure 10.1). Real-world systems exist with as little as
two data nodes and as many as several hundred.
Figure 10.1
Hashing of records into a shared-nothing database.
For the database to perform well, it is critical that data be distributed across the
data nodes fairly evenly. Having a data node with less data than others isn't going to
cause massive problems, but having a partition with a disproportionately large amount
of data will usually mean that partition will become the slowest partition in the process-
ing. In a decision support (business intelligence) workload where multiple data nodes
collaborate to each solve a portion of the query and then merge results, a partition with
a disproportionately large volume of data will significantly slow the processing, becom-
ing a bottleneck. For a partitioned OLTP shared-nothing system, the partition with the
most data will receive a disproportionate amount of requests, becoming the slowest
node in the system and impacting the service level achievement for transactions being
processed by that data node. In short, data skew on a shared-nothing architecture is bad
news for any kind of workload. To avoid this it's important that the columns used to
Search WWH ::




Custom Search