Counting and Data Sampling in Physical Design Exploration - Physical Database Design

Databases Reference

In-Depth Information

10.1.4 Counting for Shared-nothing Partitioning Design

As a reminder, shared-nothing partitioning is usually based on a hashing scheme that

hashes records to data nodes (see Figure 10.1). Real-world systems exist with as little as

two data nodes and as many as several hundred.

Figure 10.1

Hashing of records into a shared-nothing database.

For the database to perform well, it is critical that data be distributed across the

data nodes fairly evenly. Having a data node with less data than others isn't going to

cause massive problems, but having a partition with a disproportionately large amount

of data will usually mean that partition will become the slowest partition in the process-

ing. In a decision support (business intelligence) workload where multiple data nodes

collaborate to each solve a portion of the query and then merge results, a partition with

a disproportionately large volume of data will significantly slow the processing, becom-

ing a bottleneck. For a partitioned OLTP shared-nothing system, the partition with the

most data will receive a disproportionate amount of requests, becoming the slowest

node in the system and impacting the service level achievement for transactions being

processed by that data node. In short, data skew on a shared-nothing architecture is bad

news for any kind of workload. To avoid this it's important that the columns used to

Search WWH ::

Custom Search

Home