Shared-nothing Partitioning - Physical Database Design

Databases Reference

In-Depth Information

ited data skew is a good design practice. There will be more discussion on best practices

for partitioning selection when we discuss skew and collocation later in the chapter.

6.4 Pros and Cons of Shared Nothing

In his short paper on shared nothing, Michael Stonebraker [1986] compared the bene-

fits and drawbacks of shared-nothing architectures, which he summarized in a table

comparing attributes. The table is reproduced here as Table 6.1. In this table attributes

are ranked 1, 2, or 3, where 1 is superior.

The main observations here are the improved bandwidth and scalability and the

reduced susceptibility to critical sections. In fact, Stonebraker's observations have borne

out in practice with the overwhelming majority of very large complex workload bench-

marks being achieved on shared-nothing architectures over the past five years. The cur-

rent industry standard for complex business intelligence processing is the TPC-H

benchmark. A quick review of the leading benchmark results for large-scale, multiter-

abyte processing shows that databases using shared-nothing architecture are heavily used

in this space. 6

However, conversely, MPP shared-nothing systems have suffered from some of the

other predictions that can largely be summarized as “complexity.” Shared-nothing sys-

tems are harder to design and manage than single-node servers. Some would argue that

the complexity is actually a critical limitation in the adoption potential of this architec-

ture as a mainstream platform. However, the ability to outperform combined with rapid

advances in self-managing systems (storage, servers, and databases) is shrinking these

concerns rapidly and broader adoption is likely.

For all its concerns and performance benefits, what really sets shared-nothing archi-

tectures apart is their impressive linearity and scale-out for complex business-intelligence

workloads. It's because of the impressive ability to scale well that shared-nothing systems

have achieved such impressive performance results. In one experiment reported by God-

dard [2005], users compared the performance of a single-server 24-way system with 0.5

TB to a two-node 1.0 TB system with identical hardware. The only difference between

the two systems was that in one case a single server held 0.5 TB of data on a single server,

and in the second case two identical servers each held 0.5 TB with data hash partitioned

to each server in a shared-nothing architecture. What the experimental data showed was

near-linear scalability for both the database build processing and the query execution per-

formance (Figure 6.4).

At first glance it may not seem obvious why shared nothing should scale quite so

well. While it's true that shared nothing doesn't suffer from the same physical inhibitors

6 Teradata has not participated in TPCH benchmarks in recent years, though their benchmarking

success with a shared-nothing architecture was very good during the years they published results.

Search WWH ::

Custom Search

Home