Databases Reference
In-Depth Information
ited data skew is a good design practice. There will be more discussion on best practices
for partitioning selection when we discuss skew and collocation later in the chapter.
6.4 Pros and Cons of Shared Nothing
In his short paper on shared nothing, Michael Stonebraker [1986] compared the bene-
fits and drawbacks of shared-nothing architectures, which he summarized in a table
comparing attributes. The table is reproduced here as Table 6.1. In this table attributes
are ranked 1, 2, or 3, where 1 is superior.
The main observations here are the improved bandwidth and scalability and the
reduced susceptibility to critical sections. In fact, Stonebraker's observations have borne
out in practice with the overwhelming majority of very large complex workload bench-
marks being achieved on shared-nothing architectures over the past five years. The cur-
rent industry standard for complex business intelligence processing is the TPC-H
benchmark. A quick review of the leading benchmark results for large-scale, multiter-
abyte processing shows that databases using shared-nothing architecture are heavily used
in this space. 6
However, conversely, MPP shared-nothing systems have suffered from some of the
other predictions that can largely be summarized as “complexity.” Shared-nothing sys-
tems are harder to design and manage than single-node servers. Some would argue that
the complexity is actually a critical limitation in the adoption potential of this architec-
ture as a mainstream platform. However, the ability to outperform combined with rapid
advances in self-managing systems (storage, servers, and databases) is shrinking these
concerns rapidly and broader adoption is likely.
For all its concerns and performance benefits, what really sets shared-nothing archi-
tectures apart is their impressive linearity and scale-out for complex business-intelligence
workloads. It's because of the impressive ability to scale well that shared-nothing systems
have achieved such impressive performance results. In one experiment reported by God-
dard [2005], users compared the performance of a single-server 24-way system with 0.5
TB to a two-node 1.0 TB system with identical hardware. The only difference between
the two systems was that in one case a single server held 0.5 TB of data on a single server,
and in the second case two identical servers each held 0.5 TB with data hash partitioned
to each server in a shared-nothing architecture. What the experimental data showed was
near-linear scalability for both the database build processing and the query execution per-
formance (Figure 6.4).
At first glance it may not seem obvious why shared nothing should scale quite so
well. While it's true that shared nothing doesn't suffer from the same physical inhibitors
6 Teradata has not participated in TPCH benchmarks in recent years, though their benchmarking
success with a shared-nothing architecture was very good during the years they published results.
Search WWH ::




Custom Search