persisted there. Rather, the Control node directs the loading and retrieval of user data to the appropriate Compute node and distribution. This is the first time we've introduced the term distribution, so if you're not yet familiar with it, don't worry. We'll cover distributions in the next section.
Shared-Nothing Architecture
At the core of PDW is the concept of a shared-nothing architecture. In a shared-nothing architecture, a single logical table is broken up into numerous smaller physical pieces. The exact number of pieces depends on the number of Compute nodes in the PDW region. Within a single Compute node, each data piece is then split across eight (8) distributions. The number of distributions per Compute node cannot be configured and is consistent across all hardware vendors.
A distribution is the most granular physical level within PDW. Each distribution contains its own dedicated CPU, memory, and storage (LUNs), which it uses to store and retrieve data. Because each distribution has its own dedicated hardware, it can perform load and retrieval operations in parallel with other distributions. This is what we mean by "shared-nothing." A shared-nothing architecture enables numerous benefits, such as near-linear scalability. But perhaps PDW's greatest power is its ability to scan data at incredible speeds.
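To make the idea concrete, here is a minimal sketch of how rows of a logical table could be mapped onto independent distributions. The hash function, key choice, and node counts below are illustrative assumptions for demonstration only; PDW's internal hashing is not exposed, and this is not its actual algorithm.

```python
# Illustrative sketch: mapping each row of a logical table to exactly one
# distribution in a shared-nothing appliance. All constants and the hash
# function are assumptions for demonstration, not PDW internals.
import hashlib

COMPUTE_NODES = 9           # base rack used in the example that follows
DISTS_PER_NODE = 8          # fixed at eight per Compute node in PDW
TOTAL_DISTRIBUTIONS = COMPUTE_NODES * DISTS_PER_NODE  # 72

def distribution_for(key: str) -> int:
    """Map a distribution-column value to one of the distributions."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % TOTAL_DISTRIBUTIONS

# Because each row lands on exactly one distribution, and each distribution
# owns its own CPU, memory, and disks, all 72 slices can be scanned at once.
print(distribution_for("customer-42"))
```

The key property is that the mapping is deterministic, so any node can compute where a row lives without consulting shared state.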
Let's do some math. Assume you have a PDW appliance with a base rack containing 9 Compute nodes, and you need to store a table with 1 billion rows. The data will be split across all 9 Compute nodes, and each Compute node will split its data across 8 distributions. Thus, the 1-billion-row table will be split into 72 distributions (9 Compute nodes × 8 distributions per Compute node). That means each distribution will store roughly 13,900,000 rows.
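The arithmetic above can be worked out directly:

```python
# Worked version of the arithmetic above: 1 billion rows spread across a
# base rack of 9 Compute nodes with 8 distributions each.
compute_nodes = 9
distributions_per_node = 8          # fixed in PDW
total_rows = 1_000_000_000

total_distributions = compute_nodes * distributions_per_node
rows_per_distribution = total_rows // total_distributions

print(total_distributions)      # 72
print(rows_per_distribution)    # 13888888, i.e. roughly 13.9 million
```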
But what does this mean from the end user's standpoint? Let's look at a hypothetical situation. You are a user at a retail company, and you have a query that joins two tables: a Sales table with 1 billion rows and a Customer table with 50 million rows. And, as luck would have it, no indexes are available that cover your query. This means you will need to scan, or read, every row in each table.
In an SMP system, where memory, storage, and CPU are shared, this query could take hours or days to run. On some systems, it might not even be feasible to attempt this query, depending on factors such as the server hardware and the amount of activity on the server. Suffice it to say, the query will take a considerable amount of time to return and will most likely have a negative impact on other activity on the server.
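A back-of-the-envelope calculation shows why 72 parallel scanners change the picture. The per-scanner rate below is a made-up illustrative figure, and the model assumes a perfectly even split with no coordination overhead:

```python
# Rough model: one scanner reading the whole table versus 72 distributions
# each reading their own slice in parallel. The scan rate is an assumption
# for illustration, not a measured PDW figure.
rows = 1_000_000_000
scan_rate = 2_000_000               # rows/sec per scanner (assumed)
distributions = 72

smp_seconds = rows / scan_rate                      # single scanner
pdw_seconds = rows / (distributions * scan_rate)    # 72 scanners in parallel

print(round(smp_seconds))   # 500
print(round(pdw_seconds))   # 7
```

Real systems fall short of this ideal speedup because of skew and coordination costs, but the scan-time advantage of dividing the work across dedicated hardware is the essential point.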