Database Reference
In-Depth Information
ON t.object_id =
tp.object_id
The abridged results for this query are shown in Figure 10.6 . Notice that
the TableDistributionPolicyDesc column states HASH as it is a
distributed table and that the SalesOrderNumber row has a 1 for the
IsDistributionColumn .
Figure 10.6 Extended table and column properties for distributed tables
This query works just as well for showing off additional data about
replicated tables too. Let's look at them now.
Replicated Tables
Replicated tables tend to be smaller tables, typically dimensions, which
represent whole copies of data. A replicated table has its entire data set
copied to all the compute nodes. Why do this? Well, it really helps when
reading the data. At the end of the day, each compute node is a highly tuned
SQL Server SMP instance with one buffer pool of memory. To facilitate a
join, PDW needs to ensure that all the data is available. By replicating a
table,weguaranteethatthistableatleasthasallitsdataavailableforalocal,
co-located join.
However, if the join was between two distributed tables, we may have an
issue. It's still possible, of course, that both tables may be distributed in
the same way; this is often a primary design goal. However, it's not always
possible. Under some circumstances, one of the tables may have to be
redistributed to make the join compatible. The worst case is realized when
either table is distributed on a joining key. In this case both tables need to
be redistributed. This is called a double shuffle.
It is important to note that data movement can even happen with a
replicated table. When the join is an outer join, this triggers movement
because we have to handle the nulls generated by the join.
 
 
Search WWH ::




Custom Search