Database Reference
In-Depth Information
In the preceding code listing you can see that a lot is familiar to you.
However, you can see I have highlighted the key difference: DISTRIBUTION
= HASH([SalesOrderNumber]) . This means that every
SalesOrderNumber will be hashed and allocated to one of our buckets. It
is therefore essential that we pick a value with a good number of distinct
values (my rule of thumb is 1000 ) to make sure the data is spread evenly
across the appliance and avoids something we call data skew .
Thereisactuallyoneconsiderationthatwouldtakeprecedenceoverthedata
skew rule. When large fact tables are joined together (i.e., with a fact-to-fact
join), we can often see significant volumes of data being moved across the
appliance. This occurs when the two facts are not distributed on the same
column. In these situations it is vital that we try to prevent the DMS from
moving data to resolve the join. We can achieve this by distributing both
fact tables on the same column, ensuring that this column is also used in
the join—even if the column is not absolutely required to satisfy the join
criteria. The presence of a shared distribution key in the join will prevent
a DMS movement of data, which will have a positively dramatic impact on
performance. When faced with the prospect of large fact-to-fact joins, data
skew is a secondary consideration.
You might be wondering why this table doesn't specify which filegroup
it is on. It's a great question. Well, the answer is simply that this table
definition doesn't really exist on the compute nodes. PDW does create this
table on the control node inside the shell database, though, as you might
have suspected. However, after that, PDW has to manage creating one copy
of this table for every distribution. Because we have eight distributions per
compute node, we need eight different names for this table. The naming
convention is similar to the one used for databases
TABLE_32AlphanumerChars_[A-H] . You can see these table mappings
via the sys.pdw_table_mappings view:
select t.name
, ptm.physical_name
from sys.tables t
join sys.pdw_table_mappings ptm on t.object_id =
ptm.object_id
where t.name = 'FactInternetSales'
Figure 10.5 shows the result.
 
Search WWH ::




Custom Search