Database Reference
In-Depth Information
data is first loaded into a temporary table in the staging database before it is inserted in-
to the permanent table in the destination database.
Tip
Using a staging database? Make sure your staging database has enough space
available to accommodate all tables being loaded concurrently. If you do not allocate
enough space initially, don't worry; you'll still be okay—the staging database will
autogrow. Your loads may just slow down while the autogrow is occurring. Also, your
staging database will likely need to be larger when you perform the initial table loads
during system deployment and migration. However, once your system becomes more
mature and the initial ramp-up is complete, you can recover some space by dropping
and re-creating a smaller staging database.
From within SSDT, execute the code in
Listing 8-5
on your PDW appliance to cre-
ate a staging database.
Listing 8-5
.
PDW Code to Run from SSDT to Create a Staging Database
CREATE DATABASE StageDB_Example
WITH
(
AUTOGROW
= ON
, REPLICATED_SIZE
= 1 GB
, DISTRIBUTED_SIZE
= 5 GB
, LOG_SIZE
= 1 GB
);
PDW introduces the concept of replicated and distributed tables. In a
distributed
table
, the data is split across all nodes using a distribution hash specified during table
creation. In a
replicated table
, the full table data exist on every Compute node. When
used correctly, a replicated table can often improve join performance. As a hypothetical
example, consider a small
DimCountry
dimension table with 200 rows.
DimCoun-
try
would likely be replicated, whereas a much larger
FactSales
table would be
distributed. This design allows any joins between
FactSales
and
DimCountry
to
take place locally on each node. Although you would essentially be creating ten copies
of
DimCountry
, one on each Compute node, the performance benefit of a local join
outweighs the minimal cost of storing duplicate copies of such a small table.
Search WWH ::
Custom Search