Database Reference
In-Depth Information
Loading Data
We discussed in the previous section that PDW is able to query data very efficiently be-
cause of its shared-nothing architecture. For the same reason, PDW is also able to load
data very efficiently. Let's briefly discuss how PDW can perform data imports so effi-
ciently.
As previously mentioned, the Control VM is the first stop for anything being writ-
ten or retrieved from the appliance. The Control node determines which Compute
nodes will be involved in the storage operation. Each Compute node then uses a hash-
ing algorithm to determine where to store the data, down to the individual distribution
and associated LUNs. This allows each distribution to load its data in parallel with oth-
er distributions. Again, dividing and conquering a large table import in parallel will be
much faster than performing a single large import or performing several smaller im-
ports serially.
Data can be imported from numerous platforms, including from Oracle, SQL Serv-
er, MySQL, and flat files. There are two primary methods of loading data into the
PDW appliance: DWLoader and Integration Services. We will briefly discuss when to
use DWLoader vs. Integration Services. After that, we will walk through an example of
loading data from SQL Server using Integration Services.
DWLoader vs. Integration Services
DWLoader is a command-line utility that ships with PDW. Those familiar with SQL
Server BCP (bulk copy program) will have an easy time learning DWLoader, because
both utilities share a very similar syntax. One very common pattern for loading data in-
to PDW from SQL Server is to
1. Export data from SQL Server to a flat file using BCP.
2. Store the flat file on the loading server.
3. Import the data file from the loading server to PDW using DWLoad-
er.
This is a very efficient method for importing data, and it is very easy to generate
scripts for table DDL, BCP commands, and DWLoader commands. For this reason,
you may want to consider DWLoader for performing initial and incremental loading of
the large quantity of small dimensional tables that often exist in data warehouses. Do-
Search WWH ::




Custom Search