Database Reference
In-Depth Information
ing so can greatly speed up a data warehouse migration. This same load pattern can
also be used with flat files generated from any system, not just SQL Server.
For your larger tables, you may instead want to consider Integration Services. In-
tegration Services offers greater functionality and arguably more end-to-end conveni-
ence. This is because Integration Services is able to connect directly to the data source
and load the data into the PDW appliance without having to stop at a file share. Anoth-
er important distinction is that Integration Services can also perform transformations in
flight, which DWLoader does not support.
It's worth noting that each data flow within Integration Services is single-threaded
and can bottleneck on I/O. Typically, a single-threaded Integration Services package
will perform up to ten times slower than DWLoader. However, a multithreaded Integra-
tion Services package—similar to the one we will create shortly—can mitigate that
limitation. For large tables requiring data type conversions, an Integration Services
package with ten parallel data flows provides the best of both worlds: similar perform-
ance to DWLoader and all the advanced functionality that Integration Services offers.
You should consider a number of variables when deciding whether to use
DWLoader or Integration Services. In addition to table size, both network speed and
table design can have an impact. At the end of the day, most PDW implementations
will likely use a combination of both tools. The best idea is to test the performance of
each method in your environment and use the tool that makes the most sense for each
table pattern.
ETL vs. ELT
Many Integration Services packages are designed using an extract, transform, and load
(ETL) process. This is a practical model that strives to lessen the impact of moving
data on the source and destination servers, which are traditionally more resource-con-
strained, by placing the burden of data filtering, cleansing, and other such activities on
the (arguably more easy-to-scale) ETL server. Extract, load, and transform (ELT) pro-
cesses, in contrast, place the burden on the destination server.
Although both models have their place and PDW supports both, ELT makes more
sense with PDW from both a technical and a business perspective. On the technical
side, PDW is able to utilize its massively parallel processing (MPP) power to more ef-
ficiently load and transform large volumes of data. From the business aspect, having
more data co-located allows more meaningful data to be gleaned during the transform-
ation process. Organizations with MPP systems often find that the ability to co-locate
and transform large quantities of disparate data enables them to make the leap from re-
Search WWH ::




Custom Search