Database Reference
In-Depth Information
orderID
, orderDate
, customerID
, webID
FROM PDW_Example.dbo.FactSales
WHERE orderDate >= '2013-01-01'
AND orderDate < '2014-01-01';
Your OLE DB Source Editor should look similar to Figure 8-2 . Click Preview to
verify the results, then click OK to close the editor.
This code is simple, but it's doing something pretty important. By searching the
FactSales table on orderDate —the column specified as the partitioning key in
Listing 8-2 —SQL Server is able to perform partition elimination, which is important
for minimizing I/O contention. This provides a natural boundary for each data flow that
is both easy to understand and performs well. You can achieve a similar result even
without partitioning FactSales by performing a sequential seek on the clustered in-
dex, orderDate . But what if FactSales was clustered on just orderID instead?
You can apply the same principles and achieve good performance by searching for an
evenly distributed number of sequential rows in each data flow. For example, if
FactSales has 1,000,000 rows and we are using 10 data flows, each OLE DB
Source should search for 100,000 rows (i.e., orderID >= 1 and orderID <
100000; orderID >= 100000 and orderID < 200000; and so on).
These types of design considerations can have a significant impact on the overall per-
formance of your Integration Services package.
Tip Not familiar with partitioning? Table partitioning is particularly well suited for
large data warehouse environments and offers more than just the benefits briefly men-
tioned here. More information is available in the whitepaper, “Partitioned Table and In-
dex Strategies Using SQL Server 2008,” at http://msdn.microsoft.com/en-
us/library/dd578580.aspx .
The Data Transformation
As you may recall, the source table has a UNIQUEIDENTIFIER column that is stored
as a CHAR(38) column in PDW. In order to load this data, we will need to transform
the UNIQUEIDENTIFER to a Unicode string. To do this, drag the Data Conversion
Search WWH ::




Custom Search