Data Warehouse Patterns - SQL Server Integration Services Design Patterns

Database Reference

In-Depth Information

tasks. It can be populated via the Execute SQL task or a Data Flow task. Most of the

time, the data I need to build a table cache is local to the destination and contains data

from the destination, so I often use T-SQL in an Execute SQL task to populate it.

You can maintaining the table cache by using truncate-and-load. However, for lar-

ger sets of lookup data, you may wish to consider maintaining the table cache using in-

cremental load techniques. This may sound like overkill, but when you need to perform

a lookup against a billion-row table (it happens, trust me), the incremental approach

starts to make sense.

Cache Transformation and Cache Connection Manager

If you find you need to look up the same data in multiple Data Flow tasks, consider us-

ing the Cache transform along with the Cache connection manager. The Cache connec-

tion manager provides a memory-resident copy of the data supplied via the Cache

transform. The cache is loaded prior to the first Data Flow task that will consume the

lookup data, and the data can be consumed directly by a Lookup transform. Precaching

data in this manner supports lookups, but it also provides a way to “mark” sets of rows

for other considerations such as loading. Later in this chapter, we will explore late-ar-

riving data and discuss patterns for managing it. One way to manage the scenario of

data continuing to arrive after the load operation has started is to create a cache of

primary and foreign keys that represent completed transactions, and then join to those

keys in Data Flow tasks throughout the load process. Will you miss last-second data

loading in this way? Yes, you will. But your data will contain complete transactions.

One benefit of executing incremental loads with table caches is the ability to execute

the load each month, week, evening, or every five minutes; only complete transactions

that have arrived since the last load executed will be loaded.

If you find you need to use the same lookup data across many SSIS packages (or

that the cache is larger than the amount of available server RAM), the Cache connec-

tion manager can persist its contents to disk. The Cache connection manager makes use

of the new and improved RAW file format, a proprietary format for storing data direc-

tly from a Data Flow task to disk, completely bypassing connection managers. Reads

and writes are very fast as a result, and the new format persists column names and data

types.

Load Staging

Another pattern worth mentioning here is load staging. Consider the following scen-

ario: a data warehouse destination table is large and grows often. Since the destination

Search WWH ::

Custom Search

Home