Database Reference
In-Depth Information
tasks. It can be populated via the Execute SQL task or a Data Flow task. Most of the
time, the data I need to build a table cache is local to the destination and contains data
from the destination, so I often use T-SQL in an Execute SQL task to populate it.
You can maintaining the table cache by using truncate-and-load. However, for lar-
ger sets of lookup data, you may wish to consider maintaining the table cache using in-
cremental load techniques. This may sound like overkill, but when you need to perform
a lookup against a billion-row table (it happens, trust me), the incremental approach
starts to make sense.
Cache Transformation and Cache Connection Manager
If you find you need to look up the same data in multiple Data Flow tasks, consider us-
ing the Cache transform along with the Cache connection manager. The Cache connec-
tion manager provides a memory-resident copy of the data supplied via the Cache
transform. The cache is loaded prior to the first Data Flow task that will consume the
lookup data, and the data can be consumed directly by a Lookup transform. Precaching
data in this manner supports lookups, but it also provides a way to “mark” sets of rows
for other considerations such as loading. Later in this chapter, we will explore late-ar-
riving data and discuss patterns for managing it. One way to manage the scenario of
data continuing to arrive after the load operation has started is to create a cache of
primary and foreign keys that represent completed transactions, and then join to those
keys in Data Flow tasks throughout the load process. Will you miss last-second data
loading in this way? Yes, you will. But your data will contain complete transactions.
One benefit of executing incremental loads with table caches is the ability to execute
the load each month, week, evening, or every five minutes; only complete transactions
that have arrived since the last load executed will be loaded.
If you find you need to use the same lookup data across many SSIS packages (or
that the cache is larger than the amount of available server RAM), the Cache connec-
tion manager can persist its contents to disk. The Cache connection manager makes use
of the new and improved RAW file format, a proprietary format for storing data direc-
tly from a Data Flow task to disk, completely bypassing connection managers. Reads
and writes are very fast as a result, and the new format persists column names and data
types.
Load Staging
Another pattern worth mentioning here is load staging. Consider the following scen-
ario: a data warehouse destination table is large and grows often. Since the destination
Search WWH ::




Custom Search