Figure 11-10. Adding the missing dimension member
After adding the missing member, we send those rows to a secondary ItemID lookup, which will attempt (successfully, unless something goes terribly wrong) to match the previously unmatched data with the newly added dimension records in the DimItem table. It is important that you remember to set the cache mode to either Partial Cache or No Cache when using a secondary lookup in this manner. The default lookup cache setting (Full Cache) will buffer the contents of the Item dimension table before the data flow is initiated, and as a result, none of the rows added during package execution would be present in this secondary lookup. To prevent all of these redirected fact rows from failing the secondary Item dimension lookup, use one of the non-default cache methods to force the package to perform an on-demand lookup that includes the newly added dimension values.
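To make the cache behavior concrete, the sketch below shows the kind of parameterized, per-row query a Partial Cache or No Cache lookup sends to the database at execution time; because the query runs while the package executes, it can see members inserted earlier in the same run. DimItem and ItemID come from the example above, while ItemKey is an assumed name for the surrogate key column.

-- Minimal sketch of the on-demand query issued for each row the lookup
-- cannot resolve from its cache. ItemKey is an assumed surrogate-key name.
SELECT ItemKey
FROM   dbo.DimItem
WHERE  ItemID = ?;   -- parameter mapped from the incoming fact row's ItemID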
Regarding the secondary lookup transformation methodology, you might wonder if the second lookup is even necessary. After all, if we perform the insert in the previous step (OLE DB command), couldn't we just collect the new Item dimension key value (a SQL Server table identity value, in most cases) using that SQL statement? The answer is a qualified yes, and in my experience, that is the simpler of the two options. However, I've also found that some ETL situations—in particular, the introduction of parallel processes performing the same on-the-fly addition of dimension members—can cloud the issue of collecting the identity value of the most recently inserted record. From that perspective, I lean toward using the secondary lookup in cases such as this.
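For reference, here is a hedged sketch of the single-statement alternative just described, in which the OLE DB command that inserts the missing member also returns the new identity value in the same batch. The ItemName column and the placeholder value are assumptions for illustration.

-- Sketch of the insert-and-return-key alternative.
INSERT INTO dbo.DimItem (ItemID, ItemName)
VALUES (?, 'Unknown - added during load');

SELECT CAST(SCOPE_IDENTITY() AS int) AS ItemKey;
-- SCOPE_IDENTITY() is limited to the current session and scope, but as noted
-- above, parallel processes performing the same on-the-fly insert can still
-- cloud which identity value you end up with, which is why the secondary
-- lookup is preferred here.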
Triage the Lookup Failures
For missing dimension records, the most common approaches are the ones described previously. On occasion, however, it will be necessary to delay the processing of fact records that do not match an existing dimension record. In cases such as this, you'll need to create a triage table that will store the interim records until they can be successfully matched to the proper dimension; a sketch of such a table follows.
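A minimal sketch of what such a triage table might look like, assuming a simple sales-style fact with an Item business key; all table and column names here are illustrative rather than taken from the sample database.

-- Hypothetical triage table for fact rows that failed the dimension lookup.
CREATE TABLE dbo.FactSales_Triage
(
    ItemID      varchar(20)    NOT NULL,  -- business key with no match in DimItem
    SalesDate   date           NOT NULL,
    SalesAmount decimal(18, 2) NOT NULL,
    TriagedOn   datetime2      NOT NULL DEFAULT SYSUTCDATETIME()  -- when the row was parked
);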
As shown in Figure 11-11, we're adding a couple of additional components to the ETL pipeline for this design pattern. At the outset, we need to use two separate sources of data: one to bring in the new data from the source system, and the other for reintroducing previously triaged data into the pipeline. Further into the data flow, the example shows that we are redirecting the unmatched fact records to another table rather than trying to fix the data inline.
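The second source mentioned above could be as simple as the query sketched below, run at the start of each load to pull previously parked rows back into the pipeline; the names continue the hypothetical triage table shown earlier.

-- Reintroduce previously triaged rows at the start of the next load.
SELECT ItemID, SalesDate, SalesAmount
FROM   dbo.FactSales_Triage;
-- Rows that now match the dimension flow on to the fact table and can be
-- deleted from the triage table; rows that still fail the lookup are simply
-- redirected back into dbo.FactSales_Triage.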
 