Figure 11-10. Adding the missing dimension member
After adding the missing member, we send those rows to a secondary ItemID lookup, which will attempt (successfully, unless something goes terribly wrong) to match the previously unmatched data with the newly added dimension records in the DimItem table. It is important that you remember to set the cache mode to either Partial Cache or No Cache when using a secondary lookup in this manner. The default lookup cache setting (Full Cache) will buffer the contents of the Item dimension table before the data flow is initiated, and as a result, none of the rows added during package execution would be present in this secondary lookup. To prevent all of these redirected fact rows from failing the secondary Item dimension lookup, use one of the non-default cache methods to force the package to perform an on-demand lookup that includes the newly added dimension values.
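To make the cache behavior concrete, the sketch below shows the kind of parameterized, per-row query a Partial Cache or No Cache lookup sends to the database at execution time; because the query runs while the package executes, it can see members inserted earlier in the same run. DimItem and ItemID come from the example above, while ItemKey is an assumed name for the surrogate key column.

-- Minimal sketch of the on-demand query issued for each row the lookup
-- cannot resolve from its cache. ItemKey is an assumed surrogate-key name.
SELECT ItemKey
FROM   dbo.DimItem
WHERE  ItemID = ?;   -- parameter mapped from the incoming fact row's ItemID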
Regarding the secondary lookup transformation methodology, you might wonder if the second lookup is even necessary. After all, if we perform the insert in the previous step (OLE DB command), couldn't we just collect the new Item dimension key value (a SQL Server table identity value, in most cases) using that SQL statement? The answer is a qualified yes, and in my experience, that is the simpler of the two options. However, I've also found that some ETL situations—in particular, the introduction of parallel processes performing the same on-the-fly addition of dimension members—can cloud the issue of collecting the identity value of the most recently inserted record. From that perspective, I lean toward using the secondary lookup in cases such as this.
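For reference, here is a hedged sketch of the single-statement alternative just described, in which the OLE DB command that inserts the missing member also returns the new identity value in the same batch. The ItemName column and the placeholder value are assumptions for illustration.

-- Sketch of the insert-and-return-key alternative.
INSERT INTO dbo.DimItem (ItemID, ItemName)
VALUES (?, 'Unknown - added during load');

SELECT CAST(SCOPE_IDENTITY() AS int) AS ItemKey;
-- SCOPE_IDENTITY() is limited to the current session and scope, but as noted
-- above, parallel processes performing the same on-the-fly insert can still
-- cloud which identity value you end up with, which is why the secondary
-- lookup is preferred here.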
Triage the Lookup Failures
For missing dimension records, the most common approaches are the ones described previously. On occasion, however, it will be necessary to delay the processing of fact records that do not match an existing dimension record. In cases such as this, you'll need to create a triage table that will store the interim records until they can be successfully matched to the proper dimension; a sketch of such a table follows.
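A minimal sketch of what such a triage table might look like, assuming a simple sales-style fact with an Item business key; all table and column names here are illustrative rather than taken from the sample database.

-- Hypothetical triage table for fact rows that failed the dimension lookup.
CREATE TABLE dbo.FactSales_Triage
(
    ItemID      varchar(20)    NOT NULL,  -- business key with no match in DimItem
    SalesDate   date           NOT NULL,
    SalesAmount decimal(18, 2) NOT NULL,
    TriagedOn   datetime2      NOT NULL DEFAULT SYSUTCDATETIME()  -- when the row was parked
);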
As shown in Figure 11-11, we're adding a couple of additional components to the ETL pipeline for this design pattern. At the outset, we need to use two separate sources of data: one to bring in the new data from the source system, and the other for reintroducing previously triaged data into the pipeline. Further into the data flow, the example shows that we are redirecting the unmatched fact records to another table rather than trying to fix the data inline.
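The second source mentioned above could be as simple as the query sketched below, run at the start of each load to pull previously parked rows back into the pipeline; the names continue the hypothetical triage table shown earlier.

-- Reintroduce previously triaged rows at the start of the next load.
SELECT ItemID, SalesDate, SalesAmount
FROM   dbo.FactSales_Triage;
-- Rows that now match the dimension flow on to the fact table and can be
-- deleted from the triage table; rows that still fail the lookup are simply
-- redirected back into dbo.FactSales_Triage.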
 