Database Reference
In-Depth Information
Typical Uses
CDC represents a shift in the incremental load methodology. The other methods de-
scribed here apply a downstream approach to incremental loading, with a minimally re-
strictive extraction from the source and a decision point late in the ETL flow. CDC, on
the other hand, processes the change logic further upstream, which can help lighten the
load on SSIS and other moving parts in the ETL.
If CDC is in place (or could be implemented) in a source system, it's certainly
worth considering using this design pattern to process incremental loads. It can perform
well, can reduce network loads due to processing fewer rows from the source, and re-
quires fewer resources on the ETL side. It's not appropriate for every situation, but
CDC can be an excellent way to manage incremental loads in SQL Server Integration
Services.
Keep in mind that the use of CDC as a design pattern isn't strictly limited to Mi-
crosoft SQL Server databases. CDC may also be leveraged against CDC-enabled
Oracle database servers.
Data Errors
Henry Wadsworth Longfellow once wrote, “Into each life some rain must fall.” The
world of ETL is no different, except that rain comes in the form of errors, often as a
result of missing or invalid data. We don't always know when they're going to occur.
However, given enough time, something is going to go wrong: late-arriving dimension
members, packages executed out of order, or just plain old bad data. The good news is
that there are data warehousing design patterns that can help mitigate the risk of data
anomalies that interrupt the execution of Integration Services packages.
To address patterns of handling missing data, we're going to concentrate mostly on
missing dimension members, as this is the most frequent cause of such errors.
However, you can extend some of the same patterns to other elements that are part of
or peripheral to data warehousing.
Simple Errors
The vast majority of errors can and should be handled inline, or simply prevented be-
fore they occur. Consider the common case of data truncation: you have a character
type field that's expected to contain no more than 50 characters, so you set the data
Search WWH ::




Custom Search