Database Reference
In-Depth Information
I'm certain that the gentleman who came up with Murphy's Law was working as a data
warehouse ETL developer. Dealing with data from disparate systems is often an ugly
process! Although we can defensively code around many common issues, eventually
some data anomaly will introduce unexpected errors.
To make sure that any error or other data anomaly does not cause the ETL process
to abruptly terminate, it's advisable to build in some safety nets to handle any unexpec-
ted errors.
Data Warehouse ETL Workflow
Most of what has been covered in this chapter so far has been core concepts about the
loading of data warehouses. I'd like to briefly change gears and touch on the topic of
SSIS package design with respect to workflow. Data warehouse ETL systems tend to
have a lot of moving parts, and the workload to develop those pieces is often distrib-
uted to multiple developers. Owing to a few lessons learned the hard way, I've deve-
loped a workflow design pattern of package atomicity.
Dividing Up the Work
I've told this story in a couple of presentations I've done, and it continues to be amus-
ing (to me, anyway) to think about ever having done things this way. The first produc-
tion SSIS package of any significance that I deployed was created to move a large
amount of data from multiple legacy systems into a new SQL Server database. It star-
ted off rather innocently; it initially appeared that the ETL logic would be much sim-
pler than what it eventually became, so I wrapped everything into a single package.
In the end, the resulting SSIS package was enormous. There were 30, maybe 40,
different data flows (some with multiple sources/destinations and complex transforma-
tion logic) and dozens of other helper tasks intermingled. The resulting .dtsx file size
was about 5MB just for the XML metadata! Needless to say, every time I opened this
package in Visual Studio, it would take several minutes to run through all of the valida-
tion steps.
This extremely large SSIS package worked fine, and technically there was nothing
wrong with the design. Its sheer size did bring to light some challenges that are present
in working with large, do-everything packages, and as a result of that experience, I
reengineered my methodology for atomic package design.
Search WWH ::




Custom Search