Data Warehouse Patterns - SQL Server Integration Services Design Patterns

Database Reference

In-Depth Information

everything packages are quite inconvenient. Although it's much easi-

er in SQL Server 2012 than in previous releases to compare differen-

ces between versions of the same package file, it's still a mostly

manual process. By segmenting the workload into multiple packages,

it's much easier to farm out development tasks to multiple people

without having to reconcile multiple versions of the same package.

Reusability: It's not uncommon for the same logic to be used more

than once in the same ETL execution, or for that same logic to be

shared among multiple ETL processes. When you encapsulate these

logical units of work into their own packages, it's much easier to

share that logic and avoid duplicate development.

It is possible to go overboard here. For example, if, during the course of the ETL

execution, you need to drop or disable indexes on 20 tables, you probably don't need to

create a package for each index! Break operations up into individual packages, but be

realistic about what constitutes a logical unit of work.

These aren't hard and fast rules, but with respect to breaking up ETL work into

packages, here are a few design patterns that I've found work well when populating a

data warehouse:

• Each dimension has a separate package.

• Each fact table has a separate package.

• Staging operations (if used) each have their own package.

• Functional operations unrelated to data movement (for example, drop-

ping or disabling indexes on tables before loading them with data) are

separated as well. You can group some of these operations together in

common packages where appropriate; for example, if you truncate

tables and drop indexes in a staging database, those operations typically

reside in the same package.

Further, it's often a good practice to isolate in separate packages any ETL logic that

is significantly different in terms of scope or breadth of data. For example, a historical

ETL process that retrieves a large chunk of old data will likely have different perform-

ance expectations, error-handling rules, and so on than a more frequently executed

package that collects just the most recent incremental data. As such, by creating a sep-

arate package structure to address those larger sets of data, you help avoid the issue of

trying to force a single package to handle these disparate scenarios.

Search WWH ::

Custom Search

Home