Database Reference
In-Depth Information
everything packages are quite inconvenient. Although it's much easi-
er in SQL Server 2012 than in previous releases to compare differen-
ces between versions of the same package file, it's still a mostly
manual process. By segmenting the workload into multiple packages,
it's much easier to farm out development tasks to multiple people
without having to reconcile multiple versions of the same package.
Reusability: It's not uncommon for the same logic to be used more
than once in the same ETL execution, or for that same logic to be
shared among multiple ETL processes. When you encapsulate these
logical units of work into their own packages, it's much easier to
share that logic and avoid duplicate development.
It is possible to go overboard here. For example, if, during the course of the ETL
execution, you need to drop or disable indexes on 20 tables, you probably don't need to
create a package for each index! Break operations up into individual packages, but be
realistic about what constitutes a logical unit of work.
These aren't hard and fast rules, but with respect to breaking up ETL work into
packages, here are a few design patterns that I've found work well when populating a
data warehouse:
• Each dimension has a separate package.
• Each fact table has a separate package.
• Staging operations (if used) each have their own package.
• Functional operations unrelated to data movement (for example, drop-
ping or disabling indexes on tables before loading them with data) are
separated as well. You can group some of these operations together in
common packages where appropriate; for example, if you truncate
tables and drop indexes in a staging database, those operations typically
reside in the same package.
Further, it's often a good practice to isolate in separate packages any ETL logic that
is significantly different in terms of scope or breadth of data. For example, a historical
ETL process that retrieves a large chunk of old data will likely have different perform-
ance expectations, error-handling rules, and so on than a more frequently executed
package that collects just the most recent incremental data. As such, by creating a sep-
arate package structure to address those larger sets of data, you help avoid the issue of
trying to force a single package to handle these disparate scenarios.
Search WWH ::




Custom Search