Database Reference
In-Depth Information
Chapter 8
Extraction, Transformation, and Loading
Extraction, transformation, and loading (ETL) processes are used to extract
data from internal and external sources of an organization, transform these
data, and load them into a data warehouse. Since ETL processes are complex
and costly, it is important to reduce their development and maintenance costs.
Modeling ETL processes at a conceptual level is a way to achieve this goal.
However, existing ETL tools, like Microsoft Integration Services or Pentaho
Data Integration (also known as Kettle), have their own specific language to
define ETL processes. Further, there is no agreed-upon conceptual model to
specify such processes. In this chapter, we study the design of ETL processes
using a conceptual approach. The model we use is based on the Business
Process Modeling Notation (BPMN), a de facto standard for specifying
business processes. The model provides a set of primitives that cover the
requirements of frequently used ETL processes. Since BPMN is already used
for specifying business processes, users already familiar with BPMN do not
need to learn another language for defining ETL processes. Further, BPMN
provides a conceptual and implementation-independent specification of such
processes, which hides technical details and allows users and designers to
focus on essential characteristics of such processes. Finally, ETL processes
expressed in BPMN can be translated into executable specifications for
ETL tools.
We start this chapter with a brief introduction of BPMN, which we give in
Sect. 8.1 . Then, in Sect. 8.2 ,weexplainhowwecanuseBPMNforconceptual
modeling of ETL processes. In Sect. 8.3 , we apply these concepts to the
Northwind case study. We design a conceptual model for the ETL process
that loads the Northwind data warehouse used in the previous chapters with
data extracted from the Northwind operational database and other sources.
Finally, after providing in Sect. 8.4 a brief overview of Microsoft Integration
Services and Pentaho Kettle, we show in Sects. 8.5 and 8.6 how the ETL
conceptual model can be implemented in both tools. A detailed specification
of the process is provided, and the differences between both implementation
platforms are highlighted.
Search WWH ::




Custom Search