A key point of the approach we present here is the perception of ETL
processes as a combination of control and data tasks, where control tasks
orchestrate groups of tasks and data tasks detail how input data are
transformed and output data are produced. For example, the overall process
of populating a data warehouse is a control task composed of multiple
subtasks, while populating a fact or dimension table is a data task. Therefore,
control tasks can be considered as workflows where arrows represent the
precedence between tasks, while data tasks represent data flows where records
are transferred through the arrows. Given the discussion above, designing
ETL processes using business process modeling tools appears natural. We
present next the conceptual model for ETL processes based on BPMN.
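
To make the distinction between control and data tasks concrete, the following is a minimal Python sketch, not part of the BPMN model itself. The names `clean_names`, `load_dimension`, `run_control_task`, and the in-memory table `customer_dim` are hypothetical and chosen only for illustration. The data task is written as a flow of records, while the control task only enforces the precedence between its subtasks.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]

def clean_names(records: Iterable[Record]) -> Iterator[Record]:
    """Data task step: records flow in, transformed records flow out."""
    for r in records:
        yield {**r, "name": str(r["name"]).strip().title()}

def load_dimension(records: Iterable[Record], target: List[Record]) -> int:
    """Data task sink: append records to a (hypothetical) in-memory table."""
    count = 0
    for r in records:
        target.append(r)
        count += 1
    return count

def run_control_task(steps: List[Callable[[], None]]) -> None:
    """Control task: enforces precedence only; it never inspects records."""
    for step in steps:
        step()

customer_dim: List[Record] = []
log: List[str] = []

def populate_customer_dim() -> None:
    # Assumed sample input; a real data task would read from a source system.
    source = [{"name": "  alice "}, {"name": "BOB"}]
    n = load_dimension(clean_names(source), customer_dim)
    log.append(f"customer_dim: {n} rows")

def populate_sales_fact() -> None:
    log.append("sales_fact: skipped in this sketch")

# The overall warehouse load is a control task composed of data tasks.
run_control_task([populate_customer_dim, populate_sales_fact])
```

In this sketch the arrows of the control task correspond to the order of the list passed to `run_control_task`, whereas the arrows of the data task correspond to the generator chain through which records pass.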
Control tasks represent the orchestration of an ETL process, independently
of the data flowing through such a process. Such tasks are represented by means
of the constructs described in Sect. 8.1. For example, gateways are used to
control the sequence of activities in an ETL process. The most used types of
gateways in an ETL context are exclusive and parallel gateways. Events are
another type of object often used in control tasks. For instance, a cancelation
event can be used to represent the situation in which an error occurs, and it
may be followed by a message event that sends an e-mail to notify the failure.
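
The behavior of an exclusive gateway and of a cancelation event followed by a message event can be approximated in code as follows. This is only a sketch under stated assumptions: the functions `exclusive_gateway`, `run_with_cancel_event`, and `notify_failure` are hypothetical, and the notification is recorded in a list rather than sent by e-mail.

```python
from typing import Callable, List

sent_messages: List[str] = []

def notify_failure(task_name: str, error: Exception) -> None:
    """Stand-in for a message event; a real process might send an e-mail."""
    sent_messages.append(f"ETL task '{task_name}' failed: {error}")

def exclusive_gateway(condition: bool,
                      if_true: Callable[[], str],
                      if_false: Callable[[], str]) -> str:
    """Exclusive gateway: exactly one of the outgoing paths is taken."""
    return if_true() if condition else if_false()

def run_with_cancel_event(task_name: str, task: Callable[[], str]) -> str:
    """Run a task; on error, cancel it and trigger the message event."""
    try:
        return task()
    except Exception as error:
        notify_failure(task_name, error)
        return "cancelled"

def full_load() -> str:
    return "full load"

def incremental_load() -> str:
    return "incremental load"

def broken_load() -> str:
    raise RuntimeError("source unavailable")

# Assumed condition for the gateway; any Boolean test could be used here.
warehouse_is_empty = False
chosen = exclusive_gateway(warehouse_is_empty, full_load, incremental_load)
outcome = run_with_cancel_event("City Load", broken_load)
```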
Fig. 8.10 An excerpt of a control task
Figure 8.10 shows a portion of the control task that loads the Northwind
data warehouse. There are three subprocesses called Continent Country State
Load, TempCities Load, and City Load, which load, respectively, the tables
composing the hierarchy State → Country → Continent, a temporary table
TempCities, and the table City. The first two subprocesses are the incoming
flows of a parallel merging gateway. The outgoing flow of this gateway is the
input to the City Load. Note that the sequence flows outgoing from the City Load
activity could also be modeled as a parallel splitting gateway.
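
The synchronization expressed by the parallel merging gateway can be sketched in Python as follows. This is an illustrative approximation rather than the way the process is actually implemented: the function `run_subprocess` is a hypothetical placeholder that only records the name of each subprocess, and threads from the standard `concurrent.futures` module stand in for the parallel branches.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait
from typing import List

completed: List[str] = []
_lock = threading.Lock()

def run_subprocess(name: str) -> str:
    """Placeholder for a subprocess; it only records that it has run."""
    with _lock:
        completed.append(name)
    return name

def load_cities_control_task() -> List[str]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        branches = [
            pool.submit(run_subprocess, "Continent Country State Load"),
            pool.submit(run_subprocess, "TempCities Load"),
        ]
        # Parallel merging gateway: block until every incoming branch is done.
        wait(branches)
        for branch in branches:
            branch.result()  # re-raises any exception raised in a branch
    # The outgoing flow of the gateway is the input to City Load.
    run_subprocess("City Load")
    return list(completed)

order = load_cities_control_task()
```

The two incoming branches may finish in either order, but City Load always runs last, which is the precedence that the gateway enforces.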
Swimlanes can be used to organize ETL processes according to several
strategies, namely, by technical architecture (such as servers to which tasks
are assigned), by business entities (such as departments or branches), or
by user profiles (such as manager, analyst, or designer) that give special
access rights to users. For example, Fig. 8.8 illustrates the use of swimlanes