Database Reference
In-Depth Information
for the Northwind ETL process (we will explain in detail this process later
in this chapter). The figure shows some of the subprocesses that load the
Product dimension table, the Time dimension table, and the Sales fact table
(represented as compound activities with subprocesses); it also assumes their
distribution between Server 1 and Server 2 . Each one of these servers is
considered as a lane contained inside the pool of data warehouse servers.
We can also see that a swimlane denoted Currency Server contains a web
service that receives an input currency (like US dollars), an amount, and an
output currency (like euros) and returns the amount equivalent in the output
currency. This could be used in the loading of the Sales fact table. Thus, flow
messages are exchanged between the Sales Load activity and the Exchange
Rate task which is performed by the web service. These messages go across
both swimlanes.
Data tasks represent activities typically carried out to manipulate data,
such as input data, output data, and data transformation. Since such
data manipulation operations occur within an activity, data tasks can be
considered as being at a lower abstraction level than control tasks. Recall that
arrows in a data task represent not only a precedence relationship between
its activities but also the flow of data records between them.
Data tasks can be classified into row and rowset operations. Row
operations apply transformations to the data on a row-by-row basis. In
contrast, rowset operations deal with a set of rows. For example, updating
the value of a column is a row operation, while aggregation is a rowset
operation. Data tasks can also be classified (orthogonally to the previous
classification) into unary or n -ary data tasks , depending of the number of
input flows.
a
b
c
d
Convert
Column
Input Data
Insert Data
Add Column
Column: SalesAmount =
D.UnitPrice * (1-Discount) *
Quanty
Columns:
Date: Date
DayNbWeek: Smallint
Database: NorthwindDW
Table: Time
Mappings:
TimeKey->OrderDateKey
Opons: Append
File: Time.xls
Type: Excel
Fig. 8.11 Unary row operations. ( a ) Input data. ( b ) Insert data. ( c )Addcolumn.
( d )Convertcolumn
Figure 8.11 shows examples of unary row operations: Input Data , Insert
Data , Add Column ,and Convert Column . Note the annotations linked to the
tasks by means of association flows. The annotations contain metadata that
specify the parameters of the task. For example, in Fig. 8.11 a, the annotation
tells that the data is read from an Excel file called Time.xls . Similarly, the
annotation in Fig. 8.11 b tells that the task inserts tuples in the table Time
Search WWH ::




Custom Search