Database Reference
In-Depth Information
of the
NorthwindDW
database, where column
TimeKey
in the flow is mapped
to the attribute
OrderDateKey
. Further, new records will be appended to
the table. The task in Fig.
8.11
c adds a column named
SalesAmount
to
the flow whose value is computed from the expression given. Here, it is
supposed that the values of the columns appearing in the expression are
taken from the current record. Finally, Fig.
8.11
d converts the columns
Date
and
DayNbWeek
(e.g., read from an Excel file as strings) into a
Date
and a
Smallint
, respectively.
a
b
c
Agreggate
Join
Union
Group By:
OrderNo
Columns:
Cnt=Count(*),
Condit ion:
Ipt
n
StateKey,
t
u*
:
CityName,
EmployeeID = Employeekey
oi
CountryKey
Jny :
Le Outer Join
Te
p
TotalSales=Sum(SalesAmount)
Ou
StateKey, CountryKey
Keep Du
pt
CityName,
u
:
pa
lc t
i
es
No
:
Fig. 8.12
Rowset operations. (
a
)
Aggregate
(unary). (
b
)
Join
(binary). (
c
)
Union
(
n
-ary)
Figure
8.12
shows three rowset operations:
Aggregate
(unary),
Join
(binary), and
Union
(
n
-ary). These operations receive a set of rows to
process altogether, rather than operating row by row. Again, annotations
complement the diagram information. For example, in the case of the
Union
task, the annotation tells the name of the input and output columns and
informs if duplicates must be kept. Note that the case of the union is a
particular one: if duplicates are retained, then it becomes a row operation
since it can be done row by row. If duplicates are eliminated, then it becomes
a rowset operation because sorting is involved in the operation.
Fig. 8.13
Shorthand notation for the lookup task
A very common data task in an ETL process is the lookup, which checks
if some value is present in a file, based on a single or compound search key.
Search WWH ::
Custom Search