Database Reference
In-Depth Information
of the NorthwindDW database, where column TimeKey in the flow is mapped
to the attribute OrderDateKey . Further, new records will be appended to
the table. The task in Fig. 8.11 c adds a column named SalesAmount to
the flow whose value is computed from the expression given. Here, it is
supposed that the values of the columns appearing in the expression are
taken from the current record. Finally, Fig. 8.11 d converts the columns Date
and DayNbWeek (e.g., read from an Excel file as strings) into a Date and a
Smallint , respectively.
a
b
c
Agreggate
Join
Union
Group By: OrderNo
Columns: Cnt=Count(*),
Condit ion:
Ipt
n
StateKey,
t
u*
:
CityName,
EmployeeID = Employeekey
oi
CountryKey
Jny : Le Outer Join
Te
p
TotalSales=Sum(SalesAmount)
Ou
StateKey, CountryKey
Keep Du
pt CityName,
u :
pa
lc t
i
es No
:
Fig. 8.12 Rowset operations. ( a ) Aggregate (unary). ( b ) Join (binary). ( c ) Union
( n -ary)
Figure 8.12 shows three rowset operations: Aggregate (unary), Join
(binary), and Union ( n -ary). These operations receive a set of rows to
process altogether, rather than operating row by row. Again, annotations
complement the diagram information. For example, in the case of the Union
task, the annotation tells the name of the input and output columns and
informs if duplicates must be kept. Note that the case of the union is a
particular one: if duplicates are retained, then it becomes a row operation
since it can be done row by row. If duplicates are eliminated, then it becomes
a rowset operation because sorting is involved in the operation.
Fig. 8.13 Shorthand notation for the lookup task
A very common data task in an ETL process is the lookup, which checks
if some value is present in a file, based on a single or compound search key.
Search WWH ::




Custom Search