Database Reference
In-Depth Information
Typically, a lookup is immediately followed by an exclusive gateway with a
branching condition. For conciseness, we decided to use a shorthand for these
two tasks and replace this by two conditional flows. This is shown in Fig.
8.13
.
Table
8.1
defines the various ETL tasks and their annotations. The
annotations between brackets, as in
[Group By]
, are optional, while the
annotationssuxedwithanasterisk,asin
Input*
, can be repeated several
times. Finally, the annotations separated by a vertical bar, as in
Table
Query
,
are exclusive, one of which must be provided. In Appendix
A
,wegivethe
BPMN notation for these ETL tasks.
|
8.3 Conceptual Design of the Northwind ETL
Process
In this section, using the concepts explained in the previous ones, we
present a conceptual model of the ETLprocessthatloadstheNorthwind
data warehouse from the operational database and other sources. Later,
in Sects.
8.5
and
8.6
, we show how this model can be implemented in,
respectively, Microsoft Integration Services and Pentaho Kettle.
The operational data reside in a relational database, whose logical schema
is shown in Fig.
8.14
. These data must be mapped to a data warehouse, whose
schema is given in Fig.
8.15
. In addition to the operational database, some
other files are needed for loading the data warehouse. We next describe these
files, as well as the requirements of the process.
First, an Excel file called
Time.xls
contains the data needed for loading the
Time
dimension table. The time interval of this file covers the dates contained
in the table
Orders
of the Northwind operational database.
We can see in Fig.
8.15
that in the Northwind data warehouse the
dimension tables
Customer
and
Supplier
share the geographic hierarchy
starting at the
City
level. Data for the hierarchy
State
Continent
are loaded from an XML file called
Territories.xml
that begins as shown
in Fig.
8.16
a. A graphical representation of the schema of the XML file is
shown in Fig.
8.16
b. Here, rectangles represent XML elements, and rounded
rectangles represent XML attributes. The cardinalities of the relationships
are also indicated. Notice that
type
is an attribute of
State
that contains, for
example, the value
state
for Austria. However, for Belgium it contains the
value
province
(not shown in the figure). Notice also that
EnglishStateName
,
RegionName
,and
RegionCode
are optional, as indicated by the cardinality
0..1
.
It is worth noting that the attribute
Region
of tables
Customers
and
Suppliers
in the Northwind database contains in fact a state or province name
(e.g.,
Quebec
) or a state code (e.g.,
CA
). Similarly, the attribute
Country
contains a country name (e.g.,
Canada
) or a country code (e.g.,
USA
). To
identify to which state or province a city belongs, a file called
Cities.txt
→
Country
→
Search WWH ::
Custom Search