Database Reference
In-Depth Information
Typically, a lookup is immediately followed by an exclusive gateway with a
branching condition. For conciseness, we decided to use a shorthand for these
two tasks and replace this by two conditional flows. This is shown in Fig. 8.13 .
Table 8.1 defines the various ETL tasks and their annotations. The
annotations between brackets, as in [Group By] , are optional, while the
annotationssuxedwithanasterisk,asin Input* , can be repeated several
times. Finally, the annotations separated by a vertical bar, as in Table
Query ,
are exclusive, one of which must be provided. In Appendix A ,wegivethe
BPMN notation for these ETL tasks.
|
8.3 Conceptual Design of the Northwind ETL
Process
In this section, using the concepts explained in the previous ones, we
present a conceptual model of the ETLprocessthatloadstheNorthwind
data warehouse from the operational database and other sources. Later,
in Sects. 8.5 and 8.6 , we show how this model can be implemented in,
respectively, Microsoft Integration Services and Pentaho Kettle.
The operational data reside in a relational database, whose logical schema
is shown in Fig. 8.14 . These data must be mapped to a data warehouse, whose
schema is given in Fig. 8.15 . In addition to the operational database, some
other files are needed for loading the data warehouse. We next describe these
files, as well as the requirements of the process.
First, an Excel file called Time.xls contains the data needed for loading the
Time dimension table. The time interval of this file covers the dates contained
in the table Orders of the Northwind operational database.
We can see in Fig. 8.15 that in the Northwind data warehouse the
dimension tables Customer and Supplier share the geographic hierarchy
starting at the City level. Data for the hierarchy State
Continent
are loaded from an XML file called Territories.xml that begins as shown
in Fig. 8.16 a. A graphical representation of the schema of the XML file is
shown in Fig. 8.16 b. Here, rectangles represent XML elements, and rounded
rectangles represent XML attributes. The cardinalities of the relationships
are also indicated. Notice that type is an attribute of State that contains, for
example, the value state for Austria. However, for Belgium it contains the
value province (not shown in the figure). Notice also that EnglishStateName ,
RegionName ,and RegionCode are optional, as indicated by the cardinality
0..1 .
It is worth noting that the attribute Region of tables Customers and
Suppliers in the Northwind database contains in fact a state or province name
(e.g., Quebec ) or a state code (e.g., CA ). Similarly, the attribute Country
contains a country name (e.g., Canada ) or a country code (e.g., USA ). To
identify to which state or province a city belongs, a file called Cities.txt
Country
Search WWH ::




Custom Search