• Pan, a stand-alone command-line tool for executing transformations. Pan reads data from and writes data to various data sources and is also used to manipulate data.
• Kitchen, a stand-alone command-line tool for executing jobs. Jobs are usually scheduled to run in batch mode at regular intervals.
• Carte, a lightweight server for running jobs and transformations on a remote host. It provides execution capabilities similar to those of the Data Integration Server but does not provide scheduling, security, and content management facilities.
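These stand-alone tools are driven from the command line. As a rough illustration, the sketch below builds such invocations in Python; the file names and the `-file`/`-level` option syntax shown here are indicative only and may differ across Kettle versions and platforms.

```python
# Sketch: building command lines for Kettle's stand-alone tools.
# File names and option syntax are illustrative assumptions.

def pan_command(ktr_file, log_level="Basic"):
    """Command line to run a single transformation with Pan."""
    return ["./pan.sh", f"-file={ktr_file}", f"-level={log_level}"]

def kitchen_command(kjb_file, log_level="Basic"):
    """Command line to run a job with Kitchen (e.g., from a cron entry)."""
    return ["./kitchen.sh", f"-file={kjb_file}", f"-level={log_level}"]

print(pan_command("load_category.ktr"))
print(kitchen_command("northwind_etl.kjb"))
```

In a batch setting, the Kitchen invocation would typically be placed in a scheduler entry so the job runs at regular intervals, as described above.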
We illustrate these concepts in Sect. 8.6 when we study the implementation
of the Northwind ETL process in Kettle.
8.5 The Northwind ETL Process in Integration Services
In this section, we show an implementation in Integration Services of the
ETL process that loads the Northwind data warehouse. We compare this
implementation with the conceptual design presented in Sect. 8.3 and show
how to translate the constructs of the conceptual ETL language into the
equivalent ones in Integration Services. In our implementation, the Northwind
operational database and the Northwind data warehouse are located on an
SQL Server database.
Figure 8.26 shows the overall ETL process. It is composed of one sequence container task (the one with the blue arrow on the left) and eleven data flow
tasks. All of these tasks are connected by precedence constraints, represented
by green arrows. The reader can compare this representation with the one in Fig. 8.18. Note that gateways are not present, but the semantics of the
corresponding arrows is quite similar: no task can start until all precedent
tasks have finished.
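This "no task starts before all its predecessors finish" semantics is exactly a topological ordering over the precedence constraints. The sketch below models a hypothetical subset of the package's tasks this way; the task names are illustrative, not taken from the actual package.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical precedence constraints: each task maps to the set of
# tasks that must have finished before it may start.
constraints = {
    "Load Category": set(),
    "Load Time": set(),
    "Load Product": {"Load Category"},
    "Load Sales": {"Load Product", "Load Time"},
}

# static_order() yields the tasks so that every task appears
# after all of its predecessors.
order = list(TopologicalSorter(constraints).static_order())
print(order)
```

An engine such as Integration Services can additionally run independent tasks (here, Load Category and Load Time) in parallel, since neither constrains the other.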
Several data flow tasks are simple. For example, the task that loads the Category table is given in Fig. 8.27a (compare with Fig. 8.19), where the data flow tasks are an OLE DB Source task that reads the entire table from the operational database and an OLE DB Destination task that receives the output from the previous task and stores it in the data warehouse. Similar data flows are used for loading the Product, Shipper, and Employee tables.
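Such a source-to-destination data flow amounts to copying a table from one database to another. The sketch below imitates it with Python's `sqlite3` standing in for the two OLE DB connections; the column list is a simplified assumption, not the full Northwind schema.

```python
import sqlite3

# Stand-ins for the operational database and the data warehouse;
# in Integration Services these would be OLE DB connections to SQL Server.
src = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")

src.execute("CREATE TABLE Category (CategoryID INTEGER, CategoryName TEXT)")
src.executemany("INSERT INTO Category VALUES (?, ?)",
                [(1, "Beverages"), (2, "Condiments")])

dw.execute("CREATE TABLE Category (CategoryID INTEGER, CategoryName TEXT)")

# OLE DB Source: read the entire table from the operational database.
rows = src.execute("SELECT CategoryID, CategoryName FROM Category").fetchall()
# OLE DB Destination: store the received rows in the data warehouse.
dw.executemany("INSERT INTO Category VALUES (?, ?)", rows)

print(dw.execute("SELECT COUNT(*) FROM Category").fetchone()[0])  # → 2
```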
Another straightforward task is the data flow that loads the Time dimension table, shown in Fig. 8.27b. After loading the source Excel file,
a data conversion transformation is needed to convert the data types from
the Excel file into the data types of the database. We can also see that this is
very similar to the conceptual specification depicted in Fig. 8.20, except that the addition of the surrogate key column is implicit in the Time Load task.
We further explain this next.
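The two ingredients of this data flow, converting the spreadsheet's values into database types and silently assigning a surrogate key on load, can be sketched as follows. The column names and the date format are hypothetical; an actual Time dimension would carry many more attributes.

```python
from datetime import datetime

# Hypothetical rows as read from the Excel source: dates arrive as text
# and must be converted to database types (the Data Conversion step).
excel_rows = [{"Date": "1996-07-04"}, {"Date": "1996-07-05"}]

time_dim = []
for key, row in enumerate(excel_rows, start=1):
    d = datetime.strptime(row["Date"], "%Y-%m-%d").date()  # type conversion
    time_dim.append({
        "TimeKey": key,  # surrogate key, added implicitly during the load
        "Date": d,
        "Year": d.year,
        "Month": d.month,
        "Day": d.day,
    })

print(time_dim[0])
```

The point of the sketch is that the surrogate key never appears in the source data: it is generated as a side effect of loading, which is why the conceptual specification shows it explicitly while the Time Load task does not.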