Database Reference
In-Depth Information
if these surrogate keys are not found, null values are returned in the result.
Therefore, a conditional split transformation task selects the records obtained
in the previous query with a null value in the lookup columns and stores them
in a flat file. The correct records are loaded in the data warehouse.
Notice the difference of the above query with respect to the corresponding
query in the conceptual model given in Fig. 8.25 . While the above query
implements all the necessary lookups, in the conceptual design we have chosen
to implement the lookups in individual tasks, which conveys information
in a clearer way. Therefore, the conceptual design is more appropriate to
communicate the process steps within the project participants and also gives
us the flexibility to choose the implementation that is more appropriate for
the application needs.
8.6 The Northwind ETL Process in Kettle
We describe next a possible implementation in Kettle of the ETL process that
loads the Northwind data warehouse. In our implementation, the Northwind
operational database and the Northwind data warehouse are located on a
PostgreSQL database server.
Figure 8.36 shows the overall ETL process, in Kettle terminology, a job.
It is similar to the corresponding process for Integration Services shown in
Fig. 8.26 . Notice that we have not implemented a control flow for grouping
the three tasks that load the hierarchy composed of the Continent , Country ,
and State tables, although this could be done in Kettle using subjobs.
Figure 8.37 a shows the transformation (or flow) that loads the Category
dimension table. As can be seen, it is similar to the data flow in Integration
Services shown in Fig. 8.27 a. On the other hand, Fig. 8.37 bshowsthe
transformation that loads the Time dimension table. Compared with the
corresponding data flow shown in Fig. 8.27 b, Integration Services requires
an additional step for setting the appropriate data types for the attributes
read, while this is specified in the transformation step (or task) that reads
the CSV file in Kettle. 2
Transformations similar to the above are needed to load the Product and
Shipper tables. However, although loading the Employee table (which contains
a parent-child hierarchy) in Integration Services is also similar to the other
ones, this is not the case in Kettle, as Fig. 8.38 shows. In this figure, we can
see the transformation that loads the Employee dimension table. We have
seen in Chap. 7 that Mondrian requires a closure table that contains the
2 Note that in what follows we will refer indistinctly to transformations and steps
(in Kettle terminology) or flows and tasks (in Integration Services terminology),
respectively, since they represent analogous concepts.
Search WWH ::




Custom Search