Extraction, Transformation, and Loading - Data Warehouse Systems: Design and Implementation

Database Reference

In-Depth Information

if these surrogate keys are not found, null values are returned in the result.

Therefore, a conditional split transformation task selects the records obtained

in the previous query with a null value in the lookup columns and stores them

in a flat file. The correct records are loaded in the data warehouse.

Notice the difference of the above query with respect to the corresponding

query in the conceptual model given in Fig. 8.25 . While the above query

implements all the necessary lookups, in the conceptual design we have chosen

to implement the lookups in individual tasks, which conveys information

in a clearer way. Therefore, the conceptual design is more appropriate to

communicate the process steps within the project participants and also gives

us the flexibility to choose the implementation that is more appropriate for

the application needs.

8.6 The Northwind ETL Process in Kettle

We describe next a possible implementation in Kettle of the ETL process that

loads the Northwind data warehouse. In our implementation, the Northwind

operational database and the Northwind data warehouse are located on a

PostgreSQL database server.

Figure 8.36 shows the overall ETL process, in Kettle terminology, a job.

It is similar to the corresponding process for Integration Services shown in

Fig. 8.26 . Notice that we have not implemented a control flow for grouping

the three tasks that load the hierarchy composed of the Continent , Country ,

and State tables, although this could be done in Kettle using subjobs.

Figure 8.37 a shows the transformation (or flow) that loads the Category

dimension table. As can be seen, it is similar to the data flow in Integration

Services shown in Fig. 8.27 a. On the other hand, Fig. 8.37 bshowsthe

transformation that loads the Time dimension table. Compared with the

corresponding data flow shown in Fig. 8.27 b, Integration Services requires

an additional step for setting the appropriate data types for the attributes

read, while this is specified in the transformation step (or task) that reads

the CSV file in Kettle. 2

Transformations similar to the above are needed to load the Product and

Shipper tables. However, although loading the Employee table (which contains

a parent-child hierarchy) in Integration Services is also similar to the other

ones, this is not the case in Kettle, as Fig. 8.38 shows. In this figure, we can

see the transformation that loads the Employee dimension table. We have

seen in Chap. 7 that Mondrian requires a closure table that contains the

2 Note that in what follows we will refer indistinctly to transformations and steps

(in Kettle terminology) or flows and tasks (in Integration Services terminology),

respectively, since they represent analogous concepts.

Search WWH ::

Custom Search

Home