Extraction, Transformation, and Loading - Data Warehouse Systems: Design and Implementation

Database Reference

In-Depth Information

Fig. 8.43 Load of the Sales fact table

Integration Services, in the latter the sort data task includes the capability

to remove duplicates, which in Kettle requires two transformation steps.

Finally, Fig. 8.43 shows the transformation that loads the Sales fact table.

The corresponding flow for Integration Services is shown in Fig. 8.35 .The

flow starts by obtaining values from the following SQL query addressed to

the Northwind database, which is the same as in the conceptual design given

in Sect. 8.3 :

SELECT O.CustomerID, EmployeeID AS EmployeeKey,

O.OrderDate, O.RequiredDate, O.ShippedDate,

ShipVia AS ShipperKey, P.ProductID AS ProductKey,

P.SupplierID AS SupplierKey, O.OrderID AS OrderNo,

ROW NUMBER() OVER (PARTITION BY D.OrderID

ORDER BY D.ProductID) AS OrderLineNo,

D.UnitPrice, Quantity, Discount,

D.UnitPrice * (1-Discount) * Quantity AS SalesAmount,

O.Freight/COUNT(*) OVER (PARTITION BY D.OrderID) AS Freight

FROM Orders O, OrderDetails D, Products P

WHERE O.OrderID = D.OrderID AND D.ProductID = P.ProductID

In the corresponding query in Integration Services, it is possible to query

both the Northwind operational database and the Northwind data warehouse

in a single query, while this not possible natively in PostgreSQL. For this,

additional lookup steps are needed in Kettle for obtaining the surrogate keys.

As a consequence, an additional task is needed in Integration Services for

removing the records with null values for surrogate keys obtained from the

SQL query, while these records are automatically removed in the lookup steps

in Kettle.

8.7 Summary

In this chapter, we have presented a detailed study of ETL processes, a key

component of a data warehousing architecture. We have shown the usefulness

of producing a conceptual model of ETL processes, independent of any

implementation. In this way, deploying the model in different tools is possible.

Further, information can be shared and distributed in a language that can

Search WWH ::

Custom Search

Home