Database Reference
In-Depth Information
Fig. 8.43 Load of the Sales fact table
Integration Services, in the latter the sort data task includes the capability
to remove duplicates, which in Kettle requires two transformation steps.
Finally, Fig. 8.43 shows the transformation that loads the Sales fact table.
The corresponding flow for Integration Services is shown in Fig. 8.35 .The
flow starts by obtaining values from the following SQL query addressed to
the Northwind database, which is the same as in the conceptual design given
in Sect. 8.3 :
SELECT O.CustomerID, EmployeeID AS EmployeeKey,
O.OrderDate, O.RequiredDate, O.ShippedDate,
ShipVia AS ShipperKey, P.ProductID AS ProductKey,
P.SupplierID AS SupplierKey, O.OrderID AS OrderNo,
ROW NUMBER() OVER (PARTITION BY D.OrderID
ORDER BY D.ProductID) AS OrderLineNo,
D.UnitPrice, Quantity, Discount,
D.UnitPrice * (1-Discount) * Quantity AS SalesAmount,
O.Freight/COUNT(*) OVER (PARTITION BY D.OrderID) AS Freight
FROM Orders O, OrderDetails D, Products P
WHERE O.OrderID = D.OrderID AND D.ProductID = P.ProductID
In the corresponding query in Integration Services, it is possible to query
both the Northwind operational database and the Northwind data warehouse
in a single query, while this not possible natively in PostgreSQL. For this,
additional lookup steps are needed in Kettle for obtaining the surrogate keys.
As a consequence, an additional task is needed in Integration Services for
removing the records with null values for surrogate keys obtained from the
SQL query, while these records are automatically removed in the lookup steps
in Kettle.
8.7 Summary
In this chapter, we have presented a detailed study of ETL processes, a key
component of a data warehousing architecture. We have shown the usefulness
of producing a conceptual model of ETL processes, independent of any
implementation. In this way, deploying the model in different tools is possible.
Further, information can be shared and distributed in a language that can
Search WWH ::




Custom Search