Database Reference
In-Depth Information
if not all multidimensional elements can be fed with data from the source
systems, a new iteration with the users to modify their requirements
according to the availability of data is required. As a result, a new schema
should be developed and presented to the users for acceptance. The changes
to the schema may require modification of existing mappings.
10.4.2 Analysis-Driven Conceptual Design for the
Northwind Case Study
Develop Initial Schema
Based on the users' requirements, we developed the initial conceptual diagram
shown in Fig.
10.7
. This diagram was presented in Chap.
4
,andwerepeatit
here to facilitate the presentation.
As described in the requirements phase, the main focus of analysis pertains
to sales amount figures. This is represented at the conceptual level by the
Sales
fact in Fig.
10.7
. Given that the source data are organized into orders,
we need to transform order data into sales facts during the ETL process. For
example, the schema in Fig.
10.7
includes the measures
Quantity
,
UnitPrice
,
Discount
,
SalesAmount
,
Freight
,and
NetAmount
. The first three measures are
obtained directly from the sources, while the others must be computed during
the ETL process.
SalesAmount
will be computed from
Quantity
,
UnitPrice
,
and
Discount
. On the other hand, since in the operational database
Freight
is associated with a complete order rather than with an order line, in the
data warehouse it must be distributed proportionally across the articles in
the corresponding order. Finally,
NetAmount
is a derived measure, computed
over the data cube. We remark the difference between a measure that is
computed during the ETL process (
SalesAmount
) and a derived attribute,
which is computed from the data cube (
NetAmount
). In addition, we specify
the aggregate function to be applied to each measure. For example, average
is applied to the measures
UnitPrice
and
Discount
. Note that the measures
Freight
and
NetAmount
are not included in Table
10.1
since they do not
follow from Queries 1a-4c, which only represent a portion of the actual set
of queries.
The
Sales
fact is defined between the
Product
,
Supplier
,
Customer
,
Employee
,and
Time
dimensions. Since the orders are associated with
different time instants, the
Time
level participates in the
Sales
fact with the
roles
OrderDate
,
DueDate
,and
ShippedDate
. According to the requirements
summarized in Table
10.1
,the
Time
dimension contains four aggregation
levels, where most of the scenarios include aggregation over time. Dimension
Product
is related to the parent level
Category
with a one-to-many cardinality,
defining a strict hierarchy. The level
Product
also contains specific information
about products.
Search WWH ::
Custom Search