Database Reference
In-Depth Information
10.2 General Overview of the Method
We next describe a general method for data warehouse design that encom-
passes various existing approaches from both research and practitioners. The
method is based on the assumption that data warehouses are a particular
type of databases dedicated to analytical purposes. Therefore, their design
should follow the traditional database design phases, that is, requirements
specification, conceptual design, logical design, and physical design, as
shown in Fig. 10.1 , which repeats, for clarity, Fig. 3.6 presented in Chap. 3 .
Nevertheless, there are significant differences between the design phases for
databases and data warehouses, which stem from their different nature, as
explained in Chap. 3 . Note that although the various phases in Fig. 10.1 are
depicted consecutively, actually there are multiple interactions between them,
especially if an iterative development process is adopted in which the system
is developed in incremental versions with increased functionality.
Requirements
specification
Conceptual
design
Logical design
Physical design
Fig. 10.1 Phases in data warehouse design (repeated from Fig. 3.6 )
The phases in Fig. 10.1 may be applied to define either the overall data
warehouse schema or the schemas of the individual data marts. From now
on, we shall use the term “data warehouse” to mean that the concepts that
we are discussing apply also to data marts if not stated otherwise.
For all the phases in Fig. 10.1 , the specification of business and technical
metadata is in continuous development. These include information about the
data warehouse schema, the data source schemas, and the ETL processes. For
example, the metadata for a data warehouse schema may provide information
such as aliases used for various elements, abbreviations, currencies for
monetary attributes or measures, and metric systems. The elements of the
source systems should also be documented similarly. This could be a dicult
task if conceptual schemas for these systems do not exist. The metadata for
the ETL processes should consider several elements, such as the frequency of
data refreshment. Data in a fact table may be required on a daily or monthly
basis or after some specific event (e.g., after finishing a project). Therefore,
users should specify a data refreshment strategy that corresponds to their
analysis needs.
To illustrate the proposed method, we will use a hypothetical scenario
concerning the design of the Northwind data warehouse we have been using
as example throughout this topic. We assume that the company wants to
analyze its sales along dimensions like customers, products, geography, and
 
Search WWH ::




Custom Search