Databases Reference
In-Depth Information
(Section 4.2.5). Finally, the starnet model for querying multidimensional databases is
presented (Section 4.2.6).
4.2.1
Data Cube: A Multidimensional Data Model
“What is a data cube?”
A
data cube
allows data to be modeled and viewed in multiple
dimensions. It is defined by dimensions and facts.
In general terms,
dimensions
are the perspectives or entities with respect to which
an organization wants to keep records. For example,
AllElectronics
may create a
sales
data warehouse in order to keep records of the store's sales with respect to the dimen-
sions
time
,
item
,
branch
, and
location
. These dimensions allow the store to keep track
of things like monthly sales of items and the branches and locations at which the
items were sold. Each dimension may have a table associated with it, called a
dimen-
sion table
, which further describes the dimension. For example, a dimension table for
item
may contain the attributes
item name, brand
, and
type
. Dimension tables can be
specified by users or experts, or automatically generated and adjusted based on data
distributions.
A multidimensional data model is typically organized around a central theme, such
as
sales
. This theme is represented by a fact table.
Facts
are numeric measures. Think of
them as the quantities by which we want to analyze relationships between dimensions.
Examples of facts for a sales data warehouse include
dollars sold
(sales amount in dol-
lars),
units sold
(number of units sold), and
amount budgeted
. The
fact table
contains
the names of the
facts
, or measures, as well as keys to each of the related dimension tables.
You will soon get a clearer picture of how this works when we look at multidimensional
schemas.
Although we usually think of cubes as 3-D geometric structures, in data warehous-
ing the data cube is
n
-dimensional. To gain a better understanding of data cubes and
the multidimensional data model, let's start by looking at a simple 2-D data cube that
is, in fact, a table or spreadsheet for sales data from
AllElectronics
. In particular, we
will look at the
AllElectronics
sales data for items sold per quarter in the city of Van-
couver. These data are shown in Table 4.2. In this 2-D representation, the sales for
Vancouver are shown with respect to the
time
dimension (organized in quarters) and
the
item
dimension (organized according to the types of items sold). The fact or measure
displayed is
dollars sold
(in thousands).
Now, suppose that we would like to view the sales data with a third dimension. For
instance, suppose we would like to view the data according to
time
and
item
, as well as
location
, for the cities Chicago, New York, Toronto, and Vancouver. These 3-D data are
shown in Table 4.3. The 3-D data in the table are represented as a series of 2-D tables.
Conceptually, we may also represent the same data in the form of a 3-D data cube, as in
Figure 4.3.
Suppose that we would now like to view our sales data with an additional fourth
dimension such as
supplier
. Viewing things in 4-D becomes tricky. However, we can
think of a 4-D cube as being a series of 3-D cubes, as shown in Figure 4.4. If we continue