the concept of data cubes, explain how can they help query processing, and
describe some challenges and algorithms to automatically materialize sub-
cubes to improve performance.
9.2.1 Data and Query Models
A data cube can be seen as a higher-dimensional spreadsheet that allows users
to analyze—typically financial—data from different perspectives. A data cube
consists of numeric facts called measures that are categorized by attributes
called dimensions . Data cubes are usually created directly from the schema
of a relational database. Measures are then derived from records in the fact
tables (e.g., sales ) and dimensions from dimension tables (e.g., suppliers
and customers ).
As a simple example, consider a business that buys certain parts from sup-
pliers and then sells them to customers at a sale price. A relational database
has information about each transaction over a period of several years. In this
case, three dimensions of interest are part , supplier , and customer . The
measure of interest is the total amount of sales . This example results in a
three-dimensional data cube. Each cell in this cube, given by specific values
(p, s, c) for part , supplier , and customer , is associated with the total
sales amount for part p bought from supplier s and sold to customer c .
Users are interested in both very specific information (e.g., the sales of
a single cell in the data cube) and consolidated information (e.g., the total
amount of sales of a given part p to a given customer c ). Also, users can be
interested in a single such cell or a set of results (e.g., the total amount of sales
of each part to a given customer c ). A common way to specify such queries is by
a tuple that contains one value for each dimension in the cube. An additional
value ALL is used to handle consolidated queries. For instance, we can ask for
all sales of part p to customer c by using (p, ALL, c) . When the result is not
a single cell but instead a group of cells, we use the dimension name in the
appropriate tuple position. For instance, we write the query that asks for the
total sales to customer c of every part in the inventory as (part, ALL, c) .
Standard Query Language (SQL) Representation of Data
We can write query (part, ALL, c) in terms of the raw relational data as
SELECT part, customer, SUM(sales)
WHERE customer = c
GROUP BY part, customer
In general, we can translate any such query to a SQL query like the previous
one, which contains all dimensions except those marked as ALL as grouping
columns and equality predicates for dimensions that have a specific value.