Incorporating Other Physical Structures - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

the concept of data cubes, explain how can they help query processing, and

describe some challenges and algorithms to automatically materialize sub-

cubes to improve performance.

9.2.1 Data and Query Models

A data cube can be seen as a higher-dimensional spreadsheet that allows users

to analyze—typically financial—data from different perspectives. A data cube

consists of numeric facts called measures that are categorized by attributes

called dimensions . Data cubes are usually created directly from the schema

of a relational database. Measures are then derived from records in the fact

tables (e.g., sales ) and dimensions from dimension tables (e.g., suppliers

and customers ).

As a simple example, consider a business that buys certain parts from sup-

pliers and then sells them to customers at a sale price. A relational database

has information about each transaction over a period of several years. In this

case, three dimensions of interest are part , supplier , and customer . The

measure of interest is the total amount of sales . This example results in a

three-dimensional data cube. Each cell in this cube, given by specific values

(p, s, c) for part , supplier , and customer , is associated with the total

sales amount for part p bought from supplier s and sold to customer c .

Users are interested in both very specific information (e.g., the sales of

a single cell in the data cube) and consolidated information (e.g., the total

amount of sales of a given part p to a given customer c ). Also, users can be

interested in a single such cell or a set of results (e.g., the total amount of sales

of each part to a given customer c ). A common way to specify such queries is by

a tuple that contains one value for each dimension in the cube. An additional

value ALL is used to handle consolidated queries. For instance, we can ask for

all sales of part p to customer c by using (p, ALL, c) . When the result is not

a single cell but instead a group of cells, we use the dimension name in the

appropriate tuple position. For instance, we write the query that asks for the

total sales to customer c of every part in the inventory as (part, ALL, c) .

9.2.1.1

Standard Query Language (SQL) Representation of Data

Cube Queries

We can write query (part, ALL, c) in terms of the raw relational data as

follows:

SELECT part, customer, SUM(sales)

FROM Sales

WHERE customer = c

GROUP BY part, customer

In general, we can translate any such query to a SQL query like the previous

one, which contains all dimensions except those marked as ALL as grouping

columns and equality predicates for dimensions that have a specific value.

Search WWH ::

Custom Search

Home