Data Warehousing and Online Analytical Processing - Data Mining: Concepts and Techniques


and summarized data, which greatly facilitates data mining. For example, rather than storing the details of each sales transaction, a data warehouse may store a summary of the transactions per item type for each branch or, summarized to a higher level, for each country. The capability of OLAP to provide multiple and dynamic views of summarized data in a data warehouse sets a solid foundation for successful data mining.
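To make the idea of stored summaries concrete, here is a minimal Python sketch, assuming a tiny list of hypothetical transaction records of the form (branch, country, item type, amount). The record layout, the sample values, and the helper name `summarize` are illustrative assumptions. The same raw rows are summarized per item type for each branch and, at a higher level, per item type for each country.

```python
from collections import defaultdict

# Hypothetical raw transactions: (branch, country, item_type, amount).
# All names and values are illustrative, not taken from the text.
transactions = [
    ("Vancouver", "Canada", "TV", 500.0),
    ("Vancouver", "Canada", "TV", 700.0),
    ("Vancouver", "Canada", "computer", 1200.0),
    ("Toronto", "Canada", "TV", 650.0),
    ("Chicago", "USA", "computer", 900.0),
    ("Chicago", "USA", "TV", 300.0),
]

def summarize(rows, key_fn):
    """Sum the amount field (index 3) per group; key_fn maps a row to its group key."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)

# Summary of transactions per item type for each branch.
by_branch_item = summarize(transactions, lambda r: (r[0], r[2]))

# The same data summarized to a higher level: per item type for each country.
by_country_item = summarize(transactions, lambda r: (r[1], r[2]))

print(by_branch_item[("Vancouver", "TV")])  # 1200.0
print(by_country_item[("Canada", "TV")])    # 1850.0
```

A warehouse would typically precompute and store such summaries rather than rebuild them from raw rows on every query; the sketch only shows what the summarized views contain.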

Moreover, we believe that data mining should be a human-centered process. Rather than asking a data mining system to generate patterns and knowledge automatically, a user will often need to interact with the system to perform exploratory data analysis. OLAP sets a good example for interactive data analysis and provides the necessary preparations for exploratory data mining. Consider the discovery of association patterns, for example. Instead of mining associations at a primitive (i.e., low) data level among transactions, users should be allowed to specify roll-up operations along any dimension.

For example, a user may want to roll up on the item dimension to go from viewing the data for particular TV sets that were purchased to viewing the brands of these TVs (e.g., SONY or Toshiba). Users may also navigate from the transaction level to the customer or customer-type level in the search for interesting associations. Such an OLAP data mining style is characteristic of multidimensional data mining. In our study of the principles of data mining in this topic, we place particular emphasis on multidimensional data mining, that is, on the integration of data mining and OLAP technology.
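The following Python sketch illustrates why rolling up before mining can matter. It assumes a hypothetical concept hierarchy that maps specific products to a higher-level concept (each TV model to its brand). The product names, the hierarchy, and the helpers `roll_up` and `pair_support` are illustrative assumptions. Pair counts that are spread thinly across individual models merge into a stronger co-occurrence once items are generalized one level up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept hierarchy for the item dimension:
# specific product -> higher-level concept (brand or product type).
ITEM_HIERARCHY = {
    "Sony TV model A": "Sony TV",
    "Sony TV model B": "Sony TV",
    "Toshiba TV model C": "Toshiba TV",
    "soundbar model X": "soundbar",
    "soundbar model Y": "soundbar",
}

# Hypothetical market-basket transactions at the primitive (product) level.
transactions = [
    {"Sony TV model A", "soundbar model X"},
    {"Sony TV model B", "soundbar model Y"},
    {"Toshiba TV model C"},
]

def roll_up(transaction, hierarchy):
    """Replace each item by its parent concept (items without a parent are kept)."""
    return {hierarchy.get(item, item) for item in transaction}

def pair_support(txns):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for t in txns:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return counts

low_level = pair_support(transactions)
high_level = pair_support([roll_up(t, ITEM_HIERARCHY) for t in transactions])

print(max(low_level.values()))             # 1: no product pair repeats
print(high_level[("Sony TV", "soundbar")])  # 2: the pair repeats at brand level
```

With a (hypothetical) minimum support count of 2, no association would be found at the product level, while the brand-level pair would qualify. The same `roll_up` idea applies to other dimensions, such as mapping transactions to customers or customer types.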

4.4 Data Warehouse Implementation

Data warehouses contain huge volumes of data. OLAP servers demand that decision support queries be answered on the order of seconds. Therefore, it is crucial for data warehouse systems to support highly efficient cube computation techniques, access methods, and query processing techniques. In this section, we present an overview of methods for the efficient implementation of data warehouse systems. Section 4.4.1 explores how to compute data cubes efficiently. Section 4.4.2 shows how OLAP data can be indexed, using either bitmap or join indices. Next, we study how OLAP queries are processed (Section 4.4.3). Finally, Section 4.4.4 presents various types of warehouse servers for OLAP processing.

4.4.1 Efficient Data Cube Computation: An Overview

At the core of multidimensional data analysis is the efficient computation of aggregations across many sets of dimensions. In SQL terms, these aggregations are referred to as group-by's. Each group-by can be represented by a cuboid, where the set of group-by's forms a lattice of cuboids defining a data cube. In this subsection, we explore issues relating to the efficient computation of data cubes.
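As an illustration of the lattice of cuboids, the Python sketch below enumerates every group-by over a set of dimensions, from the apex cuboid (no dimensions, a single grand total) to the base cuboid (all dimensions). It then computes each one as a SUM over a hypothetical measure. The dimension names, the sample fact rows, and the helpers `cuboid_lattice` and `compute_cuboid` are illustrative assumptions. This is a deliberately naive version: it scans the raw facts once per cuboid.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: three dimensions and one measure (sales).
DIMENSIONS = ("city", "item", "year")
facts = [
    {"city": "Vancouver", "item": "TV", "year": 2010, "sales": 100},
    {"city": "Vancouver", "item": "computer", "year": 2010, "sales": 250},
    {"city": "Toronto", "item": "TV", "year": 2011, "sales": 80},
    {"city": "Toronto", "item": "TV", "year": 2010, "sales": 120},
]

def cuboid_lattice(dims):
    """Every subset of dims, i.e. every group-by: 2**len(dims) cuboids in total."""
    return [combo for k in range(len(dims) + 1) for combo in combinations(dims, k)]

def compute_cuboid(rows, group_by, measure="sales"):
    """Aggregate (SUM) the measure for each distinct value combination of group_by."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in group_by)
        totals[key] += row[measure]
    return dict(totals)

# Naive full cube: each cuboid is computed directly from the base facts.
cube = {gb: compute_cuboid(facts, gb) for gb in cuboid_lattice(DIMENSIONS)}

print(len(cube))                          # 8 cuboids for 3 dimensions
print(cube[()])                           # {(): 550}, the apex cuboid
print(cube[("city",)][("Vancouver",)])    # 350
```

With n dimensions (and no concept hierarchies) the lattice has 2^n cuboids, which is why efficient cube computation matters. A common refinement, not shown here, is to compute a coarser cuboid such as (city) from an already computed finer one such as (city, item) instead of rescanning the raw data.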
