Data Cube Technology - Data Mining: Concepts and Techniques

interest into intuitive regions at various granularities. It analyzes and mines the data by

applying various data mining techniques systematically over these regions.

There are at least four ways in which OLAP-style analysis can be fused with data

mining techniques:

1. Use cube space to define the data space for mining. Each region in cube space

represents a subset of data over which we wish to find interesting patterns. Cube space

is defined by a set of expert-designed, informative dimension hierarchies, not just

arbitrary subsets of data. Therefore, the use of cube space makes the data space both

meaningful and tractable.

2. Use OLAP queries to generate features and targets for mining. The features and even

the targets (that we wish to learn to predict) can sometimes be naturally defined as

OLAP aggregate queries over regions in cube space.

3. Use data mining models as building blocks in a multistep mining process.

Multidimensional data mining in cube space may consist of multiple steps, where data mining

models can be viewed as building blocks that are used to describe the behavior of

interesting data sets, rather than the end results.

4. Use data cube computation techniques to speed up repeated model construction.

Multidimensional data mining in cube space may require building a model for each

candidate data space, which is usually too expensive to be feasible. However, by carefully

sharing computation across model construction for different candidates based

on data cube computation techniques, efficient mining is achievable.
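As a minimal sketch of the second point above (OLAP aggregate queries as features), the following Python groups a toy fact table at two granularities and uses the resulting cell aggregates as a feature vector. The table layout, the column order, and the names aggregate and features_for are assumptions made for illustration; they do not come from the text.

```python
from collections import defaultdict

# Hypothetical toy fact table: (year, month, state, sales).
rows = [
    (2010, 1, "IL", 100.0),
    (2010, 1, "IL", 50.0),
    (2010, 2, "IL", 70.0),
    (2010, 1, "WI", 30.0),
    (2011, 1, "WI", 90.0),
]

def aggregate(rows, key):
    """Group rows by key(row) and return {cell: (count, total_sales)}."""
    cells = defaultdict(lambda: [0, 0.0])
    for row in rows:
        cell = cells[key(row)]
        cell[0] += 1        # count aggregate
        cell[1] += row[3]   # sum aggregate over the sales measure
    return {k: (c, s) for k, (c, s) in cells.items()}

# Fine granularity: one cell per (year, month, state).
by_month_state = aggregate(rows, key=lambda r: (r[0], r[1], r[2]))
# Roll-up along the time hierarchy: one cell per (year, state).
by_year_state = aggregate(rows, key=lambda r: (r[0], r[2]))

def features_for(row):
    """Feature vector for a row: aggregates of the cells that contain it."""
    count_m, total_m = by_month_state[(row[0], row[1], row[2])]
    count_y, total_y = by_year_state[(row[0], row[2])]
    return [count_m, total_m, count_y, total_y]

print(features_for(rows[0]))
```

Both aggregates used here (count and sum) are distributive, so the year-level cells could also be derived from the month-level cells instead of rescanning the rows. That is the kind of shared computation the fourth point refers to.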

In this subsection we study prediction cubes, an example of multidimensional data

mining where the cube space is explored for prediction tasks. A prediction cube is a cube

structure that stores prediction models in multidimensional data space and supports

prediction in an OLAP manner. Recall that in a data cube, each cell value is an aggregate

number (e.g., count) computed over the data subset in that cell. However, each cell value

in a prediction cube is computed by evaluating a predictive model built on the data

subset in that cell, thereby representing that subset's predictive behavior.
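The contrast with an ordinary data cube can be sketched in a few lines of Python. Here each cell value is the accuracy of a small model built on that cell's rows, rather than a count. The customer rows, the one-rule classifier that predicts the class label from gender, and scoring the model on the same rows it was built from are all simplifying assumptions for illustration, not the method described in the text.

```python
from collections import Counter, defaultdict

# Hypothetical customer rows: (year, state, gender, valued_customer).
customers = [
    (2010, "IL", "F", True),
    (2010, "IL", "F", True),
    (2010, "IL", "M", False),
    (2010, "IL", "M", False),
    (2010, "WI", "F", True),
    (2010, "WI", "F", False),
    (2010, "WI", "M", True),
    (2010, "WI", "M", False),
]

def fit_one_rule(subset):
    """Learn a lookup from gender to the majority class label in the subset."""
    votes = defaultdict(Counter)
    for _, _, gender, label in subset:
        votes[gender][label] += 1
    return {g: c.most_common(1)[0][0] for g, c in votes.items()}

def accuracy(model, subset):
    """Fraction of rows in the subset whose label the model reproduces."""
    hits = sum(model[g] == label for _, _, g, label in subset)
    return hits / len(subset)

def prediction_cube(rows, key):
    """Map each cell to the accuracy of a model built on that cell's rows."""
    cells = defaultdict(list)
    for row in rows:
        cells[key(row)].append(row)
    return {cell: accuracy(fit_one_rule(sub), sub) for cell, sub in cells.items()}

# One cell per (year, state); other groupings give other cuboids.
cube = prediction_cube(customers, key=lambda r: (r[0], r[1]))
for cell, value in sorted(cube.items()):
    print(cell, value)
```

In this toy data, gender separates the classes perfectly in the (2010, "IL") cell and not at all in (2010, "WI"), so the two cells receive very different values. A more careful version would evaluate each model on held-out data rather than on its own training rows.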

Instead of seeing prediction models as the end result, prediction cubes use prediction

models as building blocks to define the interestingness of data subsets; that is, they identify

data subsets that indicate more accurate prediction. This is best explained with an

example.

Example 5.18 Prediction cube for identification of interesting cube subspaces. Suppose a company

has a customer table with the attributes time (with two granularity levels: month and

year), location (with two granularity levels: state and country), gender, salary, and one

class-label attribute: valued customer. A manager wants to analyze the decision process

of whether a customer is highly valued with respect to time and location. In particular,

he is interested in the question “Are there times at and locations in which the value of a
