Data Cube Technology - Data Mining: Concepts and Techniques

Most real-life top-k queries are likely to involve only a small subset of selection attributes. To support high-dimensional ranking cubes, we can carefully select the cuboids that need to be materialized. For example, we could choose to materialize only the 1-D cuboids that contain single-selection dimensions. This achieves low space overhead while still delivering high performance when the number of selection dimensions is large. In some cases, there may exist many ranking dimensions to support multiple users with rather different preferences. For example, buyers may search for houses by considering various factors like price, distance to school or shopping, age of the house, floor space, and tax. In this case, a possible solution is to create multiple data partitions, each of which consists of a subset of the ranking dimensions. Query processing may then need to search over a joint space involving multiple data partitions.

In summary, the general philosophy of ranking cubes is to materialize such cubes on the set of selection dimensions. Interval-based partitioning of the ranking dimensions makes the ranking cube efficient and flexible in supporting ad hoc user queries. Various implementation techniques and query optimization methods have been developed for efficient computation and query processing based on this framework.
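
The idea of materializing only 1-D cuboids over the selection dimensions, with the ranking dimensions partitioned into intervals, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the functions build_ranking_cube and top_k, the equal-width grid, the normalization of ranking values to [0, 1], and a linear ranking function with nonnegative weights that is minimized. It is not the implementation described in the text, only one simple instance of the approach.

```python
import heapq
from collections import defaultdict

def build_ranking_cube(rows, sel_dims, rank_dims, bins):
    """Materialize only the 1-D cuboids: (dim, value) -> {block: [tid, ...]}.

    Assumption: every ranking value is already normalized to [0, 1], and each
    ranking dimension is split into `bins` equal-width intervals.
    """
    cube = defaultdict(lambda: defaultdict(list))
    for tid, row in enumerate(rows):
        block = tuple(min(int(row[d] * bins), bins - 1) for d in rank_dims)
        for d in sel_dims:
            cube[(d, row[d])][block].append(tid)
    return cube

def top_k(rows, cube, rank_dims, weights, selection, k, bins):
    """Return tids of the k rows with the smallest weighted sum of ranking values.

    `selection` maps selection dimensions to required values (at least one).
    Weights must be nonnegative so a block's lower corner bounds its scores.
    """
    cells = [cube.get((d, v), {}) for d, v in selection.items()]
    # Only blocks that occur in every selected 1-D cell can hold answers.
    blocks = set(cells[0]).intersection(*cells[1:])

    def lower_bound(block):
        return sum(w * i / bins for w, i in zip(weights, block))

    best = []  # max-heap of (-score, tid), holding at most k entries
    for block in sorted(blocks, key=lower_bound):
        # Stop once no remaining block can beat the current k-th best score.
        if len(best) == k and -best[0][0] <= lower_bound(block):
            break
        tids = set(cells[0][block]).intersection(*(c[block] for c in cells[1:]))
        for tid in tids:
            score = sum(w * rows[tid][d] for w, d in zip(weights, rank_dims))
            heapq.heappush(best, (-score, tid))
            if len(best) > k:
                heapq.heappop(best)
    return [tid for _, tid in sorted(best, reverse=True)]

# Hypothetical house listings; price and dist are normalized to [0, 1].
rows = [
    {"city": "A", "type": "condo", "price": 0.30, "dist": 0.20},
    {"city": "A", "type": "house", "price": 0.50, "dist": 0.10},
    {"city": "A", "type": "condo", "price": 0.25, "dist": 0.80},
    {"city": "B", "type": "condo", "price": 0.10, "dist": 0.10},
    {"city": "A", "type": "condo", "price": 0.70, "dist": 0.30},
    {"city": "A", "type": "condo", "price": 0.15, "dist": 0.25},
]
cube = build_ranking_cube(rows, ["city", "type"], ["price", "dist"], bins=4)
result = top_k(rows, cube, ["price", "dist"], [1.0, 1.0],
               {"city": "A", "type": "condo"}, k=2, bins=4)
print(result)  # [5, 0]
```

In this sketch a query on two selection dimensions is answered by intersecting two 1-D cells at query time, so no 2-D cuboid is stored. Blocks are visited in order of their best possible score, which lets the search stop before touching the blocks that hold rows 2 and 4.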

5.4 Multidimensional Data Analysis in Cube Space

Data cubes create a flexible and powerful means to group and aggregate data subsets. They allow data to be explored in multiple dimensional combinations and at varying aggregate granularities. This capability greatly increases the analysis bandwidth and supports the effective discovery of interesting patterns and knowledge from data. The use of cube space makes the data space both meaningful and tractable.
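
To make "multiple dimensional combinations" and "varying aggregate granularities" concrete, the following minimal sketch computes one group-by (cuboid) for every subset of the dimensions. The function name compute_cube, the sample sales rows, and the choice of sum as the aggregate are assumptions made for illustration. The loop is a naive full-cube computation, not one of the efficient cube computation methods.

```python
from collections import defaultdict
from itertools import combinations

def compute_cube(rows, dims, measure):
    """Return {group-by dims: {cell values: summed measure}} for every subset of dims."""
    cube = {}
    for size in range(len(dims) + 1):
        for group_by in combinations(dims, size):
            cells = defaultdict(int)
            for row in rows:
                cells[tuple(row[d] for d in group_by)] += row[measure]
            cube[group_by] = dict(cells)
    return cube

# Hypothetical sales records with three dimensions and one measure.
sales = [
    {"city": "Vancouver", "item": "TV", "year": 2010, "amount": 400},
    {"city": "Vancouver", "item": "phone", "year": 2010, "amount": 250},
    {"city": "Toronto", "item": "TV", "year": 2010, "amount": 300},
    {"city": "Toronto", "item": "TV", "year": 2011, "amount": 350},
]
sales_cube = compute_cube(sales, ("city", "item", "year"), "amount")

print(sales_cube[()][()])                        # 1300, the apex cuboid (grand total)
print(sales_cube[("city",)][("Vancouver",)])     # 650
print(sales_cube[("item", "year")][("TV", 2010)])  # 700
```

Three dimensions give 2^3 = 8 cuboids, ranging from the grand total down to the base cuboid that groups by all three dimensions. Each cuboid is one region of the cube space at a particular granularity.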

This section presents methods of multidimensional data analysis that make use of data cubes to organize data into intuitive regions of interest at varying granularities. Section 5.4.1 presents prediction cubes, a technique for multidimensional data mining that facilitates predictive modeling in multidimensional space. Section 5.4.2 describes how to construct multifeature cubes. These support complex analytical queries involving multiple dependent aggregates at multiple granularities. Finally, Section 5.4.3 describes an interactive method for users to systematically explore cube space. In such exception-based, discovery-driven exploration, interesting exceptions or anomalies in the data are automatically detected and marked for users with visual cues.

5.4.1 Prediction Cubes: Prediction Mining in Cube Space

Recently, researchers have turned their attention toward multidimensional data mining to uncover knowledge at varying dimensional combinations and granularities. Such mining is also known as exploratory multidimensional data mining and online analytical data mining (OLAM). Multidimensional data space is huge. In preparing the data, how can we identify the interesting subspaces for exploration? To what granularities should we aggregate the data? Multidimensional data mining in cube space organizes data of
