Data Warehousing and Online Analytical Processing - Data Mining: Concepts and Techniques


User control versus automation: Online analytical processing in data warehouses is a user-controlled process. The selection of dimensions and the application of OLAP operations (e.g., drill-down, roll-up, slicing, and dicing) are primarily directed and controlled by users. Although the control in most OLAP systems is quite user-friendly, users do require a good understanding of the role of each dimension. Furthermore, in order to find a satisfactory description of the data, users may need to specify a long sequence of OLAP operations. It is often desirable to have a more automated process that helps users determine which dimensions (or attributes) should be included in the analysis, and the degree to which the given data set should be generalized in order to produce an interesting summarization of the data.

This section presents an alternative method for concept description, called attribute-oriented induction, which works for complex data types and relies on a data-driven generalization process.

4.5.1 Attribute-Oriented Induction for Data Characterization

The attribute-oriented induction (AOI) approach to concept description was first proposed in 1989, a few years before the introduction of the data cube approach. The data cube approach is essentially based on materialized views of the data, which typically have been precomputed in a data warehouse. In general, it performs offline aggregation before an OLAP or data mining query is submitted for processing. The attribute-oriented induction approach, in contrast, is basically a query-oriented, generalization-based, online data analysis technique. Note that there is no inherent barrier distinguishing the two approaches based on online aggregation versus offline precomputation. Some aggregations in the data cube can be computed online, while offline precomputation of multidimensional space can speed up attribute-oriented induction as well.
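The contrast between offline precomputation and online aggregation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three dimensions, the four sample rows, and the helper names `aggregate_online` and `materialize_cube` do not come from the text, and real data warehouses materialize views very differently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical dimensions and base rows, invented for this sketch.
DIMENSIONS = ("major", "gender", "residence")
rows = [
    ("CS", "F", "Vancouver"),
    ("CS", "M", "Vancouver"),
    ("Physics", "F", "Burnaby"),
    ("CS", "F", "Burnaby"),
]

def aggregate_online(rows, dims):
    """Scan the base rows at query time and count them by the requested dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    return Counter(tuple(row[i] for i in positions) for row in rows)

def materialize_cube(rows):
    """Precompute the counts for every subset of dimensions before any query arrives."""
    cube = {}
    for k in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, k):
            cube[dims] = aggregate_online(rows, dims)
    return cube

# Offline step: done once, ahead of time.
cube = materialize_cube(rows)

# Query time: the cube answers by lookup, the online path by scanning the rows.
offline_answer = cube[("major", "gender")]
online_answer = aggregate_online(rows, ("major", "gender"))
```

Both paths return the same counts. They differ only in when the work is done, which is why the text sees no inherent barrier between the two approaches.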

The general idea of attribute-oriented induction is to first collect the task-relevant data using a database query and then perform generalization based on the examination of the number of each attribute's distinct values in the relevant data set. The generalization is performed by either attribute removal or attribute generalization. Aggregation is performed by merging identical generalized tuples and accumulating their respective counts. This reduces the size of the generalized data set. The resulting generalized relation can be mapped into different forms (e.g., charts or rules) for presentation to the user.
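The steps just described can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the published algorithm. The passage does not say when to remove an attribute and when to generalize it, so the sketch assumes a simple threshold rule: an attribute with more than `threshold` distinct values is generalized by climbing a concept hierarchy, and is removed if no hierarchy is available. The hierarchies, the sample tuples, and all function and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical one-level concept hierarchies (value -> parent concept).
# They must be acyclic, or the climbing loop below would not terminate.
HIERARCHIES = {
    "major": {"CS": "Science", "Physics": "Science",
              "History": "Arts", "Music": "Arts"},
    "birth_place": {"Vancouver": "Canada", "Toronto": "Canada",
                    "Seattle": "Foreign", "Mumbai": "Foreign"},
}

def attribute_oriented_induction(rows, attributes, hierarchies, threshold):
    """Return the kept attributes and the merged generalized tuples with counts."""
    rows = [dict(r) for r in rows]  # work on copies of the task-relevant data
    kept = []
    for attr in attributes:
        while True:
            distinct = {r[attr] for r in rows}
            if len(distinct) <= threshold:
                kept.append(attr)  # few enough distinct values: keep attribute
                break
            hierarchy = hierarchies.get(attr)
            if hierarchy is None or not any(v in hierarchy for v in distinct):
                break  # attribute removal: nothing left to generalize
            for r in rows:  # attribute generalization: climb one level
                r[attr] = hierarchy.get(r[attr], r[attr])
    # Aggregation: merge identical generalized tuples, accumulating counts.
    counts = Counter(tuple(r[a] for a in kept) for r in rows)
    return kept, counts

# Hypothetical task-relevant data, as a database query might return it.
students = [
    {"name": "Ann", "major": "CS", "birth_place": "Vancouver"},
    {"name": "Bob", "major": "Physics", "birth_place": "Toronto"},
    {"name": "Cy", "major": "History", "birth_place": "Seattle"},
    {"name": "Di", "major": "Music", "birth_place": "Mumbai"},
    {"name": "Ed", "major": "CS", "birth_place": "Seattle"},
]

kept, generalized = attribute_oriented_induction(
    students, ["name", "major", "birth_place"], HIERARCHIES, threshold=2
)
```

With this data, `name` is removed because it has five distinct values and no hierarchy. `major` and `birth_place` are each generalized one level, and the five original tuples merge into three generalized tuples with counts 2, 2, and 1.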

The following illustrates the process of attribute-oriented induction. We first discuss its use for characterization. The method is extended for the mining of class comparisons in Section 4.5.3.

Example 4.11 A data mining query for characterization. Suppose that a user wants to describe the general characteristics of graduate students in the Big University database, given the attributes name, gender, major, birth place, birth date, residence, phone# (telephone
