and summarized data, which greatly facilitates data mining. For example, rather than
storing the details of each sales transaction, a data warehouse may store a summary
of the transactions per item type for each branch or, summarized to a higher level,
for each country. The capability of OLAP to provide multiple and dynamic views
of summarized data in a data warehouse sets a solid foundation for successful data
mining.
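The summarization described above can be sketched in a few lines of Python. This is only an illustration: the transaction records, the field layout, and the `summarize` helper are assumptions made up for the example, not part of any particular warehouse system.

```python
from collections import defaultdict

# Hypothetical transaction records: (branch, country, item_type, amount).
transactions = [
    ("Vancouver", "Canada", "TV", 500.0),
    ("Vancouver", "Canada", "TV", 700.0),
    ("Toronto", "Canada", "TV", 400.0),
    ("Toronto", "Canada", "phone", 300.0),
    ("Chicago", "USA", "TV", 600.0),
]

def summarize(rows, key):
    """Sum the amount (last field) of each row, grouped by key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[-1]
    return dict(totals)

# Summary of the transactions per item type for each branch.
by_branch = summarize(transactions, key=lambda r: (r[0], r[2]))

# Summarized to a higher level: per item type for each country.
by_country = summarize(transactions, key=lambda r: (r[1], r[2]))
```

A warehouse that stores `by_branch` or `by_country` instead of the raw records trades detail for much smaller, query-ready data.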
Moreover, we also believe that data mining should be a human-centered process.
Rather than asking a data mining system to generate patterns and knowledge automatically,
a user will often need to interact with the system to perform exploratory data
analysis. OLAP sets a good example for interactive data analysis and provides the necessary
preparations for exploratory data mining. Consider the discovery of association
patterns, for example. Instead of mining associations at a primitive (i.e., low) data level
among transactions, users should be allowed to specify roll-up operations along any
dimension.
For example, a user may want to roll up on the item dimension to go from viewing the
data for particular TV sets that were purchased to viewing the brands of these TVs (e.g.,
SONY or Toshiba). Users may also navigate from the transaction level to the customer or
customer-type level in the search for interesting associations. Such an OLAP data mining
style is characteristic of multidimensional data mining. In our study of the principles
of data mining in this book, we place particular emphasis on multidimensional data
mining, that is, on the integration of data mining and OLAP technology.
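To make the roll-up idea concrete, the following Python sketch counts co-purchased pairs first at the item level and then after rolling up the item dimension to brands. The item names, the `ITEM_TO_BRAND` hierarchy, and both helper functions are hypothetical, and simple pair counting stands in here for a real association mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept hierarchy on the item dimension: TV model -> brand.
ITEM_TO_BRAND = {
    "sony_tv_a": "SONY",
    "sony_tv_b": "SONY",
    "toshiba_tv_x": "Toshiba",
    "toshiba_tv_y": "Toshiba",
}

# Hypothetical transactions, each a set of purchased TV models.
transactions = [
    {"sony_tv_a", "toshiba_tv_y"},
    {"sony_tv_b", "toshiba_tv_y"},
    {"sony_tv_a", "toshiba_tv_x"},
]

def roll_up(transaction, hierarchy):
    """Replace each item with its parent in the hierarchy."""
    return {hierarchy[item] for item in transaction}

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of values."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

# Associations at the primitive (item) level.
item_level = pair_counts(transactions)

# Associations after rolling up the item dimension to brand.
brand_level = pair_counts(roll_up(t, ITEM_TO_BRAND) for t in transactions)
```

In this toy data no pair of individual models occurs in more than one transaction, but the brand pair appears in all three, which is the kind of pattern that only becomes visible after a roll-up.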
4.4 Data Warehouse Implementation
Data warehouses contain huge volumes of data. OLAP servers demand that decision
support queries be answered on the order of seconds. Therefore, it is crucial for data
warehouse systems to support highly efficient cube computation techniques, access
methods, and query processing techniques. In this section, we present an overview
of methods for the efficient implementation of data warehouse systems. Section 4.4.1
explores how to compute data cubes efficiently. Section 4.4.2 shows how OLAP data
can be indexed, using either bitmap or join indices. Next, we study how OLAP queries
are processed (Section 4.4.3). Finally, Section 4.4.4 presents various types of warehouse
servers for OLAP processing.
4.4.1 Efficient Data Cube Computation: An Overview
At the core of multidimensional data analysis is the efficient computation of aggregations
across many sets of dimensions. In SQL terms, these aggregations are referred to
as group-by's. Each group-by can be represented by a cuboid, where the set of group-by's
forms a lattice of cuboids defining a data cube. In this subsection, we explore issues
relating to the efficient computation of data cubes.
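As a rough illustration of how the group-by's make up a cube, the Python sketch below enumerates every subset of a dimension list and aggregates a measure for each one. The sample rows, the dimension names, and the `cuboids` and `compute_cuboid` helpers are assumptions for this example. It deliberately computes each cuboid from scratch, which is the naive approach that efficient cube computation methods aim to improve on.

```python
from itertools import combinations

def cuboids(dimensions):
    """Yield every group-by as a tuple of dimensions, from the empty
    group-by (grand total) up to the group-by on all dimensions."""
    for k in range(len(dimensions) + 1):
        yield from combinations(dimensions, k)

def compute_cuboid(rows, dims):
    """Sum the 'amount' measure grouped by the given dimensions."""
    totals = {}
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] = totals.get(key, 0) + row["amount"]
    return totals

# Hypothetical fact rows with three dimensions and one measure.
rows = [
    {"item": "TV", "city": "Vancouver", "year": 2010, "amount": 5},
    {"item": "TV", "city": "Toronto", "year": 2010, "amount": 3},
    {"item": "phone", "city": "Toronto", "year": 2011, "amount": 2},
]

dims = ("item", "city", "year")

# One aggregated table per group-by; together they form the cube.
cube = {group_by: compute_cuboid(rows, group_by) for group_by in cuboids(dims)}
```

With n dimensions and no concept hierarchies there are 2^n group-by's, so the three dimensions here give eight cuboids. That exponential growth is why computing all of them naively quickly becomes impractical.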
 