[Figure 20-20: OLAP and data mining. Labels in the figure: OLAP; data mining;
multidimensional data cubes; preprocessed data; deliberately thought-out
assumptions based on prior knowledge.]
[...] procedures were carried out properly, your data warehouse contains data well
suited to data mining.
The infrastructure for data warehouses is already robust, with parallel processing
technology and powerful relational database systems. Because such scalable
hardware is already in place, no new investment is needed to support data mining.
Let us point out one difference in how data from the data warehouse are used for
traditional analysis and for data mining. An analyst working with, say, an OLAP tool
begins with summary data at a high level and then continues through the lower
levels by means of drill-down techniques. On many occasions the analyst need not
go down to the detail levels, because he or she finds results at the higher levels
that suffice for deriving conclusions. Data mining is different. Because data mining
searches for trends and patterns, it deals with large volumes of detailed data. For
example, if the data mining algorithm is looking for customer buying patterns, it
needs detailed data at the level of the individual customer.
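
The contrast can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample rows, the column layout, and the helper names `roll_up` and `pair_counts` are hypothetical and do not come from any particular OLAP or data mining product. The roll-up starts from a high-level summary and drills down one level, while the pattern search has to work from the purchases of each individual customer.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical detail rows: (customer_id, region, product, amount)
SALES = [
    ("c1", "East", "bread", 3.0),
    ("c1", "East", "butter", 4.0),
    ("c2", "East", "bread", 3.0),
    ("c2", "East", "butter", 4.0),
    ("c3", "West", "bread", 3.0),
    ("c3", "West", "jam", 5.0),
]


def roll_up(rows, *key_indexes):
    """Sum the amount column, grouped by the chosen columns (OLAP-style summary)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[3]
    return dict(totals)


def pair_counts(rows):
    """Count how many customers bought each pair of products together."""
    baskets = defaultdict(set)
    for customer, _region, product, _amount in rows:
        baskets[customer].add(product)
    counts = Counter()
    for products in baskets.values():
        counts.update(combinations(sorted(products), 2))
    return counts


# Traditional analysis: start at a high level, then drill down one level.
by_region = roll_up(SALES, 1)
by_region_product = roll_up(SALES, 1, 2)

# Pattern search: needs the individual customer's purchases, not the totals.
pairs = pair_counts(SALES)

print(by_region)
print(by_region_product)
print(pairs.most_common())
```

Note that `pair_counts` cannot be computed from `by_region` or `by_region_product`, because the customer identifier is lost once the data are summarized.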
So what is a compromise approach? What level of granularity do you need to provide
in the data warehouse? Unless keeping detailed data at the lowest level of
granularity is a huge burden, strive to store detailed data. Otherwise, for data
mining engagements, you may have to extract detailed data directly from the
operational systems, which calls for additional steps of data consolidation,
cleansing, and transformation. You may also keep light summaries in the data
warehouse for traditional queries. Most of the data summarized along various sets
of dimensions may reside in the OLAP systems.
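
As a rough sketch of this compromise, the following Python example uses the standard `sqlite3` module with an in-memory database. The table names `sales_detail` and `sales_monthly`, the columns, and the sample rows are all hypothetical, chosen only for illustration. The detailed table is kept at the lowest granularity, a light summary is derived from it for traditional queries, and a customer-level question is answered from the detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Detailed data at the lowest level of granularity (hypothetical schema).
conn.execute(
    "CREATE TABLE sales_detail ("
    "customer_id TEXT, sale_month TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?, ?)",
    [
        ("c1", "2024-01", "bread", 3.0),
        ("c2", "2024-01", "bread", 3.0),
        ("c1", "2024-01", "butter", 4.0),
        ("c1", "2024-02", "bread", 3.0),
    ],
)

# Light summary derived from the detail, for traditional queries.
conn.execute(
    "CREATE TABLE sales_monthly AS "
    "SELECT sale_month, product, SUM(amount) AS total_amount, "
    "COUNT(*) AS sale_count "
    "FROM sales_detail GROUP BY sale_month, product"
)

summary = conn.execute(
    "SELECT sale_month, product, total_amount FROM sales_monthly "
    "ORDER BY sale_month, product"
).fetchall()

# A customer-level question can only be answered from the detailed table.
per_customer = conn.execute(
    "SELECT customer_id, COUNT(DISTINCT product) FROM sales_detail "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()

print(summary)
print(per_customer)
```

The summary can always be rebuilt from the detail, but not the other way around, which is why the text recommends storing detailed data where feasible.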