given to the algorithm? Etc. Second, one needs to investigate whether the
mining results are sound from the mining method's point of view. Some
methods directly support this decision by computing certain significance
measures whereas others leave this aspect completely to the analyst and his
experience. Third, a key objective of the evaluation phase is to determine if
all important business issues have been considered adequately.
6. Deployment
After mining the data and assessing the data mining results, one needs to
transfer the results back into the business environment. This can be rather
straightforward, like preparing the results in the form of a report that is
understandable by business people (who typically are not data mining
experts). At the other extreme, it can be quite complex, like implementing
a repeatable data mining process across the enterprise.
Although there is a large number of competing process descriptions, e.g. [1,
4,8,5,9,10,30,32], all agree concerning the basic character of the KDD process:
KDD is by no means a push-button technology. That is to say, the analyst never
walks strictly through the preprocessing tasks, mines the data, and then analyzes
and deploys the results. Rather, knowledge discovery is complex, iterative and
highly interactive. In each of the phases sketched above it is the analyst as a
human being who decides whether to proceed to the next phase, to redo the
current phase or even to step back to one of the former phases. In Figure 1 the
most important of these interdependencies between the phases are indicated by
arrows. The cycle around the process indicates the overall cyclic character of a
KDD process.
Obviously, the analyst's creativity and experience play a major part in such a
human-centered process. This nontrivial character of the KDD process in turn
imposes constraints on the employed data mining methods.
3.2 Association Rule Mining and the KDD Process
The key to human involvement is to enable analysts to interact easily with both
data and mining results. To illustrate this point further, let's look at a concrete
example from dependency analysis on the features of cars. In the beginning, an
analyst's goal is to obtain a general “feeling” for the data. The issued mining
queries are not focused but try to capture the whole available search space. As a
consequence, the resulting rule sets are typically very large and easily overtax
the analyst. Up to several tens of thousands of rules are not uncommon. In this
initial phase the analyst identifies promising starting points for his further
investigations. The challenge is to do this on the basis of rule sets containing a
large portion of noise, trivial rules, or otherwise uninteresting associations.
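
The effect of an unfocused mining query can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by any particular tool: the toy vehicle data, the feature names, the function names support and mine_rules, and the choice of support and confidence as rule quality measures are all assumptions made for this example. The enumeration is brute force and only suitable for tiny data sets.

```python
from itertools import combinations

# Hypothetical toy data: each vehicle is described by a set of feature labels.
VEHICLES = [
    {"engine=diesel", "sunroof", "towbar"},
    {"engine=diesel", "towbar"},
    {"engine=petrol", "sunroof"},
    {"engine=petrol", "sunroof", "alloy_wheels"},
    {"engine=diesel", "towbar", "alloy_wheels"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_support, min_confidence, max_size=3):
    """Enumerate rules body -> head (single-item head) by brute force.

    Returns a list of (body, head, support, confidence) tuples.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            supp = support(itemset, transactions)
            # Skip itemsets that never occur or fall below the threshold.
            if supp == 0 or supp < min_support:
                continue
            for head in combo:
                body = itemset - {head}
                conf = supp / support(body, transactions)
                if conf >= min_confidence:
                    rules.append((body, head, supp, conf))
    return rules


# An unfocused query with permissive thresholds versus a strict one.
broad = mine_rules(VEHICLES, min_support=0.0, min_confidence=0.0)
strict = mine_rules(VEHICLES, min_support=0.5, min_confidence=0.9)

print(f"{len(broad)} rules with permissive thresholds")
print(f"{len(strict)} rules with strict thresholds")
for body, head, supp, conf in strict:
    print(f"{sorted(body)} -> {head} (support={supp:.2f}, confidence={conf:.2f})")
```

Even on five vehicles the permissive query returns many more rules than the strict one, most of them weak or trivial. On real data with hundreds of features the same effect produces the very large rule sets described above.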
After this orientation phase the analyst typically decides to take a closer
look at a subset of the vehicle features. For example, he focuses on rules that
contain special equipment together with information on the engine type installed.
He lowers thresholds for some rule quality measures, implying a rerun of the
algorithm. The results are not as expected, and the analyst suspects that the
results might be more convincing if the algorithm were applied only to a subset of