a rule. A is a user-tunable parameter. The fitness function in Ref. 78 also
prefers clusters with a smaller radius when they cover the same data points.
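The tie-breaking preference described above can be illustrated with a small sketch. It does not reproduce the fitness function of Ref. 78, whose exact form (including how the parameter A enters) is not given here. Instead, it assumes, for illustration only, a hypothetical circular cluster defined by a center and a radius, and it encodes the preference lexicographically: the number of covered points is compared first, and the radius only breaks ties. The function name `cluster_fitness` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def cluster_fitness(center: Point, radius: float,
                    data: Sequence[Point]) -> Tuple[int, float]:
    """Hypothetical fitness for a circular cluster: (points covered, -radius).

    Python compares tuples lexicographically, so coverage dominates and the
    negated radius only breaks ties: between two clusters covering the same
    number of points, the one with the smaller radius ranks higher.
    """
    # Count the data points that fall inside the cluster's radius.
    covered = sum(1 for p in data if math.dist(p, center) <= radius)
    return (covered, -radius)


data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
tight = cluster_fitness((0.0, 0.0), 1.0, data)  # covers 3 points
loose = cluster_fitness((0.0, 0.0), 2.0, data)  # also covers 3 points
print(max(tight, loose) == tight)
```

A weighted sum of coverage and radius, controlled by a tunable parameter, is the more common alternative; the lexicographic form is used here only because it states the "same coverage, smaller radius" rule directly.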
2.6. Conclusions and Future Directions
This chapter discusses the use of evolutionary computation in data mining
and knowledge discovery, using intrusion detection systems as an example.
The discussion centers on the role of EAs in achieving the two primary
high-level goals of data mining, prediction and description: in particular,
classification and regression tasks for prediction, and clustering tasks for
description. The use of EAs for feature selection in the preprocessing step
is also discussed. Another goal of this chapter is to show how basic elements
of EAs, such as representations, selection schemes, evolutionary operators,
and fitness functions, have to be adapted to extract accurate and useful
patterns from data in different data mining tasks.
Although experiments have reaffirmed the effectiveness and accuracy of EC
in data mining algorithms, there are still challenges ahead for researchers
in this area. The first challenge is the huge volume of data, which makes
building effective evolutionary models difficult, especially in the fitness
evaluation step. We can resort either to hardware-specific approaches, such
as relocating fitness evaluation from the CPU to the GPU, 90 or to software
approaches, such as various data sampling techniques, 11,66 divide-and-conquer
algorithms, 40,41 and distributed and parallel EAs. 53,54,56 The second
challenge is handling imbalanced data distributions. Both Ref. 44 and Ref. 65
point out that individuals that perform better on frequently occurring
patterns are more likely to survive, even if they perform worse than
competing individuals on less frequent patterns. Therefore, when designing
a data mining algorithm based on EAs, one should consider how to improve
accuracy on relatively rare patterns without compromising performance on
more frequent patterns. Finally, acquiring knowledge from data is often
regarded as a multimodal problem. From our perspective, it is even harder
than ordinary multimodal problems, because adaptation and optimization occur
simultaneously on subsolutions (i.e., the individual rules in a rule set, as
in the Michigan approach). New evolutionary techniques or extensions to EAs
are needed. We believe that solving these challenges will further improve
the performance of EC-based data mining algorithms.
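The data-volume challenge comes from the fact that every fitness evaluation scans the training set. One of the software remedies mentioned above, data sampling, can be sketched as estimating a classifier's accuracy on a random subset instead of the full data. This is a minimal sketch under stated assumptions: the classifier is any callable from a feature tuple to a label, fitness is plain accuracy, and a fresh sample is drawn on each call; the names `full_fitness` and `sampled_fitness` are hypothetical and do not describe the specific techniques of Refs. 11 and 66.

```python
import random
from typing import Callable, Sequence, Tuple

Features = Tuple[float, ...]
Record = Tuple[Features, int]          # (features, class label)
Classifier = Callable[[Features], int]


def full_fitness(clf: Classifier, data: Sequence[Record]) -> float:
    """Accuracy over the whole training set; cost grows with len(data)."""
    return sum(clf(x) == y for x, y in data) / len(data)


def sampled_fitness(clf: Classifier, data: Sequence[Record],
                    sample_size: int, rng: random.Random) -> float:
    """Estimate accuracy on a random subset drawn without replacement.

    Each call touches at most sample_size records instead of len(data),
    at the price of a noisier fitness value.
    """
    sample = rng.sample(data, min(sample_size, len(data)))
    return sum(clf(x) == y for x, y in sample) / len(sample)


# Hypothetical data: 10,000 one-feature records, label 1 when feature >= 5000.
data = [((float(i),), int(i >= 5000)) for i in range(10000)]
always_one = lambda x: 1               # a deliberately weak classifier
rng = random.Random(0)
print(full_fitness(always_one, data), sampled_fitness(always_one, data, 500, rng))
```

The estimate is noisy, so selection may occasionally favor a slightly worse individual; drawing a new sample each generation keeps any single unlucky sample from dominating the search.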
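The imbalance problem noted in Refs. 44 and 65 can be made concrete with a toy intrusion-detection example. This is a minimal sketch assuming a hypothetical test set of 95 normal records (label 0) and 5 attacks (label 1), and two hypothetical individuals: one that labels everything as normal, and one that catches every attack but raises 10 false alarms. Plain accuracy prefers the first; balanced accuracy (the mean of per-class recall, one common remedy, not necessarily the one proposed in those references) prefers the second.

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correct predictions; dominated by the majority class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall: every class counts equally, however rare."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


y_true = [0] * 95 + [1] * 5            # 95 normal records, 5 attacks
ind_a = [0] * 100                      # ignores the rare class entirely
ind_b = [0] * 85 + [1] * 15            # 10 false alarms, all 5 attacks caught

print(accuracy(y_true, ind_a), accuracy(y_true, ind_b))
print(balanced_accuracy(y_true, ind_a), balanced_accuracy(y_true, ind_b))
```

Here individual A scores 0.95 accuracy versus 0.90 for B, but only 0.5 balanced accuracy versus about 0.947 for B. Note that B still loses some accuracy on normal traffic, so a balanced fitness alone does not settle the trade-off between rare and frequent patterns described above.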
Acknowledgment
Funding from NSERC under RGPIN 283304-07 is gratefully acknowledged.