Another approach to visual data mining is based on the phases of the data mining process.
The data mining process itself may be viewed as comprising three main phases: the data
preparation phase, the model derivation phase, and the validation phase (Kopanakis and
Theodoulidis, 2001). A visual data mining system should seek to incorporate the user into
each of the three phases. The system should also enable the user to take advantage of
visual techniques in carrying out every activity related to these phases.
In the data preparation phase, provision is made for the visual preparation and manipulation
of data. The data preprocessing activities should be carried out in accordance with the requirements
posed by this phase and/or by the other data mining phases. Visual data mining also aims to
support the derivation of the data mining model. The derivation involves activities such as the visual
specification of the sample data set, visual specification of the model and its parameters, and visual
support for the storage of results. In a more general sense, model derivation also involves other aspects
such as evaluation, monitoring and guidance. Evaluation includes the validation of the sample data set
and the developed models or algorithms. Monitoring includes, among other activities, keeping track
of the progress of the data mining algorithms. Guidance entails activities such as the introduction of
user-defined preferences or settings. Data mining algorithms are often able to handle large amounts
of data. However, the size of the display is fixed and limited. Moreover, the results of data
mining algorithms are often in a form that is difficult for humans to understand, since humans are
accustomed to perceiving information visually. These are major challenges in the validation
phase of data mining. Through the appropriate use of effective visualizations, all or at least
much of the relevant data can be represented in an understandable manner. Consequently, visual
data mining is instrumental in the validation phase in that it enables the user to
acquire knowledge.
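To make the three phases concrete, the following is a minimal sketch in Python of how a system might give the user an entry point in each phase. It is illustrative only and is not taken from Kopanakis and Theodoulidis (2001) or from any of the systems discussed here. All names (MiningSession, prepare, derive_model, validate, cutoff_factor) are hypothetical, the "model" is a toy mean-plus-cutoff rule, and plain function arguments stand in for the visual interactions a real system would offer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MiningSession:
    """Hypothetical container: raw data plus a log the user can monitor."""
    raw: List[float]
    log: List[str] = field(default_factory=list)


def prepare(session: MiningSession, keep: Callable[[float], bool]) -> List[float]:
    """Data preparation: the user-supplied `keep` predicate stands in for
    a visual selection of the sample data set."""
    sample = [x for x in session.raw if keep(x)]
    session.log.append(f"prepare: kept {len(sample)} of {len(session.raw)} values")
    return sample


def derive_model(session: MiningSession, sample: List[float],
                 cutoff_factor: float) -> Dict[str, float]:
    """Model derivation: a toy model (mean and a cutoff). `cutoff_factor`
    plays the role of a user-defined setting (guidance); the log entry
    plays the role of monitoring."""
    if not sample:
        raise ValueError("sample is empty; nothing to model")
    mean = sum(sample) / len(sample)
    model = {"mean": mean, "cutoff": mean * cutoff_factor}
    session.log.append(f"derive: mean={mean}, cutoff={model['cutoff']}")
    return model


def validate(session: MiningSession, sample: List[float],
             model: Dict[str, float]) -> Dict[str, int]:
    """Validation: reduce the result to a small summary that fits a
    limited display, instead of listing every value."""
    flagged = sum(1 for x in sample if x > model["cutoff"])
    summary = {"n": len(sample), "flagged": flagged}
    session.log.append(f"validate: {flagged} of {len(sample)} above cutoff")
    return summary


if __name__ == "__main__":
    session = MiningSession(raw=[1.0, 2.0, 3.0, -5.0, 10.0])
    sample = prepare(session, keep=lambda x: x >= 0)
    model = derive_model(session, sample, cutoff_factor=2.0)
    print(validate(session, sample, model))
    print("\n".join(session.log))
```

In this sketch the user influences every phase: the predicate selects the sample, the factor tunes the model, and the summary and log are what the user inspects. A visual system would replace these arguments with interactive selections and the printed summary with a visualization.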
3.2.4 RELEVANT SYSTEMS
This section discusses some systems that are relevant to the field of visual data mining.
The systems offer a reasonably large and diverse set of data mining and visualization
functionalities.
Clementine¹ was developed by Integral Solutions Ltd (ISL), which was later purchased by
SPSS. The product supports a number of mining techniques, including clustering,
association rules, sequential patterns, factor analysis, and neural networks. Its visual interface reveals
much about a data mining task by illustrating the flow of control and data. Therefore, the user
is better positioned to understand and follow the mining process. Users construct a map of their
data mining project/model, called a “stream,” by selecting icons, called “nodes,” that represent steps in
the data mining process. However, users need to learn to think in terms of “streams” and
“nodes.” Moreover, the product does not fare well in terms of scalability; that is, it does
not scale up well when dealing with massive amounts of data. It should be pointed out that
1 http://www.spss.com/clementine