Beyond Querying - User-Centered Data Management

Another approach to visual data mining is organized around the phases of the data mining process. The process itself may be viewed as comprising three main phases: the data preparation phase, the model derivation phase, and the validation phase (Kopanakis and Theodoulidis, 2001). A visual data mining system should seek to incorporate the user into each of these three phases. It should also enable the user to take advantage of visual techniques in carrying out every activity related to them.

In the data preparation phase, there is provision for the visual preparation and manipulation of data. The data preprocessing activities should be carried out in accordance with the requirements posed by this phase and/or by the other data mining phases. Visual data mining also aims to support the derivation of the data mining model. The derivation involves activities such as the visual specification of the sample data set, visual specification of the model and its parameters, and visual support for the storage of results. In a more general sense, model derivation also involves other aspects such as evaluation, monitoring, and guidance. Evaluation includes the validation of the sample data set and of the developed models or algorithms. Monitoring includes, among other activities, keeping track of the progress of the data mining algorithms. Guidance entails activities such as the introduction of user-defined preferences or settings. Data mining algorithms are often able to handle large amounts of data, whereas the size of the display is fixed and limited. Moreover, the results of data mining algorithms are often in a form that is difficult to understand for humans, who are accustomed to perceiving information through their visual senses. These are major challenges in the validation phase of data mining. Through the appropriate use of effective visualizations, all or at least much of the relevant data can be represented in an understandable manner. Consequently, visual data mining is instrumental in the validation phase in that it enables the user to acquire knowledge.
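The three phases can be sketched in code to make the points of user involvement concrete. The following is only an illustrative sketch, not a design taken from Kopanakis and Theodoulidis: every name in it (`Session`, `prepare`, `derive_model`, `validate`, `threshold_scale`, `display_bins`) is hypothetical, the "model" is a trivial threshold, and the progress log stands in for what a real system would show visually. The validation step reduces the data to a fixed number of bins, which is one simple way of fitting a data set of any size onto a display of limited size.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Session:
    """Hypothetical container for user settings and progress messages."""
    threshold_scale: float = 1.0  # guidance: a user-defined setting
    log: List[str] = field(default_factory=list)

    def report(self, message: str) -> None:
        # Monitoring: a real system would update a visual display here.
        self.log.append(message)


def prepare(raw: List[Optional[float]], session: Session) -> List[float]:
    """Data preparation phase: drop records with missing values."""
    clean = [x for x in raw if x is not None]
    session.report(f"prepared {len(clean)} of {len(raw)} records")
    return clean


def derive_model(sample: List[float], session: Session) -> float:
    """Model derivation phase: a toy model, a threshold at the scaled mean."""
    threshold = mean(sample) * session.threshold_scale
    session.report(f"derived threshold {threshold:.2f}")
    return threshold


def validate(data: List[float], threshold: float, session: Session,
             display_bins: int = 4) -> Dict[str, object]:
    """Validation phase: summarize results in a fixed, displayable size."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / display_bins or 1.0  # avoid zero width if all equal
    histogram = [0] * display_bins
    for x in data:
        index = min(int((x - lo) / width), display_bins - 1)
        histogram[index] += 1
    above = sum(1 for x in data if x > threshold)
    session.report(f"validated: {above} records above threshold")
    return {"above": above, "histogram": histogram}


session = Session(threshold_scale=1.0)
data = prepare([1.0, None, 2.0, 3.0, 10.0], session)
threshold = derive_model(data, session)
summary = validate(data, threshold, session)
```

Changing `threshold_scale` and rerunning the last three lines corresponds, in a very reduced form, to the user steering model derivation through a preference setting and then inspecting the validation summary again.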

3.2.4 RELEVANT SYSTEMS

What follows is a discussion of some systems that are relevant to the field of visual data mining. These systems offer a large and diverse set of data mining and visualization functionalities.

Clementine¹ was developed by Integral Solutions Ltd (ISL), which was later purchased by SPSS. The product supports a number of mining techniques, including clustering, association rules, sequential patterns, factor analysis, and neural networks. Its visual interface reveals much about a data mining task by illustrating the flow of control and data, so the user is better positioned to understand and follow the mining process. Users construct a map of their data mining project/model, called a “stream,” by selecting icons, called “nodes,” that represent steps in the data mining process. However, users need to learn to think in terms of “streams” and “nodes.” Moreover, the product does not fare well in terms of scalability, i.e., it does not scale up well when dealing with massive amounts of data. It should be pointed out that

1 http://www.spss.com/clementine
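The idea of a “stream” built from “nodes” can be illustrated with a small dataflow sketch. This is not Clementine's interface or API. It is a hypothetical, text-only analogue in which each node is a function from a list of records to a list of records, and a stream is an ordered chain of named nodes. The names `Stream`, `add`, `describe`, `run`, `select_adults`, and `add_band` are all assumptions made for the illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Node = Callable[[List[Record]], List[Record]]


class Stream:
    """Hypothetical chain of named processing steps ("nodes")."""

    def __init__(self) -> None:
        self.nodes: List[Tuple[str, Node]] = []

    def add(self, name: str, node: Node) -> "Stream":
        # Returning self allows nodes to be chained in reading order.
        self.nodes.append((name, node))
        return self

    def describe(self) -> str:
        # Textual stand-in for the visual map of the stream.
        return " -> ".join(name for name, _ in self.nodes)

    def run(self, records: List[Record]) -> List[Record]:
        # Data flows through the nodes in the order they were added.
        for _, node in self.nodes:
            records = node(records)
        return records


def select_adults(records: List[Record]) -> List[Record]:
    """Selection node: keep records with age of at least 18."""
    return [r for r in records if r["age"] >= 18]


def add_band(records: List[Record]) -> List[Record]:
    """Derivation node: add an age band to each record."""
    return [
        {**r, "band": "senior" if r["age"] >= 65 else "adult"}
        for r in records
    ]


stream = Stream().add("select", select_adults).add("derive", add_band)
result = stream.run([{"age": 12}, {"age": 30}, {"age": 70}])
print(stream.describe())
```

Because the stream is an explicit data structure, it can be displayed, inspected, or rearranged before it is run, which is the property the visual interface exploits when it shows the flow of control and data.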
