Introduction - Visual Data Mining: The VisMiner Approach


Missing values - Frequently, datasets are missing values for one or more

attributes in an observation. The values may be missing because at the time

the data was captured they were unknown or, for a given observation,

the values do not exist.

Since many data mining algorithms do not work well, if at all, when there

are missing values in the dataset, it is important that they be handled before

presentation to the algorithm. There are three generally deployed ways to

deal with missing values:

1. Eliminate all observations containing missing values from the dataset.

2. Provide a default value for any attribute in which there may be missing

values. The default value may be, for example, the most frequently

occurring value for a discrete attribute, or the average value for a

numeric attribute.

3. Estimate the missing value using the other attribute values of the observation.
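
The three approaches above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not VisMiner functionality: the toy dataset, the attribute names (age, color), and the helper function names are all hypothetical, and a missing value is assumed to be stored as None. The third function shows just one simple way to estimate from other attributes, using the mean of observations that share another attribute's value.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"age": 30, "color": "red"},
    {"age": None, "color": "red"},
    {"age": 40, "color": None},
    {"age": 50, "color": "red"},
    {"age": 20, "color": "blue"},
]

def drop_incomplete(rows):
    """Approach 1: eliminate observations containing any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def fill_default(rows, attr):
    """Approach 2: fill with the mean (numeric) or most frequent value (discrete)."""
    present = [r[attr] for r in rows if r[attr] is not None]
    if all(isinstance(v, (int, float)) for v in present):
        default = mean(present)
    else:
        default = Counter(present).most_common(1)[0][0]
    return [{**r, attr: default if r[attr] is None else r[attr]} for r in rows]

def estimate_from_group(rows, attr, by):
    """Approach 3 (one simple variant): estimate a missing numeric value as the
    mean of that attribute over observations sharing the same value of `by`."""
    filled = []
    for r in rows:
        if r[attr] is None:
            peers = [p[attr] for p in rows
                     if p[by] == r[by] and p[attr] is not None]
            if peers:  # leave the value missing if no peer is available
                r = {**r, attr: mean(peers)}
        filled.append(r)
    return filled

complete_only = drop_incomplete(rows)
age_filled = fill_default(rows, "age")
color_filled = fill_default(rows, "color")
age_estimated = estimate_from_group(rows, "age", "color")
```

Each helper returns a new list and leaves the original rows untouched, so the approaches can be compared side by side. Dropping observations is the simplest option but shrinks the dataset, which may matter when missing values are common.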

Algorithm selection and application

Once the dataset has been properly prepared and an initial exploration has been

completed, you are ready to apply a data mining algorithm to the dataset. The

choice of which algorithm to apply depends on the objective of your data

mining task and the types of data available. If the objective is classification, then

you will want to choose one or more of the available classification modelers. If

you are predicting a numeric output, then you will choose from the available

regression modelers.

Among modelers of a given type, you may not have a prior expectation as to

which modeler will generate the best model. In that case, you may want to apply

the data to multiple modelers, evaluate, then choose the model that performs

best for the dataset.
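
As a rough sketch of applying the same data to several modelers and keeping the best performer, the Python below compares two deliberately simple classification modelers. Everything here is an assumption for illustration: the tiny dataset, the two modelers (a majority-class baseline and a 1-nearest-neighbor classifier), the use of a separate holdout set, and accuracy as the evaluation measure. None of it is taken from VisMiner.

```python
from collections import Counter

# Hypothetical labeled data: (numeric input attributes, class label).
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((3.0, 3.0), "b"), ((3.2, 2.9), "b")]
holdout = [((1.1, 0.9), "a"), ((2.9, 3.1), "b"), ((3.1, 3.0), "b")]

def build_majority(train):
    """Baseline modeler: always predicts the most frequent training class."""
    top = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: top

def build_nearest_neighbor(train):
    """1-nearest-neighbor modeler using squared Euclidean distance."""
    def predict(x):
        def dist(item):
            return sum((a - b) ** 2 for a, b in zip(item[0], x))
        return min(train, key=dist)[1]
    return predict

def accuracy(model, data):
    """Fraction of observations whose predicted class matches the actual class."""
    return sum(model(x) == y for x, y in data) / len(data)

# Build one model per modeler, score each on the holdout set, keep the best.
modelers = {"majority": build_majority,
            "nearest_neighbor": build_nearest_neighbor}
scores = {name: accuracy(build(train), holdout)
          for name, build in modelers.items()}
best = max(scores, key=scores.get)
```

Scoring on observations that were not used to build the models is a design choice made here so that the comparison reflects how each model handles unseen data.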

At the time of model building you will need to have decided which attributes

to use as input attributes and which, if building a prediction model, is the output

attribute. (Cluster, association, and sequence analyses do not have an output

attribute.) The choice of input attributes should be guided by relationships

uncovered during the initial exploration.
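
One numeric stand-in for the relationships found during exploration is the correlation between each candidate input attribute and the output attribute. The sketch below ranks candidates by the absolute Pearson correlation; the column names and values are hypothetical, and correlation captures only linear relationships, so it is at best a rough complement to visual exploration rather than a substitute for it.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical dataset stored column-wise; "price" is the output attribute.
data = {
    "size": [1, 2, 3, 4, 5],
    "noise": [2, 9, 1, 7, 4],
    "price": [10, 20, 30, 40, 50],
}
output = "price"

# Rank candidate input attributes by strength of linear relationship
# with the output attribute, strongest first.
ranking = sorted(
    ((name, abs(pearson(col, data[output])))
     for name, col in data.items() if name != output),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Attributes near the top of the ranking are candidates for inclusion as inputs; those near the bottom may still matter if their relationship to the output is nonlinear.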

Once you have selected your modelers and attributes and taken all necessary

steps to prepare the dataset, apply that dataset to the modelers and let them do

their number crunching.

Model evaluation

After the modeler has finished its work and a model has been generated,

evaluate that model. There are two tasks to be accomplished during this phase.
