Missing values - Frequently, datasets are missing values for one or more
attributes in an observation. The values may be missing because at the time
the data was captured they were unknown or, for a given observation,
the values do not exist.
Many data mining algorithms work poorly, if at all, when the dataset contains
missing values, so these values must be handled before the data is presented
to the algorithm. There are three commonly used ways to deal with missing
values:
1. Eliminate all observations containing missing values from the dataset.
2. Provide a default value for any attribute in which values may be missing.
   The default value may be, for example, the most frequently occurring value
   for a discrete attribute, or the average value for a numeric attribute.
3. Estimate the missing value using the other attribute values of the
   observation.
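The three approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy dataset, the attribute names (age, income, region), and the helper function names are all hypothetical, and the third function shows just one simple way of estimating from other attributes (averaging over observations that share the same value of another attribute).

```python
from collections import Counter
from statistics import mean

# Hypothetical toy dataset; None marks a missing value.
observations = [
    {"age": 34, "income": 52000, "region": "north"},
    {"age": None, "income": 61000, "region": "south"},
    {"age": 45, "income": None, "region": "north"},
    {"age": 29, "income": 48000, "region": None},
]

def drop_incomplete(rows):
    """Approach 1: eliminate observations containing any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def fill_defaults(rows, numeric, discrete):
    """Approach 2: use the mean for numeric attributes and the most
    frequently occurring value for discrete attributes."""
    defaults = {}
    for attr in numeric:
        defaults[attr] = mean(r[attr] for r in rows if r[attr] is not None)
    for attr in discrete:
        counts = Counter(r[attr] for r in rows if r[attr] is not None)
        defaults[attr] = counts.most_common(1)[0][0]
    return [{k: (defaults[k] if v is None else v) for k, v in r.items()}
            for r in rows]

def estimate_from_group(rows, target, group_by):
    """Approach 3 (one simple variant): estimate a missing numeric value from
    observations that share the same value of another attribute."""
    result = []
    for r in rows:
        if r[target] is None and r[group_by] is not None:
            peers = [p[target] for p in rows
                     if p[group_by] == r[group_by] and p[target] is not None]
            if peers:
                r = {**r, target: mean(peers)}  # copy; the input row is untouched
        result.append(r)
    return result

complete = drop_incomplete(observations)
filled = fill_defaults(observations, numeric=["age", "income"], discrete=["region"])
estimated = estimate_from_group(observations, target="income", group_by="region")
```

Each function returns a new list and leaves the original observations unchanged, so the approaches can be compared side by side on the same data.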
Algorithm selection and application
Once the dataset has been properly prepared and an initial exploration has been
completed, you are ready to apply a data mining algorithm to the dataset. The
choice of which algorithm to apply depends on the objective of your data
mining task and the types of data available. If the objective is classification,
choose one or more of the available classification modelers. If the objective is
to predict a numeric output, choose one or more of the available regression
modelers.
Among modelers of a given type, you may not have a prior expectation as to
which modeler will generate the best model. In that case, you may want to apply
the data to multiple modelers, evaluate the resulting models, then choose the
model that performs best for the dataset.
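As a rough sketch of this "try several modelers, then pick the best" approach, the Python below builds two deliberately simple classification modelers and compares them. Everything here is an assumption made for illustration: the toy data, the two modelers (a majority-class baseline and a 1-nearest-neighbor classifier), the use of a separate hold-out set, and accuracy as the evaluation measure.

```python
from collections import Counter

# Hypothetical toy data: (input attributes, class label).
train = [((1.0, 1.2), "a"), ((0.8, 1.0), "a"), ((1.1, 0.9), "a"),
         ((3.0, 3.2), "b"), ((3.1, 2.9), "b")]
holdout = [((0.9, 1.1), "a"), ((3.2, 3.0), "b"), ((2.9, 3.1), "b")]

def majority_modeler(rows):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbor_modeler(rows):
    """1-nearest-neighbor: predict the label of the closest training point."""
    def predict(x):
        def sq_dist(row):
            return sum((a - b) ** 2 for a, b in zip(row[0], x))
        return min(rows, key=sq_dist)[1]
    return predict

def accuracy(model, rows):
    """Fraction of observations whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)

# Apply the same training data to every modeler, evaluate each model on the
# hold-out observations, then keep the name of the best performer.
modelers = {"majority": majority_modeler,
            "nearest_neighbor": nearest_neighbor_modeler}
scores = {name: accuracy(build(train), holdout)
          for name, build in modelers.items()}
best = max(scores, key=scores.get)
```

Scoring on observations that were not used to build the models keeps the comparison from simply rewarding whichever modeler memorizes the training data most closely.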
By the time you build a model, you need to have decided which attributes to use
as input attributes and, if you are building a prediction model, which attribute
is the output attribute. (Cluster, association, and sequence analyses do not have an output
attribute.) The choice of input attributes should be guided by relationships
uncovered during the initial exploration.
Once you have selected your modelers and attributes and taken all necessary
steps to prepare the dataset, apply the dataset to the modelers and let them do
their number crunching.
Model evaluation
After the modeler has finished its work and a model has been generated,
evaluate that model. There are two tasks to be accomplished during this phase.
 