5) How might the presence of outliers in the attributes of a data set influence the usefulness
of a k-Means clustering model? What could be done to address the problem?
EXERCISE
Think of an example of a problem that could be at least partially addressed by being able to group
observations in a data set into clusters. Some examples might be grouping kids who might be at
risk for delinquency, grouping product sale volumes, grouping workers by productivity and
effectiveness, etc. Search the Internet or other resources available to you for a data set that would
allow you to investigate your question using a k-Means model. As with all exercises in this text,
please ensure that you have permission to use any data set that might belong to your employer or
another entity. When you have secured your data set, complete the following steps:
1) Ensure that your data set is saved as a CSV file. Import your data set into your
RapidMiner repository and save it with a meaningful name. Drag it into a new process
window in RapidMiner.
2) Conduct any data preparation that you need for your data set. This may include handling
inconsistent data, dealing with missing values, or changing data types. Remember that in
order to calculate means, each attribute in your data set will need to be numeric. If, for
example, one of your attributes contains the values 'yes' and 'no', you may need to change
these to be 1 and 0 respectively, in order for the k-Means operator to work.
3) Connect a k-Means operator to your data set, configure its parameters (in particular, set
k to a value that is meaningful for your question), and then run your model.
4) Investigate your Centroid Table, Folder View, and the other evaluation tools.
5) Report your findings for your clusters. Discuss what is interesting about them and
describe what iterations of modeling you went through, such as experimentation with
different parameter values, to generate the clusters. Explain how your findings are relevant
to your original question.
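For readers who want to see what steps 2 and 3 do outside of RapidMiner, the following is a minimal Python sketch. It is not the RapidMiner k-Means operator: it is a bare-bones version of the standard k-means procedure, written only for illustration. The sample data, the column names (hours_worked, units_made, certified), and the function names are all hypothetical, and the model is seeded with the first k rows so that the result is repeatable.

```python
import csv
import io

# Hypothetical sample data; in practice this would come from your own CSV file.
SAMPLE_CSV = """hours_worked,units_made,certified
40,200,yes
42,210,yes
38,190,no
20,80,no
22,95,no
18,70,yes
"""


def load_numeric_rows(text):
    """Parse CSV text, converting 'yes'/'no' to 1/0 and everything else to float."""
    yes_no = {"yes": 1.0, "no": 0.0}
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = []
        for value in record.values():
            key = value.strip().lower()
            row.append(yes_no[key] if key in yes_no else float(value))
        rows.append(row)
    return rows


def squared_distance(a, b):
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k, max_iterations=100):
    """Basic k-means, seeded with the first k rows (an assumption made for repeatability)."""
    centroids = [list(row) for row in rows[:k]]
    labels = []
    for _ in range(max_iterations):
        # Assign each observation to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so the clusters are stable.
        labels = new_labels
        # Recompute each centroid as the mean of the observations assigned to it.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


rows = load_numeric_rows(SAMPLE_CSV)
labels, centroids = k_means(rows, k=2)
for c, centroid in enumerate(centroids):
    print("cluster", c, "size", labels.count(c), "centroid", [round(v, 2) for v in centroid])
```

The printed centroids play the same role as the Centroid Table in step 4: each one is the mean of every attribute for the observations in that cluster. Note that this sketch does not rescale the attributes, so an attribute with a large range (here, units_made) will dominate the distance calculation.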
 