MODELING
The 'k' in k-means clustering stands for some number of groups, or clusters. The aim of this data
mining methodology is to look at each observation's individual attribute values and compare them
to the means (in other words, averages) of potential groups of other observations in order to find
natural groups of observations that are similar to one another. The k-means algorithm accomplishes
this by sampling some set of observations in the data set, calculating the mean of each attribute
for the observations in that sample, and then comparing the other observations in the data set to
that sample's means. The system does this repeatedly in order to zero in on the best matches and
then to formulate groups of observations, which become the clusters. As the means calculated on
successive passes become more and more similar, the clusters are formed, and each observation
becomes a member of the cluster whose means its attribute values most closely resemble. Because
of this iterative process, k-means clustering models can sometimes take a long time to run,
especially if you indicate a large number of “max runs” through the data, or if you seek a large
number of clusters (k). To build your k-means cluster model, complete the following steps:
1) Return to design view in RapidMiner if you have not done so already. In the operators
search box, type k-means (be sure to include the hyphen). There are three operators that
conduct k-means clustering work in RapidMiner. For this exercise, we will choose the first,
which is simply named “k-Means”. Drag this operator into your stream, as shown in
Figure 6-3.
Figure 6-3. Adding the k-Means operator to our model.
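
If it helps to see the idea outside of RapidMiner, the repeated "compare to the means, regroup, recalculate the means" process described at the start of this section can be sketched in a few lines of Python. This is only an illustration of the general k-means technique, not the code behind RapidMiner's k-Means operator. The function name k_means, the max_runs parameter (named after the “max runs” setting mentioned above), the use of squared straight-line distance to judge similarity, and the six sample observations are all assumptions made up for this example.

```python
import random


def k_means(observations, k, max_runs=10, seed=0):
    """Group observations (lists of numbers) into k clusters.

    Illustrative sketch only; names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    # Start by sampling k observations and using them as the first means.
    means = [list(obs) for obs in rng.sample(observations, k)]
    assignments = None
    for _ in range(max_runs):
        # Compare every observation to each cluster's means and pick the
        # closest one (assumption: squared Euclidean distance).
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - m) ** 2 for a, m in zip(obs, means[c])),
            )
            for obs in observations
        ]
        if new_assignments == assignments:
            break  # memberships stopped changing, so the clusters are settled
        assignments = new_assignments
        # Recalculate the mean of each attribute for each cluster.
        for c in range(k):
            members = [o for o, a in zip(observations, assignments) if a == c]
            if members:  # an empty cluster keeps its previous means
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return means, assignments


# Six made-up observations with two attributes each.
data = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]]
means, assignments = k_means(data, k=2)
print(means)
print(assignments)
```

Notice that both a larger max_runs value and a larger k mean more comparisons on every pass through the data, which is why those two settings have the biggest effect on how long a model takes to run.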
 