k-Means Clustering - Data Mining for the Masses

2) Because we did not need to add any other operators in order to prepare our data for

mining, our model in this exercise is very simple. We could, at this point, run our model

and begin to interpret the results. This would not be very interesting, however. This is

because the default for our k, or our number of clusters, is 2, as indicated by the black

arrow on the right-hand side of Figure 6-3. This means we are asking RapidMiner to find

only two clusters in our data. If we only wanted to find those with high and low levels of

risk for coronary heart disease, two clusters would work. But as discussed in the

Organizational Understanding section earlier in the chapter, Sonia has already recognized

that there are likely a number of different types of groups to be considered. Simply

splitting the data set into two clusters is probably not going to give Sonia the level of detail

she seeks. Because Sonia felt that there were probably at least four potentially different

groups, let's change the k value to four, as depicted in Figure 6-4. We could also increase

the number of 'max runs', but for now, let's accept the default and run the model.

Figure 6-4. Setting the desired number of clusters for our model.
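
For readers who would like to see what these two settings control outside of RapidMiner, here is a minimal sketch of k-means in plain Python. It is not RapidMiner's implementation. The function names, the toy (weight, cholesterol) values, and the reading of 'max runs' as the number of random restarts from which the best result is kept are all assumptions made for illustration.

```python
import random
from collections import Counter

def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    # Give each point the index of its nearest centroid.
    return [min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points]

def k_means(points, k=2, max_runs=10, max_steps=100, seed=0):
    # Hypothetical helper: repeat k-means max_runs times from random
    # starting centroids and keep the run with the tightest clusters.
    rng = random.Random(seed)
    best = None
    for _ in range(max_runs):
        centroids = rng.sample(points, k)   # start from k of the data points
        labels = assign(points, centroids)
        for _ in range(max_steps):
            new_centroids = []
            for c in range(k):
                members = [p for p, l in zip(points, labels) if l == c]
                if members:
                    # Move the centroid to the mean of its members.
                    new_centroids.append(
                        tuple(sum(col) / len(members) for col in zip(*members)))
                else:
                    # Leave an empty cluster's centroid where it was.
                    new_centroids.append(centroids[c])
            centroids = new_centroids
            new_labels = assign(points, centroids)
            if new_labels == labels:        # assignments stopped changing
                break
            labels = new_labels
        score = sum(squared_distance(p, centroids[l])
                    for p, l in zip(points, labels))
        if best is None or score < best[0]:
            best = (score, labels, centroids)
    return best[1], best[2]

# Invented (weight, cholesterol) pairs, not the chapter's data set.
points = [
    (105, 128), (110, 135), (112, 130), (108, 140), (115, 132),
    (182, 225), (190, 218), (185, 230), (178, 221), (188, 227),
    (112, 215), (118, 222), (109, 210), (115, 228), (120, 219),
    (180, 135), (186, 129), (191, 138), (184, 141), (179, 131),
]

labels, centroids = k_means(points, k=4)    # k raised from the default of 2
sizes = Counter(labels)                     # cluster labels start at 0
for cluster in sorted(sizes):
    print(cluster, sizes[cluster])
```

Raising k asks for more groups, while raising max_runs in this sketch only gives the algorithm more chances to find a good set of starting points. The clusters are labeled 0 through 3 here as well, since the sketch uses the position of each centroid in a list as its label.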

3) When the model is run, we find an initial report of the number of items that fell into each

of our four clusters. (Note that the clusters are numbered starting from 0, a result of

RapidMiner being written in the Java programming language.) In this particular model, our
