2) Because we did not need to add any other operators to prepare our data for mining,
our model in this exercise is very simple. We could run the model at this point and begin
to interpret the results, but they would not be very interesting, because the default value
for k, our number of clusters, is 2, as indicated by the black arrow on the right-hand side
of Figure 6-3. This means we are asking RapidMiner to find
only two clusters in our data. If we only wanted to find those with high and low levels of
risk for coronary heart disease, two clusters would work. But as discussed in the
Organizational Understanding section earlier in the chapter, Sonia has already recognized
that there are likely a number of different types of groups to be considered. Simply
splitting the data set into two clusters is probably not going to give Sonia the level of detail
she seeks. Because Sonia felt that there were probably at least four potentially different
groups, let's change the k value to 4, as depicted in Figure 6-4. We could also increase
the number of 'max runs', but for now, let's accept the default and run the model.
Figure 6-4. Setting the desired number of clusters for our model.
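To see what changing k does outside of RapidMiner, the following is a minimal sketch of k-means in plain Python. It is not RapidMiner's implementation, and the data are synthetic stand-ins invented for illustration, not Sonia's data set. The function names (kmeans, sq_dist) and the max_iter parameter are our own, and we assume that 'max runs' means the number of random restarts from which the best result is kept.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_runs=10, max_iter=100, seed=0):
    """Basic k-means. Returns (labels, centroids, cost) for the best of max_runs restarts."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_runs):
        centroids = rng.sample(points, k)  # start from k randomly chosen observations
        labels = None
        for _ in range(max_iter):
            # Assign every point to its nearest centroid (cluster ids run 0..k-1).
            new_labels = [
                min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
            ]
            if new_labels == labels:
                break  # assignments stopped changing
            labels = new_labels
            # Move each centroid to the mean of its current members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        cost = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or cost < best[2]:
            best = (labels, list(centroids), cost)
    return best


# Synthetic stand-in data: 100 made-up two-attribute observations (assumption).
data_rng = random.Random(42)
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [
    (data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
    for cx, cy in centers
    for _ in range(25)
]

for k in (2, 4):  # the default k first, then four clusters
    labels, centroids, cost = kmeans(points, k)
    sizes = Counter(labels)
    print(f"k={k}: items per cluster {sorted(sizes.items())}, "
          f"total squared distance {cost:.1f}")
```

Running the sketch prints the number of items in each cluster for k = 2 and then for k = 4, with cluster ids starting at 0. On data with several distinct groups, the total squared distance should drop noticeably when k moves from 2 to 4, which is the numeric counterpart of the extra detail Sonia is looking for.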
3) When the model is run, we find an initial report of the number of items that fell into each
of our four clusters. (Note that the clusters are numbered starting from 0, a result of
RapidMiner being written in the Java programming language.) In this particular model, our