clusters are fairly well balanced. While Cluster 1, with only 118 observations (Figure 6-5),
is smaller than the other clusters, it is not unreasonably so.
Figure 6-5. The distribution of observations across our four clusters.
We could go back at this point and adjust our number of clusters, our number of 'max runs', or
even experiment with the other parameters offered by the k-Means operator. There are other
options for measurement type or divergence algorithms. Feel free to try out some of these options
if you wish. As was the case with Association Rules, there may be some back-and-forth trial and error
as you test different parameters to generate model output. When you are satisfied with your
model parameters, you can proceed to…
EVALUATION
Recall that Sonia's major objective in the hypothetical scenario posed at the beginning of the
chapter was to try to find natural breaks between different types of heart disease risk groups.
Using the k-Means operator in RapidMiner, we have identified four clusters for Sonia, and we can
now evaluate their usefulness in addressing Sonia's question. Refer back to Figure 6-5. There are a
number of radio buttons which allow us to select options for analyzing our clusters. We will start
by looking at our Centroid Table. This view of our results, shown in Figure 6-6, gives the means
for each attribute in each of the four clusters we created.
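
For readers who would like to see what a centroid table contains outside of RapidMiner, the following is a minimal Python sketch of the same idea. It is not RapidMiner's implementation. It assumes a plain k-means procedure with squared Euclidean distance, a `max_runs` parameter that simply repeats the clustering from different random starting points and keeps the tightest result, and a small synthetic data set with two hypothetical attributes named `weight` and `cholesterol`. The numbers it prints come from that made-up data, not from Sonia's data set.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Attribute-by-attribute mean of a non-empty list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans_once(rows, k, rng, max_steps=100):
    """One k-means run from a random start: returns labels, centroids, SSE."""
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(max_steps):
        # Assign every observation to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of the observations assigned to it.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = mean_point(members)
    sse = sum(squared_distance(row, centroids[lab])
              for row, lab in zip(rows, labels))
    return labels, centroids, sse


def kmeans(rows, k, max_runs=10, seed=0):
    """Repeat k-means max_runs times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_runs):
        result = kmeans_once(rows, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best


# Synthetic stand-in data: four loose groups of 50 observations each.
ATTRIBUTES = ["weight", "cholesterol"]  # hypothetical attribute names
data_rng = random.Random(42)
group_centers = [(110, 130), (150, 170), (170, 200), (190, 220)]
rows = [[data_rng.gauss(w, 8), data_rng.gauss(c, 8)]
        for w, c in group_centers for _ in range(50)]

labels, centroids, sse = kmeans(rows, k=4, max_runs=10)
sizes = Counter(labels)

# Centroid table: one line per cluster, giving its size and attribute means.
print("cluster  size  " + "  ".join(ATTRIBUTES))
for c, centroid in enumerate(centroids):
    print(c, sizes[c], "  ".join(f"{value:.1f}" for value in centroid))
```

Each printed row corresponds to one cluster: the size column mirrors the kind of distribution shown in Figure 6-5, and the remaining columns are the per-attribute means that a centroid table reports. Changing `k` or `max_runs` in the call to `kmeans` is a rough analogue of adjusting the number of clusters or 'max runs' in the operator.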
 