c( wss[3] , sum(km$withinss) )
[1] 64483.06 64483.06
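The two values match because the within sum of squares (WSS) for a clustering is the sum of the per-cluster values that kmeans() reports in withinss. As a minimal sketch, assuming made-up data rather than the student grades, the same quantity can be recomputed from its definition. The names pts, km_demo, and wss_manual are hypothetical and used only for illustration.

```r
# Illustrative data only (not the student grade data from the text)
set.seed(42)
pts <- data.frame(x = c(rnorm(20, 0), rnorm(20, 5)),
                  y = c(rnorm(20, 0), rnorm(20, 5)))

km_demo <- kmeans(pts, centers = 2, nstart = 10)

# Recompute WSS from the definition: squared distance of each point
# to the centroid of its assigned cluster, summed over all points
assigned_centers <- km_demo$centers[km_demo$cluster, ]
wss_manual <- sum((as.matrix(pts) - assigned_centers)^2)

# The three values should agree up to floating-point rounding
c(wss_manual, sum(km_demo$withinss), km_demo$tot.withinss)
```

The tot.withinss component of the kmeans() result stores the same total, so it can be read directly instead of summing withinss.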
When determining the value of k, the data scientist should visualize the data and
the assigned clusters. The following code uses the ggplot2 package to visualize
the identified student clusters and their centroids.
# Load the plotting packages (if not already loaded)
library(ggplot2)
library(gridExtra)

# Prepare the student data and clustering results for plotting
df = as.data.frame(kmdata_orig[,2:4])
df$cluster = factor(km$cluster)
centers = as.data.frame(km$centers)

# English vs. Math; the large translucent points mark the centroids
# (show.legend replaces show_guide in ggplot2 2.0.0 and later)
g1 = ggplot(data=df, aes(x=English, y=Math, color=cluster)) +
  geom_point() + theme(legend.position="right") +
  geom_point(data=centers,
             aes(x=English, y=Math, color=as.factor(c(1,2,3))),
             size=10, alpha=.3, show.legend=FALSE)

# English vs. Science
g2 = ggplot(data=df, aes(x=English, y=Science, color=cluster)) +
  geom_point() +
  geom_point(data=centers,
             aes(x=English, y=Science, color=as.factor(c(1,2,3))),
             size=10, alpha=.3, show.legend=FALSE)

# Math vs. Science
g3 = ggplot(data=df, aes(x=Math, y=Science, color=cluster)) +
  geom_point() +
  geom_point(data=centers,
             aes(x=Math, y=Science, color=as.factor(c(1,2,3))),
             size=10, alpha=.3, show.legend=FALSE)

# Build the gtable for g1 (not used in the layout below)
tmp = ggplot_gtable(ggplot_build(g1))

# Stack the three plots, without legends, under a common title
# (the title argument is top in gridExtra 2.0.0 and later; older versions used main)
grid.arrange(arrangeGrob(g1 + theme(legend.position="none"),
                         g2 + theme(legend.position="none"),
                         g3 + theme(legend.position="none"),
                         top="High School Student Cluster Analysis",
                         ncol=1))
The resulting plots are provided in Figure 4.6. The large circles represent the
locations of the cluster means provided earlier in the display of the km contents.
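To see why the large circles sit at the cluster means, the centroids returned by kmeans() can be compared with per-cluster averages computed directly. The sketch below assumes synthetic scores, since the student data are not reproduced here. The data frame scores and its values are hypothetical, as are the names km_demo and cluster_means.

```r
# Synthetic scores for illustration (hypothetical values, not kmdata_orig)
set.seed(1)
scores <- data.frame(
  English = c(rnorm(30, 95, 3), rnorm(30, 75, 5), rnorm(30, 60, 5)),
  Math    = c(rnorm(30, 95, 3), rnorm(30, 80, 5), rnorm(30, 60, 5)),
  Science = c(rnorm(30, 95, 3), rnorm(30, 70, 5), rnorm(30, 60, 5)))

km_demo <- kmeans(scores, centers = 3, nstart = 25)

# Average each subject within each assigned cluster
cluster_means <- aggregate(scores,
                           by = list(cluster = km_demo$cluster),
                           FUN = mean)

# Compare with the centroids reported by kmeans()
print(cluster_means)
print(km_demo$centers)
```

Each row of cluster_means should match the corresponding row of km_demo$centers up to rounding, which is what the plotted centroid markers depict.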