Using R to Perform a K-means Analysis
To illustrate how to use the WSS to determine an appropriate number, k, of
clusters, the following example uses R to perform a k-means analysis. The task is to
group 620 high school seniors based on their grades in three subject areas: English,
mathematics, and science. Each grade is averaged over the student's high school
career and ranges from 0 to 100. The following R code loads the necessary R
libraries and imports the CSV file containing the grades.
library(plyr)
library(ggplot2)
library(cluster)
library(lattice)
library(graphics)
library(grid)
library(gridExtra)
#import the student grades
grade_input = as.data.frame(read.csv("c:/data/grades_km_input.csv"))
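As a sanity check after an import like the one above, one might confirm that the grade columns are present and that every grade lies in the stated 0 to 100 range. The sketch below is an illustration, not part of the original example: check_grades is a hypothetical helper, and the small data frame is a stand-in with made-up values rather than the real grades file. The column names are the ones used in this example.

```r
# Hypothetical helper (not from the original example): returns TRUE when all
# subject columns exist and every grade is a non-missing value in [0, 100].
check_grades <- function(df, subjects = c("English", "Math", "Science")) {
  missing_cols <- setdiff(subjects, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  grades <- as.matrix(df[, subjects])
  in_range <- !is.na(grades) & grades >= 0 & grades <= 100
  all(in_range)
}

# Stand-in data with made-up values (assumption: same layout as the CSV).
demo_input <- data.frame(
  Student = 1:3,
  English = c(99, 85, 70),
  Math    = c(96, 88, 65),
  Science = c(97, 90, 72)
)

check_grades(demo_input)
```

With the real data, the same call would be check_grades(grade_input), assuming the CSV uses these column names.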
The following R code formats the grades for processing. The data file contains
four columns. The first column holds a student identification (ID) number, and
the other three columns are for the grades in the three subject areas. Because the
student ID is not used in the clustering analysis, it is excluded from the k-means
input matrix, kmdata.
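#keep the ID and the three grade columns as a matrix
kmdata_orig <- as.matrix(grade_input[, c("Student", "English",
                                         "Math", "Science")])
#drop the student ID (column 1); only the grades are clustered
kmdata <- kmdata_orig[, 2:4]
#display the first 10 rows
kmdata[1:10, ]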
     English Math Science
[1,]      99   96      97
[2,]      99   96      97
[3,]      98   97      97
[4,]      95  100      95
[5,]      95   96      96
[6,]      96   97      96
[7,]     100   96      97
[8,]      95   98      98
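With a grade matrix in this form, the WSS can be computed for a range of candidate values of k. The following is a minimal sketch under stated assumptions, not the analysis from the original example: it uses a synthetic matrix, demo_kmdata, built from three made-up groups of students instead of the real grades file, and the names make_group, k_max, and wss are chosen here for illustration. It relies only on the kmeans() function from base R's stats package, whose result includes the total within-cluster sum of squares as tot.withinss.

```r
# Synthetic stand-in for kmdata (assumption: three made-up student groups).
set.seed(42)
make_group <- function(n, center) {
  # one column per subject, drawn around the given mean grades
  g <- sapply(center, function(m) rnorm(n, mean = m, sd = 3))
  pmin(pmax(g, 0), 100)  # keep grades within 0..100
}
demo_kmdata <- rbind(
  make_group(50, c(90, 92, 91)),
  make_group(50, c(80, 70, 75)),
  make_group(50, c(60, 65, 55))
)
colnames(demo_kmdata) <- c("English", "Math", "Science")

# Total within-cluster sum of squares (WSS) for k = 1..k_max.
k_max <- 10
wss <- numeric(k_max)
for (k in 1:k_max) {
  # nstart = 25 reruns k-means from 25 random starts and keeps the best fit
  wss[k] <- kmeans(demo_kmdata, centers = k, nstart = 25)$tot.withinss
}

# Plot WSS against k and look for the "elbow" where the curve flattens.
plot(1:k_max, wss, type = "b",
     xlab = "Number of Clusters", ylab = "Within Sum of Squares")
```

To apply the same loop to the student data, demo_kmdata would be replaced with kmdata. The value of k at which the curve stops dropping sharply is a reasonable candidate for the number of clusters.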