this data set, there are nine different measured attributes of breast cancer biopsies, as well as the
class of the tumor, which is either benign or malignant. To prepare the data for logistic regression,
we must convert the factor class, with the levels benign and malignant, to a vector with
numeric values of 0 and 1. We'll make a copy of the biopsy data frame, then store the numeric
coded class in a column called classn:
library(MASS) # For the data set
b <- biopsy
b$classn[b$class == "benign"] <- 0
b$classn[b$class == "malignant"] <- 1
b
      ID V1 V2 V3 V4 V5 V6 V7 V8 V9     class classn
 1000025  5  1  1  1  2  1  3  1  1    benign      0
 1002945  5  4  4  5  7 10  3  2  1    benign      0
 1015425  3  1  1  1  2  2  3  1  1    benign      0
 ...
  897471  4  8  6  4  3  4 10  6  1 malignant      1
  897471  4  8  8  5  4  5 10  4  1 malignant      1
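The recoding step can also be written in one line and then cross-checked. The following is a minimal sketch, not the recipe's own method: it assumes only the MASS biopsy data set, and the use of as.integer() on a logical comparison is an alternative chosen here for illustration.

```r
library(MASS)  # provides the biopsy data set

b <- biopsy

# Alternative to the two assignments in the recipe (an assumption, not the
# recipe's method): the comparison yields TRUE/FALSE, and as.integer()
# maps TRUE to 1 and FALSE to 0
b$classn <- as.integer(b$class == "malignant")

# Cross-tabulate the factor against the numeric code to confirm the mapping:
# every benign row should fall under 0 and every malignant row under 1
table(b$class, b$classn)
```

A cross-tabulation like this is a cheap way to catch a mistyped level name, which would otherwise leave NA values in classn.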
Although there are many attributes we could examine, for this example we'll just look at the
relationship of V1 (clump thickness) and the class of the tumor. Because there is a large degree
of overplotting, we'll jitter the points and make them semitransparent (alpha=0.4), hollow
(shape=21), and slightly smaller (size=1.5). Then we'll add a fitted logistic regression line
(Figure 5-20) by telling stat_smooth() to use the glm() function with the option family=binomial:
library(ggplot2)

ggplot(b, aes(x = V1, y = classn)) +
    geom_point(position = position_jitter(width = 0.3, height = 0.06), alpha = 0.4,
               shape = 21, size = 1.5) +
    # ggplot2 2.0.0 and later pass glm() arguments through method.args
    stat_smooth(method = glm, method.args = list(family = binomial))
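To see the numbers behind the fitted line, the same kind of model can be fit directly with glm(). This is a sketch under the assumption that the plotted curve corresponds to a logistic regression of classn on V1; the names fit, newdata, and p_malignant are illustrative and do not come from the recipe.

```r
library(MASS)  # provides the biopsy data set

# Recreate the numerically coded class column as in the recipe
b <- biopsy
b$classn[b$class == "benign"] <- 0
b$classn[b$class == "malignant"] <- 1

# Logistic regression of tumor class on clump thickness
fit <- glm(classn ~ V1, data = b, family = binomial)

# Predicted probability of a malignant tumor for clump thickness 1 through 10;
# type = "response" returns probabilities rather than log-odds
newdata <- data.frame(V1 = 1:10)
newdata$p_malignant <- predict(fit, newdata = newdata, type = "response")
newdata
```

Calling summary(fit) would additionally show the estimated coefficients and their standard errors.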