Scatter Plots - R Graphics


this data set, there are nine different measured attributes of breast cancer biopsies, as well as the class of the tumor, which is either benign or malignant. To prepare the data for logistic regression, we must convert the factor class, with the levels benign and malignant, to a vector with numeric values of 0 and 1. We'll make a copy of the biopsy data frame, then store the numeric coded class in a column called classn:

library(MASS) # For the data set

b <- biopsy

b$classn[b$class == "benign"] <- 0
b$classn[b$class == "malignant"] <- 1

b

     ID V1 V2 V3 V4 V5 V6 V7 V8 V9     class classn
1000025  5  1  1  1  2  1  3  1  1    benign      0
1002945  5  4  4  5  7 10  3  2  1    benign      0
1015425  3  1  1  1  2  2  3  1  1    benign      0
...
 897471  4  8  6  4  3  4 10  6  1 malignant      1
 897471  4  8  8  5  4  5 10  4  1 malignant      1
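The two-step recoding above can also be written as a single vectorized comparison. The following is only a sketch of that alternative, not the method used in the text; the name b2 is hypothetical and is used so that the data frame b is left untouched. A cross-tabulation is one way to confirm that each factor level maps to the intended number.

```r
library(MASS)  # For the biopsy data set

b2 <- biopsy   # b2 is a hypothetical name for a second copy

# The comparison yields TRUE/FALSE; as.numeric() converts these to 1/0,
# so malignant becomes 1 and benign becomes 0
b2$classn <- as.numeric(b2$class == "malignant")

# Cross-tabulate the factor against the numeric coding to check the mapping
table(b2$class, b2$classn)
```

In the resulting table, all benign cases should fall in the 0 column and all malignant cases in the 1 column.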

Although there are many attributes we could examine, for this example we'll just look at the relationship of V1 (clump thickness) and the class of the tumor. Because there is a large degree of overplotting, we'll jitter the points and make them semitransparent (alpha=0.4), hollow (shape=21), and slightly smaller (size=1.5). Then we'll add a fitted logistic regression line (Figure 5-20) by telling stat_smooth() to use the glm() function with the option family=binomial:

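library(ggplot2)

ggplot(b, aes(x = V1, y = classn)) +
  geom_point(position = position_jitter(width = 0.3, height = 0.06), alpha = 0.4,
             shape = 21, size = 1.5) +
  # In ggplot2 2.0.0 and later, options for the fitting function go in method.args
  stat_smooth(method = glm, method.args = list(family = binomial))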
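The curve drawn by stat_smooth() corresponds to a logistic regression of classn on V1. To inspect the numbers behind the line, one option is to fit the same model directly with glm(). This is a minimal sketch rather than part of the original recipe; the names fit and newdata are hypothetical, and the recoding is repeated so the block runs on its own.

```r
library(MASS)  # For the biopsy data set

b <- biopsy
b$classn <- as.numeric(b$class == "malignant")

# Logistic regression of tumor class (0/1) on clump thickness
fit <- glm(classn ~ V1, data = b, family = binomial)

# Intercept and slope on the log-odds scale
coef(fit)

# Predicted probability of malignancy for clump thickness values 1 to 10
newdata <- data.frame(V1 = 1:10)
newdata$prob <- predict(fit, newdata = newdata, type = "response")
newdata
```

With type = "response", predict() returns values on the probability scale, which is the same scale as the y-axis of the plot.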
