this data set, there are nine different measured attributes of breast cancer biopsies, as well as the
class of the tumor, which is either benign or malignant. To prepare the data for logistic regression,
we must convert the factor class, with the levels benign and malignant, to a vector with numeric
values of 0 and 1. We'll make a copy of the biopsy data frame, then store the numeric coded class in
a column called classn:
library(MASS)  # For the data set

b <- biopsy
b$classn[b$class == "benign"]    <- 0
b$classn[b$class == "malignant"] <- 1
b

      ID V1 V2 V3 V4 V5 V6 V7 V8 V9     class classn
 1000025  5  1  1  1  2  1  3  1  1    benign      0
 1002945  5  4  4  5  7 10  3  2  1    benign      0
 1015425  3  1  1  1  2  2  3  1  1    benign      0
 ...
  897471  4  8  6  4  3  4 10  6  1 malignant      1
  897471  4  8  8  5  4  5 10  4  1 malignant      1
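The same 0/1 coding can also be produced in a single step from a logical comparison. The following is a minimal sketch, not the recipe's own method; the name coding_check is made up for illustration, and the cross-tabulation is only there to confirm that each factor level maps to the intended number.

```r
library(MASS)  # provides the biopsy data frame

b <- biopsy

# The comparison gives TRUE/FALSE; as.integer() turns that into 1/0
b$classn <- as.integer(b$class == "malignant")

# Cross-tabulate the factor against the numeric code to confirm the mapping
# (coding_check is an illustrative name, not part of the original recipe)
coding_check <- table(b$class, b$classn)
print(coding_check)
```

If the coding is right, the table should have nonzero counts only in the benign/0 and malignant/1 cells.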
Although there are many attributes we could examine, for this example we'll just look at the
relationship of V1 (clump thickness) and the class of the tumor. Because there is a large degree of
overplotting, we'll jitter the points and make them semitransparent (alpha=0.4), hollow (shape=21),
and slightly smaller (size=1.5). Then we'll add a fitted logistic regression line by having
stat_smooth() fit the model with glm and the binomial family:
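library(ggplot2)

ggplot(b, aes(x = V1, y = classn)) +
    geom_point(position = position_jitter(width = 0.3, height = 0.06), alpha = 0.4,
               shape = 21, size = 1.5) +
    # In ggplot2 2.0.0 and later, arguments for glm() are passed through method.args
    stat_smooth(method = glm, method.args = list(family = binomial))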
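The fitted line in the plot comes from a logistic regression of classn on V1. To see the numbers behind the curve, the same model can be fit directly with glm(). This is a rough sketch under that assumption; the names fit, new_v1, and p_malignant are illustrative and do not appear in the recipe.

```r
library(MASS)  # provides the biopsy data frame

b <- biopsy
b$classn <- as.integer(b$class == "malignant")  # 0 = benign, 1 = malignant

# Logistic regression of tumor class on clump thickness (V1)
fit <- glm(classn ~ V1, data = b, family = binomial)
print(coef(fit))

# Predicted probability of malignancy at each possible V1 score (1 to 10);
# type = "response" returns probabilities rather than log-odds
new_v1 <- data.frame(V1 = 1:10)
p_malignant <- predict(fit, newdata = new_v1, type = "response")
print(round(p_malignant, 3))
```

The predicted probabilities should trace the same S-shaped curve that stat_smooth() draws, rising with clump thickness.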