3. CHRONIC HEPATITIS: AN EXAMPLE
We now discuss a real prediction rule. From 1975 to 1980, Peter Gregory
(personal communication, 1980) of Stanford Hospital observed n = 155
chronic hepatitis patients, of whom 33 died from the disease. On each
patient were recorded p = 19 covariates summarizing medical history, physi-
cal examinations, X rays, liver function tests, and biopsies. (Missing values
were replaced by sample averages before further analysis of the data.) An
effective prediction rule, based on these 19 covariates, was desired to identify
future patients at high risk. Such patients require more aggressive treatment.
Gregory used a prediction rule based on forward logistic regression. We
assume $x_1 = (t_1, y_1), \ldots, x_n = (t_n, y_n)$ are independent and identically
distributed such that, conditional on $t_i$, $y_i$ is Bernoulli with probability of
success $q(t_i)$, where $\operatorname{logit} q(t_i) = b_0 + t_i b$, and where $b$ is a column vector
of $p$ elements. If $(\hat{b}_0, \hat{b})$ is an estimate of $(b_0, b)$, then $\hat{q}(t_0)$, such that
$\operatorname{logit} \hat{q}(t_0) = \hat{b}_0 + t_0 \hat{b}$, is an estimate of $q(t_0)$. We predict death if the
estimated probability $\hat{q}(t_0)$ of death is greater than $\tfrac{1}{2}$:

$$
h_{\hat{F}}(t_0) =
\begin{cases}
1 & \text{if } \hat{q}(t_0) > \tfrac{1}{2}, \text{ i.e., } \hat{b}_0 + t_0 \hat{b} > 0, \\
0 & \text{otherwise.}
\end{cases}
\tag{3.1}
$$
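As a rough illustration of rule (3.1), the following Python sketch computes the linear predictor, the estimated probability of death, and the resulting 0/1 prediction. The function names and the coefficient values are hypothetical and chosen only for illustration; they are not Gregory's fitted estimates. The sketch also checks that thresholding the estimated probability at 1/2 agrees with thresholding the linear predictor at 0.

```python
import math
from typing import Sequence


def linear_predictor(b0: float, b: Sequence[float], t: Sequence[float]) -> float:
    """Return b0 + t . b for one covariate vector t."""
    if len(b) != len(t):
        raise ValueError("b and t must have the same length")
    return b0 + sum(tj * bj for tj, bj in zip(t, b))


def estimated_probability(b0: float, b: Sequence[float], t: Sequence[float]) -> float:
    """Inverse logit of the linear predictor, computed in a numerically stable way."""
    z = linear_predictor(b0, b, t)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_death(b0: float, b: Sequence[float], t: Sequence[float]) -> int:
    """Rule (3.1): predict 1 (death) when the linear predictor exceeds 0, else 0."""
    return 1 if linear_predictor(b0, b, t) > 0 else 0


# Hypothetical coefficients for two covariates (illustration only).
b0_hat = 1.0
b_hat = [-2.0, 0.5]

patients = [[1.0, 1.0], [0.0, 2.0], [0.25, 0.0]]
for t0 in patients:
    q_hat = estimated_probability(b0_hat, b_hat, t0)
    rule = predict_death(b0_hat, b_hat, t0)
    # The two forms of the rule agree: q_hat > 1/2 exactly when b0 + t.b > 0.
    assert (q_hat > 0.5) == (rule == 1)
    print(t0, round(q_hat, 3), rule)
```

Working on the linear-predictor scale avoids evaluating the exponential at all when only the 0/1 prediction is needed.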
Gregory's rule for estimating $(b_0, b)$ consists of three steps.
1. Perform an initial screening of the variables by testing $H_0\colon b_j = 0$
in the simple logistic model, $\operatorname{logit} q(t_0) = b_0 + t_{0j} b_j$, for $j = 1, \ldots, p$
separately at level $a = 0.05$. Retain only those variables $j$ for which
the test is significant. Applied to Gregory's data, the initial screening
retained 13 variables, 17, 12, 14, 11, 13, 19, 6, 5, 18, 10, 1,
4, 2, in increasing order of p-values.
2. To the variables that were retained in the initial screening, apply
forward logistic regression that adds variables one at a time in the
following way. Assume variables $j_1, j_2, \ldots, j_{P_1}$ are already added to
the model. For each remaining $j$, test $H_0\colon b_j = 0$ in the linear logistic
model that contains variables $j_1, j_2, \ldots, j_{P_1}, j$ together with the
intercept. Rao's (1973, pp. 417-420) efficient score test requires
calculating the maximum likelihood estimate only under $H_0$. If the
most significant variable is significant at $a = 0.05$, we add that
variable to the model as variable $j_{P_1 + 1}$ and start again. If none of
the remaining variables is significant at $a = 0.05$, we stop. From
the aforementioned 13 variables, forward logistic regression
applied to Gregory's data chose four variables (17, 11, 14, 2) that
are, respectively, albumin, spiders, bilirubin, and sex.
3. Let $(\hat{b}_0, \hat{b})$ be the maximum likelihood estimate based on the
linear logistic model consisting of the variables chosen by forward
logistic regression together with the intercept. On Gregory's data,
it turned out that