4.1. Extensions of traditional variable selection procedures
Since a GEE model does not specify a likelihood structure, traditional
measures of model fit and of degrees of freedom are not well defined in the
GEE approach. Thus, extending traditional variable selection criteria, such
as $C_p$, AIC and BIC, to the GEE approach is challenging. Different
measures of model fit and degrees of freedom lead to different variations upon
traditional variable selection procedures.
Let us begin with the cross-validation method, which is conceptually
simple, but computationally expensive. To extend cross-validation to GEE,
we first need to define a measure of model fit. A simple measure is the
residual sum of squares, defined by
$$\mathrm{RSS}_S = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \{y_{ij} - g(x_{ij}^T \hat\beta)\}^2,$$
where $\hat\beta$ is the resulting estimate of the GEEs (4.1). Based on the
$\mathrm{RSS}_S$, the cross-validation method can be extended for GEE methods
by leaving one subject out rather than leaving one observation out, so that no
cluster is broken up. To reduce computational cost, one may apply k-fold
cross-validation rather than leave-one-out cross-validation (in fact, this may
improve performance, see [4], [43]). The marginal RSS criterion does not
take into account the heteroscedasticity of observations. Cantoni, Field,
Flemming and Ronchetti [8] considered generalized least squares loss. Let
$\hat r_{ij} = \{y_{ij} - g(x_{ij}^T \hat\beta)\} / \sqrt{V(\hat\mu_{ij})}$
and define the generalized residual sum of squares
$$\mathrm{RSS}_W = \sum_{i=1}^{n} \sum_{j=1}^{n_i} w_{ij} \hat r_{ij}^2,$$
where the $w_{ij}$ are weights, which can be specified based on the data
analyst's experience or simply set to 1. Replacing $\mathrm{RSS}_S$ by
$\mathrm{RSS}_W$, we can also extend the cross-validation method for
longitudinal data.
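The subject-level cross-validation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited paper: the GEE fit is replaced by a stand-in (identity link with working independence, which reduces to ordinary least squares), the function names `subject_folds`, `fit_identity_independence` and `cv_rss` are hypothetical, and the weighted criterion assumes residuals scaled by the square root of a user-supplied variance function.

```python
import numpy as np


def subject_folds(subject_ids, k, seed=0):
    """Split the unique subject ids into k folds so that no cluster is broken up."""
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    return np.array_split(subjects, k)


def fit_identity_independence(X, y):
    """Stand-in for a GEE fit (assumption): identity link, working independence."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def cv_rss(X, y, subject_ids, k=5, variance=None, weights=None, seed=0):
    """Subject-level k-fold CV criterion.

    With variance=None this accumulates the unweighted squared residuals
    (RSS_S) on held-out subjects. If `variance` is a callable mapping fitted
    means to variances, residuals are scaled by sqrt(variance) and multiplied
    by `weights` (default 1), giving a weighted criterion in the spirit of RSS_W.
    """
    total = 0.0
    for held_out in subject_folds(subject_ids, k, seed):
        test = np.isin(subject_ids, held_out)      # all rows of held-out subjects
        beta = fit_identity_independence(X[~test], y[~test])
        mu = X[test] @ beta                        # g is the identity here
        resid = y[test] - mu
        if variance is not None:
            resid = resid / np.sqrt(variance(mu))  # Pearson-type scaling
        w = np.ones_like(resid) if weights is None else weights[test]
        total += float(np.sum(w * resid ** 2))
    return total
```

To compare candidate models, one would call `cv_rss` once per candidate design matrix with the same `subject_ids` and `seed`, and prefer the model with the smallest value. Setting `k` equal to the number of subjects gives leave-one-subject-out cross-validation. A real analysis would replace `fit_identity_independence` with an actual GEE fit for the chosen link and working correlation.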
Another cross-validation approach was suggested by Pan [37], who
proposed choosing a model to minimize some linear combination of the
expected predictive bias
$$\mathrm{EPB} = E_x E_y \left| G(Y \mid X; \hat\beta(X)) \right| \qquad (4.2)$$
on new data. This is a generalization of the $C_p$ in that Pan [37] tries to predict
a risk function for future data, but is much more general than quadratic loss.
Pan's scheme for finding the model which minimizes EPB involves
cross-validation and bootstrapping. Perhaps because (4.2) is rather abstract and