Biomedical Engineering Reference
In-Depth Information
4.1. Extensions of traditional variable selection procedures
Since a GEE model does not specify a likelihood structure, traditional
measures of model fit and of degrees of freedom are not well defined in the
GEE approach. Thus, extending traditional variable selection criteria, such
as $C_p$, AIC and BIC, to the GEE approach is challenging. Different
measures of model fit and degrees of freedom lead to different variations upon
traditional variable selection procedures.
Let us begin with the cross-validation method, which is conceptually
simple, but computationally expensive. To extend cross-validation to GEE,
we first need to define a measure of model fit. A simple measure is the
residual sum of squares, defined by
$$\mathrm{RSS}_S = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \{ y_{ij} - g(x_{ij}^T \hat\beta) \}^2,$$
where $\hat\beta$ is the resulting estimate from the GEEs (4.1). Based on the $\mathrm{RSS}_S$,
the cross-validation method can be extended to GEE methods by leaving
one subject out rather than leaving one observation out, so that no
cluster is broken up. To reduce computational cost, one may apply k-fold
cross-validation rather than leave-one-out cross-validation (in fact, this may
improve performance; see [4], [43]). The marginal RSS criterion does not
take into account the heteroscedasticity of observations. Cantoni, Field,
Flemming and Ronchetti [8]
considered generalized least squares loss. Let
$r_{ij} = \{ y_{ij} - g(x_{ij}^T \hat\beta) \} / \{ V(\hat\mu_{ij}) \}$ and define the generalized residual sum of
squares
$$\mathrm{RSS}_W = \sum_{i=1}^{n} \sum_{j=1}^{n_i} w_{ij} r_{ij}^2,$$
where the $w_{ij}$ are weights, which can be specified based on the data analyst's
experience or simply set to 1. Replacing $\mathrm{RSS}_S$ by $\mathrm{RSS}_W$, we can also
extend the cross-validation method to longitudinal data.
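
The subject-level cross-validation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any cited paper: the GEE fit is replaced by a stand-in (identity link with a working independence correlation, for which the estimating equations reduce to ordinary least squares), the weighted loss squares the variance-scaled residuals, and all function names (`subject_folds`, `fit_identity_independence`, `rss_s`, `rss_w`, `cv_score`) and the simulated data are hypothetical.

```python
import numpy as np

def subject_folds(subject_ids, k, seed=0):
    """Split the unique subject ids into k folds so that no cluster is broken up."""
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    return np.array_split(subjects, k)

def fit_identity_independence(X, y):
    """Stand-in for a GEE fit (assumption): identity link, working independence,
    in which case the estimating equations reduce to ordinary least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def rss_s(y, mu):
    """Marginal residual sum of squares on held-out observations."""
    return float(np.sum((y - mu) ** 2))

def rss_w(y, mu, variance=lambda m: np.ones_like(m), w=None):
    """Weighted loss on residuals scaled by a variance function V(mu).
    Default V(mu) = 1 and w = 1 (assumptions), which recovers rss_s."""
    r = (y - mu) / variance(mu)
    w = np.ones_like(r) if w is None else w
    return float(np.sum(w * r ** 2))

def cv_score(X, y, subject_ids, columns, k=5, loss=rss_s, seed=0):
    """k-fold cross-validation that holds out whole subjects at a time."""
    total = 0.0
    for held_out in subject_folds(subject_ids, k, seed):
        test = np.isin(subject_ids, held_out)
        beta = fit_identity_independence(X[~test][:, columns], y[~test])
        mu = X[test][:, columns] @ beta
        total += loss(y[test], mu)
    return total

# Simulated longitudinal data (hypothetical): 30 subjects, 4 visits each.
rng = np.random.default_rng(42)
ids = np.repeat(np.arange(30), 4)
X = rng.normal(size=(ids.size, 3))
subject_effect = rng.normal(scale=0.5, size=30)[ids]  # induces within-subject correlation
y = X[:, 0] - 2.0 * X[:, 1] + subject_effect + rng.normal(size=ids.size)

# Compare candidate covariate subsets by their cross-validated loss.
for columns in ([0], [0, 1], [0, 1, 2]):
    print(columns, cv_score(X, y, ids, columns, k=5))
```

The candidate model with the smallest cross-validated loss would be selected. Passing `loss=rss_w` with a non-constant `variance` function switches to the heteroscedasticity-aware criterion; in practice the stand-in fit would be replaced by an actual GEE fit with the chosen working correlation.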
Another cross-validation approach was suggested by Pan [37], who
proposed choosing a model to minimize some linear combination of the
expected predictive bias
$$\mathrm{EPB} = E_x E_y \lvert G(Y \mid X; \hat\beta(X)) \rvert \tag{4.2}$$
on new data. This is a generalization of the $C_p$ in that Pan [37] tries to predict
a risk function for future data, but is much more general than quadratic loss.
Pan's scheme for finding the model which minimizes EPB involves
cross-validation and bootstrapping. Perhaps because (4.2) is rather abstract and