AN OVERVIEW ON VARIABLE SELECTION FOR LONGITUDINAL DATA - Quantitative Medical Data Analysis Using Mathematical Tools and Statistical Techniques


the traditional variable selection criteria, including $C_p$, AIC, and BIC (see
[31], [2], and [42], respectively), have been extended for longitudinal data.
This chapter gives a systematic introduction to variable selection for
longitudinal data.

In Section 2 we give an overview of variable selection for linear
mixed effects models. Selecting significant fixed effect variables is relatively
straightforward, but identification of significant random effects variables is
very challenging; existing works dealing with this issue include Chen and
Dunson [10] and Vaida and Blanchard [46]. Selection of significant random
effects is closely related to covariance selection. Thus, we review some recent
work on covariance selection in Section 3.

Generalized estimating equations (GEE) are very popular for analyzing
binary, count and categorical longitudinal data. Penalized generalized
estimating equations have recently been proposed for variable selection under
the GEE framework (e.g., by Pan [36, 37], Fu [18], and Dziak [13]). In Section 4 we
present an overview of variable selection methods for GEE, and we explore
their performance empirically in Section 5. In Section 6 we give an
introduction to variable selection for partial linear models, which are useful for
modeling longitudinal data semiparametrically.

2. Variable Selection for Linear Mixed Effects Models

Suppose that we have a sample of $n$ subjects. For the $i$-th subject, we
collect the response variable $y_{ij}$, the $d \times 1$ covariate vector $x_{ij}$, and the
$q \times 1$ covariate vector $z_{ij}$, at various times $t_{ij}$, $j = 1, \ldots, n_i$, where $n_i$ is
the number of observations on the $i$-th subject and $N = \sum_i n_i$ is the total
number of observations. Covariates may be constant within each subject,
or may change over time.

For succinct presentation, we will use matrix notation. Let
$y_i = (y_{i1}, \ldots, y_{in_i})^T$, $X_i = (x_{i1}, \ldots, x_{in_i})^T$ and
$Z_i = (z_{i1}, \ldots, z_{in_i})^T$. In general, the linear mixed effects model is defined as

$$y_i = X_i \beta + Z_i \gamma_i + \varepsilon_i, \tag{2.1}$$

where $\beta$ is the fixed effect parameter vector, $\gamma_i$ is the vector of
subject-specific random effects with $\gamma_i \sim N(0, A)$, and $\varepsilon_i$ is a
random error vector following $N(0, \sigma^2 I)$. In the context of (2.1), model
selection is a broader issue than variable selection; for example, one may choose
the best among several candidate mean structures [50]. However, for simplicity we
focus only on variable selection in this section, and in Section 3 we will review
some methods for covariance selection problems.
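
To make the notation of model (2.1) concrete, the following is a minimal Python sketch that simulates unbalanced longitudinal data from a linear mixed effects model. It is an illustration under stated assumptions, not the chapter's own procedure: the function names `simulate_lmm` and `marginal_cov`, the symbol names `beta` (fixed effects) and `gamma_i` (random effects), the choice of a random intercept as the first column of Z_i, and all numerical values are hypothetical. The sketch also computes the marginal covariance Z_i A Z_i^T + sigma^2 I implied by the model, which is one way to see why selecting random effects is tied to covariance selection: a random effect with zero variance in A contributes nothing to the covariance of y_i.

```python
import numpy as np

def simulate_lmm(n_obs, beta, A, sigma, rng):
    """Draw (y_i, X_i, Z_i) for each subject from
    y_i = X_i beta + Z_i gamma_i + eps_i,
    gamma_i ~ N(0, A), eps_i ~ N(0, sigma^2 I)."""
    d, q = len(beta), A.shape[0]
    data = []
    for n_i in n_obs:
        # n_i x d matrix of fixed-effect covariates (illustrative: standard normal)
        X_i = rng.normal(size=(n_i, d))
        # n_i x q matrix of random-effect covariates; first column is an
        # intercept (an assumption of this sketch, not of the model)
        Z_i = np.column_stack([np.ones(n_i), rng.normal(size=(n_i, q - 1))])
        gamma_i = rng.multivariate_normal(np.zeros(q), A)  # subject-specific effects
        eps_i = rng.normal(scale=sigma, size=n_i)          # within-subject errors
        y_i = X_i @ beta + Z_i @ gamma_i + eps_i
        data.append((y_i, X_i, Z_i))
    return data

def marginal_cov(Z_i, A, sigma):
    """Marginal covariance of y_i implied by the model: Z_i A Z_i^T + sigma^2 I."""
    return Z_i @ A @ Z_i.T + sigma**2 * np.eye(Z_i.shape[0])

rng = np.random.default_rng(0)
n_obs = [3, 5, 4]                    # n_i: observations per subject (unbalanced)
beta = np.array([1.0, 0.0, -2.0])    # second fixed effect is truly zero
A = np.diag([1.0, 0.0])              # second random effect has zero variance
sigma = 0.5

data = simulate_lmm(n_obs, beta, A, sigma, rng)
N = sum(len(y_i) for y_i, _, _ in data)   # total number of observations
V_1 = marginal_cov(data[0][2], A, sigma)  # covariance of y_1
```

Here the second diagonal entry of `A` is zero, so the second column of each Z_i drops out of `marginal_cov`; in this setup the covariance of y_1 reduces to a compound-symmetry form driven by the random intercept alone.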
