subtle the effect, these values might have on a curve drawn through 20-
dimensional space. 1
Gong's logistic regression models were constructed in two stages. At
the first stage, each of the explanatory variables was evaluated on a univari-
ate basis. Thirteen of these variables proved significant at the 5% level
when applied to the original data. A forward multiple regression was
applied to these thirteen variables and four were selected for use in the
predictor equation.
When she took bootstrap samples from the 155 patients, the $R^2$ values
of the final models associated with each individual bootstrap sample varied
widely. Not reported in this article, but far more important, is that while
two of the original four predictor variables always appeared in the final
model generated from a bootstrap sample of the patients, five other vari-
ables appeared in only some of the models.
We strongly urge you to adopt Dr. Gong's bootstrap approach to validating
multivariable models. Retain only those variables that appear consistently
in the bootstrap regression models. Additional methods for model validation
are described in Chapter 11.
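The bootstrap check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Gong's actual procedure: ordinary least squares on a continuous response stands in for her logistic regression, the univariate screening stage is omitted, the data are simulated, and the function names (`forward_select`, `bootstrap_selection`) and the stopping thresholds (`max_vars`, `min_gain`) are hypothetical choices made for this example.

```python
import numpy as np


def forward_select(X, y, max_vars=4, min_gain=0.02):
    """Greedy forward selection with ordinary least squares.

    Adds the column that raises R^2 the most, stopping at max_vars columns
    or when the best available gain in R^2 falls below min_gain.
    Both thresholds are illustrative assumptions.
    """
    n, p = X.shape
    tss = float(np.sum((y - y.mean()) ** 2))
    selected, best_r2 = [], 0.0
    while len(selected) < max_vars:
        trials = []
        for j in range(p):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            trials.append((1.0 - rss / tss, j))
        if not trials:
            break
        r2, j = max(trials)
        if r2 - best_r2 < min_gain:
            break
        selected.append(j)
        best_r2 = r2
    return selected, best_r2


def bootstrap_selection(X, y, n_boot=100, seed=0):
    """Refit the selection on bootstrap samples; return selection
    frequencies per variable and the R^2 of each final model."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    r2_values = []
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)  # resample rows with replacement
        selected, r2 = forward_select(X[rows], y[rows])
        for j in selected:
            counts[j] += 1
        r2_values.append(r2)
    return counts / n_boot, np.array(r2_values)


# Simulated stand-in data: 155 rows, 8 candidate variables.
# Variables 0 and 1 have strong effects, variable 2 a weak one, the rest none.
rng = np.random.default_rng(1)
X = rng.normal(size=(155, 8))
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(size=155)

freq, r2_values = bootstrap_selection(X, y)
for j, f in enumerate(freq):
    print(f"variable {j}: selected in {f:.0%} of bootstrap models")
print(f"R^2 range: {r2_values.min():.2f} to {r2_values.max():.2f}")
```

Variables whose selection frequency is close to 100% are the candidates to retain; those that appear in only some of the bootstrap models are the ones the text advises against keeping.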
Correcting for Confounding Variables
When your objective is to verify the association between predetermined
explanatory variables and the response variable, multiple linear regression
analysis permits you to provide for one or more confounding variables that
could not be controlled otherwise.
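As a rough illustration of this adjustment, the sketch below simulates a confounder z that influences both the explanatory variable x and the response y, then compares the coefficient of x with and without z in the model. The data, the effect sizes, and the helper name `ols` are assumptions invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                       # confounder
x = 0.8 * z + rng.normal(size=n)             # explanatory variable, influenced by z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # response; the true effect of x is 1.0


def ols(columns, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


naive = ols([x], y)         # ignores the confounder
adjusted = ols([x, z], y)   # includes the confounder as a covariate

print(f"coefficient of x, ignoring z:   {naive[1]:.2f}")
print(f"coefficient of x, adjusted for z: {adjusted[1]:.2f}")
```

In this simulation the unadjusted coefficient absorbs part of the confounder's effect, while the adjusted coefficient should land near the true value of 1.0.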
GENERALIZED LINEAR MODELS
Today, most statistical software incorporates new advanced algorithms for
the analysis of generalized linear models (GLMs) 2 and extensions to panel
data settings, including fixed-, random-, and mixed-effects models; logistic,
Poisson, and negative-binomial regression; generalized estimating equations
(GEEs); and hierarchical linear models (HLMs). These models take the form
$Y = g^{-1}[bX] + e$, where b is a vector of to-be-determined coefficients,
X is a matrix of explanatory variables, and e is a vector of identically
distributed random variables. These variables may be normal, gamma, or
Poisson depending on the specified variance of the GLM. The nature of the
relationship between the outcome variable and the coefficients depends on
the specified link function g of the GLM. Panel data models include the
following:
Fixed Effects. An indicator variable for each subject is added and used to
fit the model. Though often applied to the analysis of repeated measures,
1 That's one dimension for risk of death, the dependent variable, and 19 for the explanatory variables.
2 As first defined by Nelder and Wedderburn [1972].
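To make the GLM form described in the section above more concrete, here is a minimal sketch of fitting one particular GLM: Poisson-distributed outcomes with a log link, so that the inverse link g^{-1} is the exponential function. The fitting method shown, iteratively reweighted least squares (IRLS), is a standard approach and is an assumption of this example rather than something the text specifies; the function name `fit_poisson_glm`, the starting values, and the simulated data are likewise illustrative. In the code, `X @ b` plays the role of bX in the text.

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=50, tol=1e-8):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    mu = y + 0.5                 # starting means, kept away from zero
    eta = np.log(mu)             # linear predictor g(mu) under the log link
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = mu                           # working weights for Poisson / log link
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * w                    # scales each observation by its weight
        b_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        eta = X @ b
        mu = np.exp(eta)                 # inverse link g^{-1}
        if converged:
            break
    return b


# Simulated data: intercept 0.5 and slope 0.3 on the log scale.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(0.5 + 0.3 * x)).astype(float)

b = fit_poisson_glm(X, y)
print(f"estimated intercept: {b[0]:.2f}, estimated slope: {b[1]:.2f}")
```

Choosing a different variance and link (for example, binomial variance with a logit link) would change only the weights, the working response, and the inverse link in this sketch.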