of interaction terms, the size grows considerably more. In a dataset with 30 loci, a full
model with all first order terms and two-way interaction terms will have 465 terms.
This can be prohibitively large for most datasets and algorithms. If the model space
is restricted to r<p predictors and the corresponding epistasis terms, then any model
considered will not have nearly as many terms. If r is chosen wisely, then the researcher
can ensure that each model under consideration has sufficient degrees of freedom for
parameter estimation.
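The term counts above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `n_terms` and the example restriction r = 5 are assumptions made here, not values from the text.

```python
from math import comb

def n_terms(p):
    """Main effect terms plus all two-way interaction terms among p predictors."""
    return p + comb(p, 2)

# Full model over 30 loci: 30 main effects + C(30, 2) = 435 interactions.
full_size = n_terms(30)

# Largest model when restricted to r = 5 predictors (r is a hypothetical choice).
restricted_size = n_terms(5)

print(full_size, restricted_size)
```

Under these assumptions the full model has 465 terms, while no model in the restricted space exceeds 15.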
Furthermore, in cases where linear dependencies exist among the predictors, estimation
can be complicated. One approach to address this issue is to assign P(M_c) = 0 to
all models where linear dependencies exist among the predictors, thereby removing all
multicollinear models from consideration. Whenever there are multicollinear terms, an
index will need to be created in order to keep track of any aliased terms. This aliasing
can cause problems when there is a large effect size for the aliased terms.
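One way to picture the zero-prior rule is to test whether a model's design matrix has full column rank and to return a prior of 0 when it does not. The following is a minimal sketch, not the authors' implementation: the names `column_rank` and `model_prior`, the row-list representation of the design matrix, and the exact-arithmetic rank test are all assumptions made for illustration. In practice a numerical rank routine with a tolerance would likely be used instead.

```python
from fractions import Fraction

def column_rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def model_prior(design_rows, base_prior):
    """Return 0 for a rank-deficient (multicollinear) design, else base_prior."""
    n_cols = len(design_rows[0])
    return 0.0 if column_rank(design_rows) < n_cols else base_prior

# Hypothetical designs: in `aliased`, the third column is the sum of the first two.
aliased = [[1, 0, 1], [0, 1, 1], [1, 1, 2], [2, 1, 3]]
independent = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
```

Here `model_prior(aliased, 0.25)` yields 0, so the model would never be visited, while `model_prior(independent, 0.25)` keeps its base prior.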
The use of restricted model spaces allows for the assessment of all candidate variables;
however, it restricts the number of candidate variables that may be simultaneously
considered in a single model. [14], [15], [16], [17] and [13] use two restrictions: one for
the number of main effect terms and one for the number of epistatic terms allowed in
the model simultaneously. They also give a simple guideline to determine the size of
each restriction. They suggest choosing the restriction r = m +2 m where m is the
a priori expected number of main effects. The same formula can be employed
where m is the expected number of epistatic effects. While this is an easily determined
guideline, in practice, as shown anecdotally in Section 4.1, the restriction size
does not seem to have a great impact on the resulting inferences from the proposed
method. However, one should note that if the restriction is set very small, the stochastic
search will have a difficult time moving around the model space, and hence the algorithm
will take a long time to converge.
To search through the restricted model space, MC^3 can be employed using equation (7).
Note that q(M_t | M_c) must be determined to move through the sample space.
Let nbd(M_c) be the set of all models with one more main effect term, one more valid
interaction term, one fewer main effect term, or one fewer interaction term than model M_c.
Denote adding a main effect term as AMT, adding an interaction effect term as AIT, dropping
a main effect term as DMT and dropping an interaction effect term as DIT. The
probability of each of these actions depends on the attributes of the current model M_c.
Let γ_c and φ_c be the number of main effect terms and the number of interaction terms
in M_c, respectively. In order to ensure that all models in nbd(M_c) are equally likely,
the probability of each action, AMT, AIT, DMT and DIT, needs to be determined. Let
Ω = {AMT, AIT, DMT, DIT} be an action space. Once these probabilities have
been calculated, the following procedure allows for each of the models in nbd(M_c) to
be candidate models. First determine P(AMT), P(AIT), P(DMT) and P(DIT),
and choose an action with the corresponding probability. Then select with equal probability
a model that is in nbd(M_c) and corresponds to the action chosen. This procedure
ensures that all models in nbd(M_c) have equal probability. Having all models in
nbd(M_c) equally likely will be necessary in computing q(M_c | M_t).
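The two-stage procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names, the representation of a model as a pair of frozensets, and the restriction parameters `r_main` and `r_int` are hypothetical. In particular, a term is treated as "valid" to add simply when it is not already in the model and the relevant restriction is not yet reached; the source may impose further conditions. Setting each action's probability proportional to the number of neighbours it produces makes every model in nbd(M_c) equally likely.

```python
import itertools
import random

def neighborhood(main, inter, p, r_main, r_int):
    """Group the neighbours of the current model by action.

    main  : frozenset of locus indices that have a main effect term
    inter : frozenset of (i, j) pairs, i < j, that have an interaction term
    """
    nbd = {"AMT": [], "AIT": [], "DMT": [], "DIT": []}
    if len(main) < r_main:  # assumed restriction on main effect terms
        nbd["AMT"] = [(main | {i}, inter) for i in range(p) if i not in main]
    if len(inter) < r_int:  # assumed restriction on interaction terms
        nbd["AIT"] = [(main, inter | {pair})
                      for pair in itertools.combinations(range(p), 2)
                      if pair not in inter]
    nbd["DMT"] = [(main - {i}, inter) for i in sorted(main)]
    nbd["DIT"] = [(main, inter - {pair}) for pair in sorted(inter)]
    return nbd

def action_probabilities(nbd):
    """P(action) proportional to the number of neighbours that action yields."""
    total = sum(len(models) for models in nbd.values())
    return {action: len(models) / total for action, models in nbd.items()}

def propose(main, inter, p, r_main, r_int, rng=random):
    """Pick an action, then a model uniformly among that action's neighbours."""
    nbd = neighborhood(main, inter, p, r_main, r_int)
    probs = action_probabilities(nbd)
    actions = list(probs)
    action = rng.choices(actions, weights=[probs[a] for a in actions])[0]
    candidate = rng.choice(nbd[action])
    # P(action) * 1/|neighbours for action| = 1/|nbd(M_c)| for every neighbour.
    q_forward = 1.0 / sum(len(models) for models in nbd.values())
    return action, candidate, q_forward

# Hypothetical current model: 4 loci, main effects {0, 1}, interaction (0, 1).
current_main = frozenset({0, 1})
current_inter = frozenset({(0, 1)})
example_nbd = neighborhood(current_main, current_inter, p=4, r_main=3, r_int=2)
example_probs = action_probabilities(example_nbd)
```

Under these assumptions q(M_t | M_c) = 1/|nbd(M_c)|, and the reverse probability q(M_c | M_t) is obtained the same way from the neighbourhood of M_t.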
 