Biomedical Engineering Reference
In-Depth Information
An estimate of θ n is found by putting the error estimates into Eq. (11.2). The procedure
is repeated a number of times and a final estimate, θ n , is obtained by averaging θ n
for a conservative estimate of θ opt . We could also use resubstitution to estimate the
CoD (by training with the full data set and computing the error on the same training data).
This would have two effects: it would decrease computation time and would yield an
optimistic (high-biased) estimate of the CoD.
There are $_mC_k = m!/(k!\,(m-k)!)$ predictor combinations for k predictor genes out
of m total genes. For t target genes, where $1 \le t \le m$, there are
$t\,m!/(k!\,(m-k)!)$ coefficients to be calculated. As incremental relations between
smaller and larger predictor sets are important, it is necessary to calculate the CoD
for k predictor gene combinations, for each k of 1, 2, 3, ... , to some stopping point.
A large storage space is required for all or part of the CoD results.
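To give a feel for how quickly this count grows, the following is a minimal Python sketch of the arithmetic above. The function names are hypothetical and chosen only for illustration; the sketch counts coefficients and does not compute any CoD values.

```python
from math import comb


def predictor_combinations(m: int, k: int) -> int:
    """Number of k-gene predictor sets drawn from m genes: m! / (k! (m-k)!)."""
    return comb(m, k)


def coefficient_count(m: int, k: int, t: int) -> int:
    """Number of CoD values for t target genes with predictor sets of size k."""
    if not 1 <= t <= m:
        raise ValueError("t must satisfy 1 <= t <= m")
    return t * comb(m, k)


def cumulative_coefficient_count(m: int, t: int, k_max: int) -> int:
    """Total over k = 1, 2, ..., k_max, where k_max is the stopping point."""
    return sum(coefficient_count(m, k, t) for k in range(1, k_max + 1))


if __name__ == "__main__":
    # Illustrative sizes only, not taken from the text.
    print(predictor_combinations(100, 3))           # 161700
    print(coefficient_count(100, 3, 10))            # 1617000
    print(cumulative_coefficient_count(100, 10, 3))
```

Even for modest k the total runs into the millions, which is why the storage requirement noted above matters.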
11.2.2 Prediction System Design
This section discusses design issues for the codetermination algorithm. As mentioned
earlier, the focus of this work is the design of a logic-filter-based prediction system.
A ternary logic filter is a nonlinear predictor that has a k-gene input and an output.
The input is ternary data for the k genes, and the output is the predicted value of
the quantized target expression. Rather than using the ternary-quantized conditional
expectation for the predictor, the conditional mode is used. This requires less com-
putation, the differences between it and the conditional expectation are small, and it
avoids predicting values that have not occurred in the samples, a desirable property
for coarse quantization. The ternary logic filter is defined by a logic table constructed
via the conditional probability of the output Y given input data X as follows:
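$$
Y = \psi(X) =
\begin{cases}
-1 & \text{if } P(Y = -1 \mid X) \text{ is highest},\\
\phantom{-}0 & \text{if } P(Y = 0 \mid X) \text{ is highest},\\
\phantom{-}1 & \text{if } P(Y = 1 \mid X) \text{ is highest}.
\end{cases}
\tag{11.3}
$$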
Filter design becomes the computation of the conditional probability for each
input-output pair, (x, y). For any observation vector x, $\psi(x)$ is the value of Y seen
most often with x in the sample data. The size of the table defining the predictor grows
exponentially with the number of predictor variables, and the number of conditional
probabilities to estimate increases accordingly. For two input variables and ternary
data, there are $3^2 = 9$ conditional probabilities to estimate; for three variables, there
are $3^3 = 27$. For gene expression ratio data, the number of input vectors available
for filter design is very limited; in fact, we often do not observe all vectors. When
applying the filter to test data, there may be inputs not observed during design. The
quantized expected value of Y, $T(E[Y])$, is used as the output from the filter for all
input vectors that are not observed during training.
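The design procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the quantizer T is assumed to round to the nearest of -1, 0, 1 (the text does not give its thresholds), and ties in the conditional mode are broken toward the smallest value, which the text also does not specify.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

TernaryVector = Tuple[int, ...]


def quantize(value: float) -> int:
    """Assumed quantizer T: nearest of -1, 0, 1 (thresholds are hypothetical)."""
    if value > 0.5:
        return 1
    if value < -0.5:
        return -1
    return 0


def design_filter(
    inputs: Sequence[Sequence[int]], targets: Sequence[int]
) -> Tuple[Dict[TernaryVector, int], int]:
    """Build the logic table (conditional mode of Y given X) and the fallback."""
    counts: Dict[TernaryVector, Counter] = defaultdict(Counter)
    for x, y in zip(inputs, targets):
        counts[tuple(x)][y] += 1
    # Conditional mode per observed input; sorting first means a tie
    # resolves to the smallest target value (an assumption of this sketch).
    table = {x: max(sorted(c), key=c.__getitem__) for x, c in counts.items()}
    # Fallback for unobserved inputs: quantized sample mean of Y, i.e. T(E[Y]).
    default = quantize(sum(targets) / len(targets))
    return table, default


def predict(table: Dict[TernaryVector, int], default: int, x: Sequence[int]) -> int:
    """Look up the table; use the fallback for inputs not seen in training."""
    return table.get(tuple(x), default)


def table_size(k: int) -> int:
    """Number of possible ternary input vectors for k predictor genes."""
    return 3 ** k
```

With only a handful of samples, the table typically covers far fewer than `table_size(k)` entries, so the fallback value ends up being used for many test inputs.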
We can increase the information content of the input by providing additional inputs
to the filter. This additional information increases the ability to predict the target gene
expression value and decreases ε n . In the worst case, when the additional gene carries