Parallel Computing in the Analysis of Gene Expression Relationships - Parallel Computing for Bioinformatics and Computational Biology


An estimate, $\theta_n$, is found by inserting the error estimates into Eq. (11.2). The procedure is repeated a number of times, and a final estimate, $\bar{\theta}_n$, is obtained by averaging the $\theta_n$ values; this gives a conservative estimate of $\theta_{\mathrm{opt}}$. We could also use resubstitution to estimate the CoD (by training with the full data set and computing the error on the same training data). This would have two effects: it would decrease computation time, and it would yield an optimistic (high-biased) estimate of the CoD.
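The repeated holdout-and-average procedure, and the resubstitution alternative, can be sketched in Python. Equation (11.2) is not reproduced in this excerpt, so the sketch assumes the usual CoD form $\theta = (\varepsilon_0 - \varepsilon)/\varepsilon_0$, where $\varepsilon_0$ is the error of the best constant predictor (here, the training mode of the target) and $\varepsilon$ is the error of the designed filter. The 0-1 (misclassification) error, the 70/30 split, the 50 repetitions, the fallback to the constant predictor for unobserved inputs, and all function names are illustrative assumptions rather than the chapter's settings.

```python
import random
from collections import Counter

def mode(values):
    # Most frequent value; Counter.most_common(1) keeps the first occurrence on ties.
    return Counter(values).most_common(1)[0][0]

def design_filter(train):
    # Conditional-mode table: input tuple -> most frequent target value.
    groups = {}
    for x, y in train:
        groups.setdefault(x, []).append(y)
    return {x: mode(ys) for x, ys in groups.items()}

def error_rate(predict, samples):
    # Misclassification rate (0-1 loss), an assumed error measure.
    return sum(predict(x) != y for x, y in samples) / len(samples)

def cod(train, test):
    # Assumed CoD form (eps0 - eps) / eps0; returns None when eps0 == 0.
    y_const = mode([y for _, y in train])  # best constant predictor
    table = design_filter(train)
    eps0 = error_rate(lambda x: y_const, test)
    # Unobserved inputs fall back to the constant predictor (a simplification).
    eps = error_rate(lambda x: table.get(x, y_const), test)
    return None if eps0 == 0 else (eps0 - eps) / eps0

def cod_holdout_average(samples, n_repeats=50, train_frac=0.7, seed=0):
    # Repeated random train/test splits; the average is the conservative estimate.
    rng = random.Random(seed)
    thetas = []
    for _ in range(n_repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        theta = cod(shuffled[:cut], shuffled[cut:])
        if theta is not None:
            thetas.append(theta)
    return sum(thetas) / len(thetas) if thetas else None

def cod_resubstitution(samples):
    # Train and test on the same data: cheaper, but optimistically biased.
    return cod(samples, samples)
```

Running both functions on the same noisy data set will typically show the resubstitution value sitting above the holdout average, which is the optimistic bias described above.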

There are ${}_mC_k = \frac{m!}{k!\,(m-k)!}$ predictor combinations for $k$ predictor genes out of $m$ total genes. For $t$ target genes, where $1 \le t \le m$, there are $t \cdot {}_mC_k = \frac{t\,m!}{k!\,(m-k)!}$ coefficients to be calculated. As incremental relations between smaller and larger predictor sets are important, it is necessary to calculate the CoD for $k$-predictor gene combinations for each $k = 1, 2, 3, \ldots$, up to some stopping point. A large storage space is required for all or part of the CoD results.
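The growth of this workload can be checked with a few lines of Python (3.8 or later, for `math.comb`). The sketch follows the count stated above, with each target paired with every ${}_mC_k$ predictor set and the target not excluded from its own predictor sets; the gene counts in the example are hypothetical values chosen only for illustration.

```python
from math import comb

def num_predictor_sets(m, k):
    # mCk = m! / (k! (m - k)!)
    return comb(m, k)

def num_cod_coefficients(m, t, k):
    # t targets, each evaluated against every k-gene predictor set
    return t * comb(m, k)

def total_coefficients(m, t, k_max):
    # CoDs for all predictor-set sizes k = 1, ..., k_max
    return sum(num_cod_coefficients(m, t, k) for k in range(1, k_max + 1))

# Hypothetical example: 1000 genes, all of them used as targets, k up to 3
m, t = 1000, 1000
for k in (1, 2, 3):
    print(k, num_cod_coefficients(m, t, k))
# 1 1000000
# 2 499500000
# 3 166167000000
print("total:", total_coefficients(m, t, 3))  # total: 166667500000
```

With these numbers the $k = 3$ term alone is about $1.66 \times 10^{11}$ coefficients, which is why storage of the CoD results becomes a design concern.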

11.2.2 Prediction System Design

This section discusses design issues for the codetermination algorithm. As mentioned

earlier, the focus of this work is the design of a logic-filter-based prediction system.

A ternary logic filter is a nonlinear predictor with a $k$-gene input and a single output. The input is ternary data for the $k$ genes, and the output is the predicted value of the quantized target expression. Rather than the ternary-quantized conditional expectation, the conditional mode is used as the predictor. The conditional mode requires less computation, differs only slightly from the conditional expectation, and never predicts values that have not occurred in the samples, which is desirable under coarse quantization. The ternary logic filter is defined by a logic table constructed from the conditional probability of the output $Y$ given the input data $X$ as follows:

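$$
\psi(X) =
\begin{cases}
-1 & \text{if } P(Y = -1 \mid X) \text{ is highest},\\
0 & \text{if } P(Y = 0 \mid X) \text{ is highest},\\
1 & \text{if } P(Y = 1 \mid X) \text{ is highest}.
\end{cases}
\tag{11.3}
$$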
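Equation (11.3) amounts to an argmax over relative-frequency estimates of $P(Y = y \mid X = x)$. The following is a minimal Python sketch, assuming the samples are given as `(x, y)` pairs with `x` a tuple of ternary gene values. The function names are hypothetical, and the tie-breaking rule (first of $-1, 0, 1$ in that order) is an assumption, since Eq. (11.3) does not specify one.

```python
from collections import Counter, defaultdict

TERNARY = (-1, 0, 1)

def conditional_probabilities(samples):
    # Relative-frequency estimate of P(Y = y | X = x) for each observed x.
    counts = defaultdict(Counter)
    for x, y in samples:
        counts[x][y] += 1
    probs = {}
    for x, c in counts.items():
        n = sum(c.values())
        probs[x] = {y: c[y] / n for y in TERNARY}
    return probs

def logic_table(samples):
    # psi(x) = the y in {-1, 0, 1} with the highest estimated P(Y = y | X = x).
    # max() keeps the first maximal y in TERNARY order (assumed tie-break).
    return {x: max(TERNARY, key=lambda y: p[y])
            for x, p in conditional_probabilities(samples).items()}

samples = [((1, -1), 1), ((1, -1), 1), ((1, -1), 0), ((0, 0), -1)]
print(logic_table(samples))  # {(1, -1): 1, (0, 0): -1}
```

Only input vectors that actually occur in the samples receive a table entry; any other input has no conditional-probability estimate at all.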

Filter design becomes the computation of the conditional probability for each input-output pair, $(x, y)$. For any observation vector $x$, $\psi(x)$ is the value of $Y$ seen most often with $x$ in the sample data. The size of the table defining the predictor grows exponentially with the number of predictor variables, and the number of conditional probabilities to estimate increases accordingly. For two input variables and ternary data, there are $3^2 = 9$ conditional probabilities to estimate; for three variables, there are $3^3 = 27$. For gene expression ratio data, the number of input vectors available for filter design is very limited; in fact, we often do not observe all vectors. When the filter is applied to test data, there may be inputs that were not observed during design. The quantized expected value of $Y$, $T(E[Y])$, is used as the filter output for every input vector not observed in the training data.
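The fallback rule can be combined with the table lookup. In this Python sketch, $E[Y]$ is estimated by the sample mean of the training targets, and the quantizer $T$ is a hypothetical symmetric threshold at 0.5; the chapter's actual quantization thresholds are not given in this excerpt. The helper `unobserved_inputs` lists which of the $3^k$ possible input vectors never occur in the training data. The class and function names are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import product

TERNARY = (-1, 0, 1)

def quantize(v, threshold=0.5):
    # Hypothetical ternary quantizer T; the threshold is an assumption.
    if v > threshold:
        return 1
    if v < -threshold:
        return -1
    return 0

class TernaryLogicFilter:
    def __init__(self, train):
        groups = defaultdict(Counter)
        for x, y in train:
            groups[x][y] += 1
        # Conditional mode for every observed input (first of -1, 0, 1 on ties).
        self.table = {x: max(TERNARY, key=lambda y: c[y]) for x, c in groups.items()}
        # T(E[Y]), with E[Y] estimated by the training-sample mean.
        ys = [y for _, y in train]
        self.default = quantize(sum(ys) / len(ys))

    def predict(self, x):
        # Observed inputs use the logic table; unobserved inputs get T(E[Y]).
        return self.table.get(x, self.default)

def unobserved_inputs(train, k):
    # All 3**k ternary input vectors that never appear in the training data.
    seen = {x for x, _ in train}
    return [x for x in product(TERNARY, repeat=k) if x not in seen]

train = [((1, 1), 1), ((1, 1), 1), ((0, 1), 0), ((-1, 0), -1), ((1, 0), 1)]
f = TernaryLogicFilter(train)
print(f.predict((1, 1)))    # 1  (from the logic table)
print(f.predict((-1, -1)))  # 0  (unobserved: T(0.4) = 0)
print(len(unobserved_inputs(train, 2)))  # 5 of the 3**2 = 9 inputs
```

Because the default output is computed from the whole training set, it acts as a constant predictor for the inputs the table cannot cover; for a fixed sample size, the list returned by `unobserved_inputs` typically grows as $k$ increases.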

We can increase the information content of the input by providing additional inputs to the filter. This additional information increases the ability to predict the target gene expression value and decreases $\varepsilon_n$. In the worst case, when the additional gene carries
