Biomedical Engineering Reference
MSE) 1-input predictor for CycB is (x_7) with an MSE of 0.1238. Looking at the 2-input
column, the best 2-input predictor is (x_6, x_7) with an MSE of 0.0241. For CycB,
(x_6, x_7) happens to be the actual, correct predictor.
In general, we find that the correct predictor is identified as the top ranked predictor
in one of the input columns for the majority of genes (E2F (x_3), CycE (x_4), Cdc20 (x_6),
Cdh1 (x_7), and CycB (x_9)) in the mutated mammal network, for both the linear
representation and the sigmoid representation. The exceptions are gene Rb (x_2), where
the correct predictor is the sixth ranked predictor in the list, and genes CycA (x_5) and
UbcH10 (x_8), which have more than 4 inputs and are thus not listed in the tables, which
show only predictors with up to 4 inputs.
For gene Rb (x_2), the distribution of samples does not completely cover the 4-input
state space, hence several predictors and Zhegalkin functions can fit the data closely
with low error. However, we observe that while the top ranked predictor
{x_1, x_4, x_8, x_9} is not the correct predictor {x_1, x_4, x_5, x_9}, it does contain
3 of the 4 correct input genes. We make a similar observation for genes CycA (x_5) and
UbcH10 (x_8): the top ranked predictors contain many of the correct input genes of the
actual predictors. This information can be helpful in refining future tests for gene
expression measurements.
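The overlap reported for Rb can be checked with plain set operations. The following is a minimal sketch, assuming the top ranked predictor is {x_1, x_4, x_8, x_9} and the correct predictor is {x_1, x_4, x_5, x_9}, as given above. The function name `compare_predictors` and the string labels for the genes are hypothetical and not part of the original method.

```python
def compare_predictors(top_ranked, correct):
    """Compare two predictors given as sets of input-gene labels.

    Returns the shared inputs, the correct inputs that the top ranked
    predictor misses, and the inputs it includes wrongly.
    """
    shared = top_ranked & correct
    missed = correct - top_ranked
    spurious = top_ranked - correct
    return shared, missed, spurious


# Predictor sets for Rb (x_2), taken from the text above.
top_ranked_rb = {"x1", "x4", "x8", "x9"}
correct_rb = {"x1", "x4", "x5", "x9"}

shared, missed, spurious = compare_predictors(top_ranked_rb, correct_rb)
print(f"{len(shared)} of {len(correct_rb)} correct inputs recovered: {sorted(shared)}")
print(f"missed: {sorted(missed)}, spurious: {sorted(spurious)}")
```

The missed and spurious genes are the ones a follow-up measurement would most need to separate.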
4.3.2 Predictor Selection Method
While the algorithm produces a ranked list of predictors for a gene, it may be desirable
to select a single best predictor. As observed from the predictor tables for the mutated
mammal network, the correct predictor is generally the top ranked predictor from
one of the 1-, 2-, 3-, or 4-input predictor lists. To select which i-input predictor list
to choose from, we use a metric called the resolution ratio R_i, which measures the
difference between the top ranked predictor and the second ranked predictor of a gene
with i inputs. The resolution ratio is defined as the ratio of the MSE of the second
ranked predictor to the MSE of the top ranked predictor, as shown in Eq. 4.3.
R_i = MSE_{i,second} / MSE_{i,top}    (4.3)
A high resolution ratio R_i indicates that the top ranked predictor has significantly
lower error than all other predictors of the same input size, and is thus likely to be
the correct predictor. A low resolution ratio indicates that several predictors
(including the top ranked predictor) may have similarly low error due to underfitting of
the data (missing some of the input genes), overfitting of the data (including additional
or wrong input genes), or inadequate sample distribution.
For example, suppose the predictor for gene x_i is (x_j, x_k); in other words, the
target gene x_i is regulated by two input genes x_j and x_k. Given adequate expression
samples, we expect the MSE of the 2-input predictor (x_j, x_k) to be low, since this is
the actual predictor, while any other 2-input predictor for x_i will have a high MSE.
As such, the resolution ratio R_2 for the 2-input predictor list is expected to be high.
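The resolution ratio of Eq. 4.3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. It assumes that the predictor list with the largest R_i is the one chosen, which the text suggests but does not state outright here. The function names and the data layout are hypothetical. Only the MSE values 0.1238 and 0.0241 come from the CycB example in the text; every other number is invented for illustration.

```python
import math


def resolution_ratio(predictors):
    """R_i = MSE_second / MSE_top for one list of (inputs, mse) pairs."""
    if len(predictors) < 2:
        raise ValueError("need at least two predictors of the same input size")
    ranked = sorted(predictors, key=lambda p: p[1])
    mse_top, mse_second = ranked[0][1], ranked[1][1]
    # A perfect fit (zero error) makes the ratio unbounded.
    return math.inf if mse_top == 0 else mse_second / mse_top


def select_predictor(lists_by_size):
    """Return (input size, top predictor, R_i) for the list with the largest R_i.

    Choosing the largest R_i is an assumption made for this sketch.
    """
    best = None
    for size, predictors in lists_by_size.items():
        ratio = resolution_ratio(predictors)
        top_inputs = min(predictors, key=lambda p: p[1])[0]
        if best is None or ratio > best[2]:
            best = (size, top_inputs, ratio)
    return best


# Candidate predictors for CycB, keyed by number of inputs.
# Only 0.1238 and 0.0241 are from the text; all other values are hypothetical.
cycb_candidates = {
    1: [(("x7",), 0.1238), (("x6",), 0.1500)],
    2: [(("x6", "x7"), 0.0241), (("x7", "x9"), 0.1100)],
    3: [(("x6", "x7", "x9"), 0.0240), (("x5", "x6", "x7"), 0.0242)],
}

size, inputs, ratio = select_predictor(cycb_candidates)
print(size, inputs, round(ratio, 2))
```

In this made-up data the 3-input list has the lowest absolute MSE, but its two best predictors are nearly tied, so its ratio is close to 1. The 2-input list is the one that stands out, which matches the overfitting case described above.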