Predictor Ranking using Modified Zhegalkin Functions - Logic Synthesis for Genetic Diseases



MSE) 1-input predictor for CycB is (x_7) with an MSE of 0.1238. Looking at the 2-input column, the best 2-input predictor is (x_6, x_7) with an MSE of 0.0241. For CycB, (x_6, x_7) happens to be the actual, or correct, predictor.

In general, we find that the correct predictor is identified as a top rank predictor in one of the input columns for the majority of genes (E2F (x_3), CycE (x_4), Cdc20 (x_6), Cdh1 (x_7), and CycB (x_9)) in the mutated mammal network, for both the linear representation and the sigmoid representation. The exceptions are gene Rb (x_2), where the correct predictor is the sixth ranked predictor in the list, and genes CycA (x_5) and UbcH10 (x_8), which have more than 4 inputs and are thus not listed in the tables, which only show predictors with up to 4 inputs.

For gene Rb (x_2), the distribution of samples does not completely cover the 4-input state space, hence several predictors and Zhegalkin functions can fit closely with low error. However, we observe that while the top rank predictor {x_1, x_4, x_8, x_9} is not the correct predictor {x_1, x_4, x_5, x_9}, it does contain 3 of the 4 correct input genes. We make a similar observation for genes CycA (x_5) and UbcH10 (x_8), in that the top rank predictors contain many of the correct input genes in the actual predictors. This information can be helpful in refining future tests for gene expression measurements.
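The overlap between the two Rb predictors can be checked with plain set operations. The short Python sketch below uses the gene indices quoted in the paragraph above; the variable names are illustrative only and are not taken from the book.

```python
# Gene indices for the two 4-input predictors of Rb (x_2) quoted above.
top_ranked = {1, 4, 8, 9}   # top rank predictor
correct = {1, 4, 5, 9}      # correct predictor

shared = top_ranked & correct      # correct inputs that were recovered
missed = correct - top_ranked      # correct inputs absent from the top rank predictor
spurious = top_ranked - correct    # inputs included in error

print(sorted(shared), sorted(missed), sorted(spurious))
# prints: [1, 4, 9] [5] [8]
```

Three of the four correct inputs are recovered; x_5 is replaced by x_8 in the top rank predictor.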

4.3.2 Predictor Selection Method

While the algorithm produces a ranked list of predictors for a gene, it may be desirable to select a single best predictor. As observed from the predictor tables for the mutated mammal network, the correct predictor is generally the top ranked predictor from one of the 1-, 2-, 3-, or 4-input predictor lists. To select which i-input predictor list to choose from, we use a metric called the resolution ratio R_i, which measures the separation between the top ranked predictor and the second ranked predictor of a gene with i inputs. The resolution ratio is defined as the ratio of the MSE of the second ranked predictor to the MSE of the top ranked predictor, as shown in Eq. 4.3.

R_i = MSE_{i,second} / MSE_{i,top}    (4.3)

A high resolution ratio R_i indicates that the top rank predictor has significantly lower error than all other predictors of the same input size, and is thus likely to be the correct predictor. A low resolution ratio indicates that several predictors (including the top rank predictor) may have similarly low error due to underfitting of the data (missing some of the input genes), overfitting of the data (including additional or wrong input genes), or inadequate sample distribution.
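As a minimal sketch of how the resolution ratio might be computed and used, the Python snippet below evaluates Eq. 4.3 for each input size and picks the size with the largest ratio. The MSE values are made up for illustration and do not come from the predictor tables, and choosing the input size with the highest ratio is an assumed reading of the selection rule, not a procedure confirmed by the text. The names resolution_ratio and mse_by_size are hypothetical.

```python
import math

def resolution_ratio(mses):
    """Return MSE_second / MSE_top for one i-input predictor list (Eq. 4.3)."""
    if len(mses) < 2:
        raise ValueError("need at least two predictors to form a ratio")
    top, second = sorted(mses)[:2]
    # A perfect top fit (MSE of zero) gives an unbounded ratio.
    return math.inf if top == 0 else second / top

# Hypothetical MSE lists keyed by input size i (illustrative values only).
mse_by_size = {
    1: [0.12, 0.15, 0.20],
    2: [0.02, 0.10, 0.11],
    3: [0.018, 0.02, 0.021],
}

ratios = {i: resolution_ratio(m) for i, m in mse_by_size.items()}

# Assumed selection rule: take the input size whose top predictor stands out most.
best_size = max(ratios, key=ratios.get)
print(best_size)
```

In this made-up case the 3-input list has the lowest absolute MSE, but its top two predictors are nearly indistinguishable, so the 2-input list is preferred.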

For example, let us assume that the predictor for gene x_i is (x_j, x_k); in other words, the target gene x_i is regulated by two input genes x_j and x_k. Given adequate expression samples, we expect the MSE of the 2-input predictor (x_j, x_k) to be low, since this is the actual predictor, while any other 2-input predictor for x_i will have a high MSE. As such, the resolution ratio R_2 for this 2-input predictor list is expected to be high.
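The expectation above can be illustrated on synthetic data. In the Python sketch below, a target is driven by two of four binary genes (indices 0 and 1 play the roles of x_j and x_k), with a small disturbance from a third gene so that the best MSE is not exactly zero. Every 2-input predictor is scored by the MSE of a per-state average fit. That fit is a stand-in chosen for brevity and is not the modified Zhegalkin fit of this chapter; the data, the disturbance, and all names are assumptions made for the example.

```python
from itertools import combinations, product

def group_mean_mse(samples, targets, inputs):
    """MSE when the target is predicted by its mean within each input state."""
    groups = {}
    for s, y in zip(samples, targets):
        groups.setdefault(tuple(s[i] for i in inputs), []).append(y)
    sq_err = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        sq_err += sum((y - mean) ** 2 for y in ys)
    return sq_err / len(samples)

# All 16 states of four binary genes, so the state space is fully covered.
samples = list(product([0, 1], repeat=4))

# Synthetic target: high only when genes 0 and 1 are both on,
# plus a small offset from gene 2 acting as a disturbance.
targets = [(0.9 if s[0] and s[1] else 0.1) + 0.05 * (2 * s[2] - 1)
           for s in samples]

# Rank every 2-input predictor by MSE, lowest first.
ranked = sorted(
    (group_mean_mse(samples, targets, pair), pair)
    for pair in combinations(range(4), 2)
)
(mse_top, top_pair), (mse_second, _) = ranked[:2]
r2 = mse_second / mse_top
print(top_pair, round(r2, 1))
```

With full coverage of the state space, the true pair (0, 1) ranks first and R_2 comes out at roughly 32, well separated from the runner-up.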
