Predictor Set Inference using SAT - Logic Synthesis for Genetic Diseases

Biomedical Engineering Reference

In-Depth Information

2.4.3

All-SAT

The SAT solver performs an All-SAT on S . The satisfying cubes (each cube encodes

a candidate predictor set) from the All-SAT output are collected. The process is

repeated for the remaining attractor cycle orderings. From the results, we find the

most likely predictors based on the frequency of occurrence of the predictors across

all orderings. Three methods are used to analyze the statistical results, which will be

described in the next section.

In general, the above algorithm can be applied to input data for N genes and A

attractor states. The total number of attractor state orderings is A !. For each ordering,

there can be up to O ( N 3 ) predictors per gene. The SAT search space per ordering

is on the order of O (2 N 3 ), resulting in overall complexity of O ( A

2 N 3 ). Typically,

the number of attractor states A recorded through gene expression measurements

is small. As such, A

!

is thus much smaller than 2 ( N 3 ) , so the runtime complex-

ity is dominated by the All-SAT operation. For pragmatic reasons, our algorithm

stops each All-SAT after T minutes (or C cubes), where T or C is defined by the

user./indexRuntime/indexComplexity

!

2.5

Experimental Results

To evaluate our SAT-based algorithm for inferring gene predictors, the algorithm

was tested on gene-expression data from a melanoma study done by Bittner and

Weeraratna [ 3 ]. In the melanoma study, it was observed that an abundance of RNA

(expression) for gene WNT5A was associated with a high metastasis of melanoma.

The study measured 587 genes with 31 gene expression patterns (lines). Seven genes

are believed to be closely knit: PIRIN, S100P, RET1, MART1, HADHB, STC2 , and

WNT5A . There are 18 distinct patterns, which were reduced to seven using Hamming-

distance of one, in Table 2.2 . These seven lines form the attractor states which are

the input to our algorithm.

For the experiments, we assume two additional specifications. First, we divide

attractor states into good and bad states, based on the presence of WNT5A . We allow

good attractor states to cycle only to other good attractor states, and bad attractor

states can only cycle to other bad attractor states. Second, we limit the maximum

attractor cycle length L to 3, and the maximum number of predictor inputs P to 3,

because long attractor cycles and large predictor inputs are highly complex and less

likely to occur in biological systems [ 13 ], [ 6 ].

Our algorithm utilizes a modified open-source and highly efficient exact SAT-

solver called MiniSat v1.14 [ 14 ], [ 9 ]. All-SAT operations were limited to a 30 minute

time-out./indexRuntime On average, each All-SAT run yielded 10 K satisfying cubes

in this duration. Our algorithm was implemented and run on a Pentium 4 Linux

machine with 4 GB RAM. MiniSat [ 9 ], was originally designed to find a single

satisfying assignment. We modified MiniSat to perform All-SAT as MiniSat normally

Logic Synthesis for Genetic Diseases

Search WWH ::

Custom Search

Home