Determining Gene Function in Boolean Networks using SAT - Logic Synthesis for Genetic Diseases

Biomedical Engineering Reference

In-Depth Information

Figure 3.1 demonstrates the circuit construction. Focusing on gene a , we note it has

2 inputs b and c , meaning there are F 2 =

10 possible functions ( g a 1 , g a 2 , ... , g a 10 )

with a true support of 2 inputs. After enumerating the 10 functions, a MUX and

select signal s a is added to select exactly one of the function outputs, as shown

in Fig. 3.1 b). A similar construction is performed for genes b and c , with MUXes

and select signals s b and s c respectively. The CNF S corresponding to the circuit of

Fig. 3.1 b is constructed as described in the previous section.

In the example, let us assume that we have the following 3 gene expression states

a , b , c , a , b , c }∈{

{

. We duplicate the circuit 3 times,

assigning each copy one of the gene expression states. Accordingly, the MUX select

signals for gene a are connected together across all 3 copies of the circuit, as are the

MUXes select signal for genes b and c .

Generally, we may have a limited number of gene expression states (in the ex-

ample, we had 3 out of a possible 8 states). In such situation, there can be several

BNs which match the observations as multiple sets of functions can be valid for the

input. Using All-SAT will implicitly generate all possible satisfying BNs. From the

3-gene example with only 3 gene expression states, there are 4 possible solutions, of

which one solution corresponds to the correct BN. To narrow the search, the results

can be pruned using curated, partial, or prior biological information. For example,

gene expressions or pathway information for some genes may be known from other

research or databases. Alternatively biologists may want to reason on the networks

assuming the presence of specific gates or transitions on some genes (for example,

gene a represses gene b ). This new information restricts the solution space by pro-

viding our method additional logical constraints, and can be added to our algorithm

using the same exact steps as described in our approach.

Additionally, our method can detect logical problems in the input data. If the

CNF S is UNSAT (not satisfiable), there is no possible assignment of logic functions

that satisfies the gene expression observations and the predictor set. This result may

occur if there is an error in either the gene expression observations or the predictor

set. For example, if the predictor set was inferred wrongly, the gene expressions were

measured incorrectly, or gene expression data from a different GRN were added, an

UNSAT result may be produced. In such a situation, our method can be rerun on

a modified predictor set, or the gene expression data can be analyzed to determine

which gene(s) is (are) causing the logical error.

Let us examine an example with an UNSAT result. Consider that we have a gene

a which is predicted by b and c . Let us assume that we are given gene expression

observations

(001110), (110001), (101101)

}

a , b , c , a , b , c }∈{

{

(000000), (000100)

}

. We observe that in the first

0, with a

observations (000000), b

0 and c

0. However, in the second

0, with a =

observation (000100), b

1. Both b and c have the same

values in these two observations, but a has a different value. Logically, this cannot

occur since a function cannot have two different outputs for the same input. The

CNF as constructed contains all valid Boolean functions and since no such Boolean

function exists for this logical conflict, the result is UNSAT.

0 and c

Logic Synthesis for Genetic Diseases

Search WWH ::

Custom Search

Home