Diversity Graphs - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

The degree and neighborhood of node v are denoted by deg( v ) and N ( v ), respec-

tively. The vector of ones is denoted by e , where length is decided by the context

of its use. If x is a vector, then diag( x ) is the symmetric matrix whose diagonal

elements correspond to x and whose off-diagonal elements are zero. So, diag( e )

is the identity matrix. For any real number C we define C + =max

We assume that haplotypes are of length n and that SNPs are indexed by i =

1 , 2 ,...,n .

{

C, 0

}

The set of all possible haplotypes of length n is the collection of

n . Similarly, the collection of all genotypes of length n

sequences

{−

1 , 1

}

n . The arithmetic of mating haplotypes to form a genotype is simply

coordinate-wise addition. So, if the maternal haplotype is (

{−

2 , 0 , 2

}

−

1 , 1 ,

−

1 , 1) and the

fraternal haplotype is (1 , 1 ,

−

1 ,

−

1), the genotype is

(

−

1 , 1 ,

−

1 , 1)+ (1 , 1 ,

−

1 ,

−

1) = (0 , 2 ,

−

2 , 0) .

(6.1)

We extend this coordinatewise addition so that we can generally add elements in

{−

n .Let a,b

2 ,

−

1 , 0 , 1 , 2

}

∈{−

2 ,

−

1 , 0 , 1 , 2

}

and define

⊕

so that

b = −

2 ,a< 0 ,b< 0

2 ,a> 0 ,b> 0

0 , otherwise .

⊕

This binary operator is commutative, and to add elements of

{−

2 ,

−

1 , 0 , 1 , 2

}

we perform the operation componentwise. For example,

(

−

1 , 1 ,

−

1 , 1)

⊕

(1 , 0 ,

−

2 , 1)

⊕

(1 ,

−

1 ,

−

1 , 1) = (0 , 0 ,

−

2 , 2) .

Notice that

⊕

reduces to the typical addition in (6.1) if the terms on the left are

4 . Unfortunately,

{−

1 , 1

}

⊕

does not generally satisfy a cancellation rule since

0 does not mean that a = b . For much of the paper simple addition

as described in (6.1) is sufficient, and to distinguish

⊕

0= b

⊕

from + we use + in all

instances where it appropriate.

SNP values of 0 are called ambiguous because the orientation of the 1 and

in the parental donations could be reversed. The question we address begins with

a collection of genotypes and asks us to construct a collection of haplotypes that

form the genotypes under this arithmetic. If there were no ambiguous SNPs, this

process would be trivial, and hence, we assume that each genotype has at least

one ambiguous SNP. The subset of

−

{−

2 , 0 , 2

}

n with this property is denoted

For any

G ⊆G

,wesay

H ⊆H

is a solution to

if, for all g ∈G ,thereexist

h , h ∈H

such that g = h + h . A solution

G is minimal if

H\{ h }

is not

G ,forall h ∈H

a solution to

.Wesay

is a minimum solution if there exists no

H to

G such that

|H |

solution

Our intent is to study the underlying graph theory of finding solutions, and

we introduce the concept of a Diversity Graph . Informally, a diversity graph is

|H|

Clustering Challenges in Biological Network

Search WWH ::

Custom Search

Home