Biology Reference
In-Depth Information
The degree and neighborhood of node v are denoted by deg( v ) and N ( v ), respec-
tively. The vector of ones is denoted by e , where length is decided by the context
of its use. If x is a vector, then diag( x ) is the symmetric matrix whose diagonal
elements correspond to x and whose off-diagonal elements are zero. So, diag( e )
is the identity matrix. For any real number C we define C + =max
.
We assume that haplotypes are of length n and that SNPs are indexed by i =
1 , 2 ,...,n .
{
C, 0
}
The set of all possible haplotypes of length n is the collection of
n . Similarly, the collection of all genotypes of length n
sequences
H
=
{−
1 , 1
}
n . The arithmetic of mating haplotypes to form a genotype is simply
coordinate-wise addition. So, if the maternal haplotype is (
is
{−
2 , 0 , 2
}
1 , 1 ,
1 , 1) and the
fraternal haplotype is (1 , 1 ,
1 ,
1), the genotype is
(
1 , 1 ,
1 , 1)+ (1 , 1 ,
1 ,
1) = (0 , 2 ,
2 , 0) .
(6.1)
We extend this coordinatewise addition so that we can generally add elements in
{−
n .Let a,b
2 ,
1 , 0 , 1 , 2
}
∈{−
2 ,
1 , 0 , 1 , 2
}
and define
so that
b =
2 ,a< 0 ,b< 0
2 ,a> 0 ,b> 0
0 , otherwise .
a
n
This binary operator is commutative, and to add elements of
{−
2 ,
1 , 0 , 1 , 2
}
we perform the operation componentwise. For example,
(
1 , 1 ,
1 , 1)
(1 , 0 ,
2 , 1)
(1 ,
1 ,
1 , 1) = (0 , 0 ,
2 , 2) .
Notice that
reduces to the typical addition in (6.1) if the terms on the left are
4 . Unfortunately,
in
{−
1 , 1
}
does not generally satisfy a cancellation rule since
a
0 does not mean that a = b . For much of the paper simple addition
as described in (6.1) is sufficient, and to distinguish
0= b
from + we use + in all
instances where it appropriate.
SNP values of 0 are called ambiguous because the orientation of the 1 and
1
in the parental donations could be reversed. The question we address begins with
a collection of genotypes and asks us to construct a collection of haplotypes that
form the genotypes under this arithmetic. If there were no ambiguous SNPs, this
process would be trivial, and hence, we assume that each genotype has at least
one ambiguous SNP. The subset of
{−
2 , 0 , 2
}
n with this property is denoted
G
.
For any
G ⊆G
,wesay
H ⊆H
is a solution to
G
if, for all g ∈G ,thereexist
h , h ∈H
such that g = h + h . A solution
G is minimal if
H
to
H\{ h }
is not
G ,forall h ∈H
a solution to
.Wesay
H
is a minimum solution if there exists no
H to
G such that
|H |
solution
.
Our intent is to study the underlying graph theory of finding solutions, and
we introduce the concept of a Diversity Graph . Informally, a diversity graph is
<
|H|
Search WWH ::




Custom Search