Biology Reference
In-Depth Information
6.1. Introduction
The burgeoning field of computational biology is advancing the science of genet-
ics and transforming traditional 'wet lab' research into computational efforts. The
preponderance of the current research emphasizes computational aspects, which
have made significant strides in projects such as the human genome project. These
advances have the potential of redefining standard medical practice and have al-
ready proven to be a significant contribution to mankind.
One of the problems currently receiving attention is that of describing how
genetic diversity propagates from one generation to the next. Such problems
are called haplotyping problems, and a brief biological description is warranted.
Genes are sequences of DNA that code for specific traits, with the vast majority of
DNA being common among all individuals. The locations on the genome where
diversity occurs are called single nucleotide polymorphisms (SNPs). Diploid or-
ganisms like humans have two distinct copies of each gene, one from each parent,
which together describe a trait. A collection of SNPs in a single copy of a gene
is called a haplotype , and a pair of haplotypes forms a genotype . Each SNP of a
haplotype is in one of two states, denoted by
1 or 1, that corresponds to the two
distinct nucleotide base pairs of the DNA. Each SNP of a genotype is in one of
three states,
2, 0,or2, where the SNP is
2 (resp, 2) if and only if each of the
haplotypes that form the genotype have a
1 (resp. 1) at that SNP, and the SNP is
0 if and only if one of the haplotypes has a
1 and the other a 1 at that SNP.
Biologists are capable of efficiently determining an individual's genotype, but
it is difficult and costly to determine the haplotypes. However, haplotypes are
more valuable to biologists, and a haplotyping problem is to calculate the haplo-
types knowing only the genotypes. In particular, finding small collections of hap-
lotypes that explain the genotypes is biologically relevant. The problem of finding
a smallest collection of haplotypes is called the Pure Parsimony problem and em-
pirical evidence suggests that these minimum solutions naturally occur. The initial
investigations into haplotyping were undertaken by Clark in [5], and since then
there has been flourish of activity addressing computational issues [2-4, 6-19].
Theoretically we know that the parsimony problem is APX-hard and that prac-
tically it is difficult to solve on large data sets. Our goal is not directly compu-
tational, and instead we address the underlying structure of the problem through
graph theory. We do provide closed form solutions in a some instances.
6.2. Notation, Definitions and Preliminary Results
The results of this paper are graph theoretical, and a basic understanding of bipar-
tite graph theory is expected. We point readers to [1] for a thorough development.
Search WWH ::




Custom Search