Diversity Graphs - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

6.1. Introduction

The burgeoning field of computational biology is advancing the science of genet-

ics and transforming traditional 'wet lab' research into computational efforts. The

preponderance of the current research emphasizes computational aspects, which

have made significant strides in projects such as the human genome project. These

advances have the potential of redefining standard medical practice and have al-

ready proven to be a significant contribution to mankind.

One of the problems currently receiving attention is that of describing how

genetic diversity propagates from one generation to the next. Such problems

are called haplotyping problems, and a brief biological description is warranted.

Genes are sequences of DNA that code for specific traits, with the vast majority of

DNA being common among all individuals. The locations on the genome where

diversity occurs are called single nucleotide polymorphisms (SNPs). Diploid or-

ganisms like humans have two distinct copies of each gene, one from each parent,

which together describe a trait. A collection of SNPs in a single copy of a gene

is called a haplotype , and a pair of haplotypes forms a genotype . Each SNP of a

haplotype is in one of two states, denoted by

−

1 or 1, that corresponds to the two

distinct nucleotide base pairs of the DNA. Each SNP of a genotype is in one of

three states,

−

2, 0,or2, where the SNP is

−

2 (resp, 2) if and only if each of the

haplotypes that form the genotype have a

−

1 (resp. 1) at that SNP, and the SNP is

0 if and only if one of the haplotypes has a

1 and the other a 1 at that SNP.

Biologists are capable of efficiently determining an individual's genotype, but

it is difficult and costly to determine the haplotypes. However, haplotypes are

more valuable to biologists, and a haplotyping problem is to calculate the haplo-

types knowing only the genotypes. In particular, finding small collections of hap-

lotypes that explain the genotypes is biologically relevant. The problem of finding

a smallest collection of haplotypes is called the Pure Parsimony problem and em-

pirical evidence suggests that these minimum solutions naturally occur. The initial

investigations into haplotyping were undertaken by Clark in [5], and since then

there has been flourish of activity addressing computational issues [2-4, 6-19].

Theoretically we know that the parsimony problem is APX-hard and that prac-

tically it is difficult to solve on large data sets. Our goal is not directly compu-

tational, and instead we address the underlying structure of the problem through

graph theory. We do provide closed form solutions in a some instances.

−

6.2. Notation, Definitions and Preliminary Results

The results of this paper are graph theoretical, and a basic understanding of bipar-

tite graph theory is expected. We point readers to [1] for a thorough development.

Clustering Challenges in Biological Network

Search WWH ::

Custom Search

Home