Haplotype Inference Using Propositional Satisfiability - Mathematical Approaches to Polymer Sequence Analysis and Related Problems

At the forefront of human variation at the genetic level are single nucleotide
polymorphisms (SNPs). An SNP is a single DNA position where a mutation has
occurred and one nucleotide was substituted with a different one. Moreover, the
least frequent nucleotide must be present in a significant percentage of the
population (e.g., 1%). SNPs are the most common genetic variation. The human genome
has millions of SNPs [42], which are cataloged in dbSNP (see footnote 3), the public
repository for DNA variations [40].

Haplotypes correspond to the sequence of SNPs in a single chromosome that
are inherited together. Humans are diploid organisms, which means that our genome
is organized in pairs of homologous chromosomes, one maternal and one
paternal. Therefore, each individual has two haplotypes for a given
stretch of the genome. Genotypes correspond to the conflated data of the two
homologous haplotypes.
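
The conflation of two haplotypes into a genotype can be illustrated with a minimal sketch. It assumes an encoding that is common in the combinatorial literature but is not stated in the text above: each haplotype is a string over {0, 1} (the two possible nucleotides at each SNP site), and each genotype is a string over {0, 1, 2}, where 2 marks a site at which the two haplotypes differ. The function name `conflate` is hypothetical.

```python
def conflate(h1: str, h2: str) -> str:
    """Combine two haplotypes into the genotype they produce.

    Assumed encoding (not from the source text): haplotype sites are
    '0' or '1'; a genotype site is '0' or '1' when both haplotypes
    agree, and '2' when they differ.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNP sites")
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


# The genotype records that sites 2 and 4 differ, but not which
# haplotype carried which value, so swapping the pair changes nothing.
genotype = conflate("0110", "0011")
same_genotype = conflate("0011", "0110")
```

Under this encoding the genotype keeps the agreeing sites and discards the assignment of differing values to each chromosome, which is exactly the information that must later be recovered.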

Technological limitations prevent geneticists from experimentally acquiring the
data from a single chromosome, that is, the haplotypes. Instead, genotypes are obtained.
This means that at each DNA position it is possible to know whether the individual
has inherited the same nucleotide from both parents (homozygous positions) or
distinct nucleotides from each parent (heterozygous positions). Nonetheless, in the
latter case, it is, in general, technologically infeasible to determine which nucleotide
was inherited from each parent. The problem of obtaining the haplotypes from the
genotypes is known as haplotype inference.
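
To make the ambiguity concrete, the following sketch enumerates every pair of haplotypes that could explain a given genotype. It assumes the same illustrative encoding as is common in the literature (haplotype sites '0' or '1'; genotype sites '0' or '1' when homozygous and '2' when heterozygous), which is an assumption rather than something fixed by the text above. The function name `explanations` is hypothetical.

```python
from itertools import product


def explanations(genotype: str) -> list[tuple[str, str]]:
    """List all unordered haplotype pairs that conflate to `genotype`.

    Assumed encoding: '0'/'1' are homozygous sites, '2' is heterozygous.
    """
    het_sites = [i for i, s in enumerate(genotype) if s == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, bit in zip(het_sites, bits):
            h1[site] = bit
            # The second haplotype must carry the other value at this site.
            h2[site] = "1" if bit == "0" else "0"
        # Sort the pair so that (a, b) and (b, a) count as one explanation.
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


candidates = explanations("0212")
```

A genotype with k heterozygous sites (k at least 1) has 2^(k-1) such explanations, so the genotype alone does not determine the haplotypes once two or more sites are heterozygous.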

Information about human haplotypes is of significant importance in clinical
medicine [8]. Haplotypes are more informative than genotypes and, in some cases, can
better predict the severity of a disease or even be responsible for producing a specific
phenotype. In some cases of medical transplants, patients who closely match the donor
haplotypes are predicted to have a more successful transplant outcome [35].
Moreover, medical treatments could be customized based on a patient's genetic
information, because individual responses to drugs can be attributed to a specific
haplotype [15]. Furthermore, haplotypes can help infer population histories.

Besides being an important biological problem, haplotype inference has also turned
out to be a challenging mathematical problem and has therefore received significant
attention from the mathematical and computer science communities. The mathematical
approaches to haplotype inference can be statistical [4, 41] or combinatorial
[6, 18, 19]. Among the combinatorial methods, the haplotype inference by pure
parsimony (HIPP) approach [19] is noteworthy. The pure parsimony approach aims at
finding the haplotype inference solution that uses the smallest number of distinct
haplotypes. The HIPP problem is APX-hard [28].
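
The pure parsimony objective can be illustrated with a deliberately naive brute-force search. This is not the SAT-based method discussed in this chapter; it is an exponential-time sketch meant only to show what is being minimized, and it is usable only on tiny inputs. It assumes the illustrative encoding common in the literature (haplotype sites '0' or '1'; genotype site '2' for heterozygous), and all function names are hypothetical.

```python
from itertools import combinations, product


def compatible_haplotypes(genotype: str) -> set[str]:
    """All haplotypes that agree with the genotype at homozygous sites."""
    options = ["01" if s == "2" else s for s in genotype]
    return {"".join(p) for p in product(*options)}


def mate(genotype: str, h: str) -> str:
    """The unique haplotype that, paired with `h`, yields `genotype`."""
    return "".join(
        ("1" if a == "0" else "0") if s == "2" else s
        for s, a in zip(genotype, h)
    )


def explained(genotype: str, chosen: set[str]) -> bool:
    """True if some pair of chosen haplotypes conflates to the genotype."""
    return any(
        h in chosen and mate(genotype, h) in chosen
        for h in compatible_haplotypes(genotype)
    )


def hipp_brute_force(genotypes: list[str]) -> set[str]:
    """Return one smallest haplotype set explaining every genotype."""
    candidates = sorted(set().union(*(compatible_haplotypes(g) for g in genotypes)))
    # Try candidate sets in order of increasing size; the first hit is minimal.
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            if all(explained(g, chosen) for g in genotypes):
                return chosen
    return set()


solution = hipp_brute_force(["02", "20", "22"])
```

In this small instance the genotypes "02" and "20" force the haplotypes 00, 01, and 10, and the pair (01, 10) then also explains "22", so three distinct haplotypes suffice. The search space grows exponentially with the number of heterozygous sites, which is consistent with the hardness result cited above.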

Boolean satisfiability (SAT) has been successfully applied in a significant number
of different fields [33]. The application of SAT-based methodologies to haplotype
inference has been shown to produce very competitive results when compared to
alternative methods [17, 31]. SAT-based models currently represent the state of the
art for HIPP and are therefore the main focus of this chapter.

3 http://www.ncbi.nlm.nih.gov/projects/SNP
