Algorithms for sequence errors (Bioinformatics)

Current state-of-the-art sequencing relies on the Sanger dideoxy chain termination method (Sanger et al., 1977). The method is based on a DNA synthesis process controlled by adding labeled dideoxy terminators (ddNTPs) to a polymerase reaction. DNA polymerases copy single-stranded DNA templates by incorporating nucleotides at the 3′ end of a primer annealed to the […]

Polymorphism and sequence assembly (Bioinformatics)

1. Introduction. Worldwide research efforts to characterize the human genome, such as the Human Genome Project (see Article 24, The Human Genome Project, Volume 3), the ENCODE project, and the International HapMap Project, along with advances in DNA sequencing technologies, have produced an enormous amount of DNA sequence information. This information is becoming available in […]

Prokaryotic gene identification in silico (Bioinformatics)

Statistical methods for the identification of protein-coding regions in prokaryotic genomes have been the main tools for gene annotation since the start of the genomic era. The pioneering papers describing gene-recognition algorithms appeared in the 1980s (Fickett, 1982; Staden, 1984; Gribskov et al., 1984; Almagor, 1985; Claverie and Bougueleret, 1986; Borodovsky et al., 1986a, b, c; […]

Eukaryotic gene finding (Bioinformatics)

1. Introduction. Gene prediction is the process of inferring the sequence of the functional products encoded in genomic DNA sequences. In this chapter, we will review computational methods to predict protein-coding genes. These methods are usually limited to the prediction of the coding fraction of the genes and they ignore for the most part the […]

Spliced alignment (Bioinformatics)

1. Introduction. Finding genes in genomic DNA sequences is a fundamental step in genome analysis. Spliced alignment is an effective method for finding multiexon genes for which a similar cDNA or protein sequence is available. A cDNA sequence is either of full length or of partial length such as an expressed sequence tag (EST). The […]

Searching for genes and biologically related signals in DNA sequences (Bioinformatics)

1. Introduction. Complex software systems integrating sophisticated machine learning techniques have been developed to elucidate the structures and functions of genes, but accurate gene annotation is still difficult to achieve, primarily due to the complexity inherent in biological systems. If the biological signals surrounding coding exons could be detected perfectly, then the protein-coding regions of […]

Pair hidden Markov models (Bioinformatics)

1. Introduction. Many of the early contributions of computer science to biological sequence analysis consisted of the development of algorithms for pairwise sequence alignment, most notably the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970), which was later extended and refined by Smith and Waterman (Waterman and Smith, 1981) and others. It was only 20 years later, when […]

Information theory as a model of genomic sequences (Bioinformatics)

1. Theory. Shannon and Weaver (1949) developed their theory of information in order to understand the transmission of electronic signals and model the communication system. Gatlin (1972) first described its extension to biology. Information theory is an obvious tool to use in looking for patterns in DNA and protein sequences (Schneider, 1995). Information theory has […]

Promoter prediction (Bioinformatics)

1. Biological problem and importance for practice and science. Complex processes in cells of living organisms depend on synchronous actions of different groups of genes. Coordination of gene expression is achieved to a large extent by different transcriptional control mechanisms characteristic of each gene and controlling the timing, rate, and level of its transcription. Promoters represent […]

Gene structure prediction in plant genomes (Bioinformatics)

1. Introduction. In eukaryotes, the presence of intervening sequences (introns) within most genes makes the problem of computational gene structure prediction distinct from (and harder than) the same problem in prokaryotes. However, even among eukaryotes, the problem is varied beyond the basic need for species-specific training of algorithm parameters. For example, introns are rare in […]