The creation of ordered sets of overlapping clones, or “contigs”, has historically been the goal of chromosome walks in gene hunting and, more recently, of efforts to provide tiling paths of clones for whole-genome sequencing. Various methods have been used to establish clone overlaps, including simple cross hybridization. In radiation hybrid mapping, the overlaps between the […]

Algorithmic challenges in mammalian whole-genome assembly (Bioinformatics)

1. Introduction The basic methodology used today for sequencing a large genome is double-barreled shotgun sequencing. Shotgun sequencing, introduced by Sanger and colleagues early on (Sanger et al., 1977), involves obtaining a redundant representation of a genomic segment with sequenced reads and then assembling the reads into contigs on the basis of sequence overlap. Double-barreled […]
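The central step described above, assembling reads into contigs on the basis of sequence overlap, can be illustrated with a naive greedy merge. This is a minimal sketch under strong simplifying assumptions: the reads are error-free, come from a single strand, and contain no repeats. It ignores the mate-pair ("double-barreled") constraints entirely. The functions `overlap`, `remove_contained`, and `greedy_assemble` are hypothetical names chosen for illustration. Greedy merging is a textbook simplification and is not presented as the method used by any particular assembler.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`,
    provided it is at least `min_len`; otherwise 0."""
    start = 0
    while True:
        # Look for the first `min_len` bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept only if the rest of a (from here) is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def remove_contained(seqs: list[str]) -> list[str]:
    """Drop duplicates and any sequence fully contained in another."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap
    until no overlap of at least `min_len` remains; return the contigs."""
    contigs = remove_contained(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs = remove_contained(rest + [merged])


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

With real data, repeats longer than the reads make greedy merging collapse or misjoin regions. This is one reason read-pair distance constraints are used in practice.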

Microbial sequence assembly (Bioinformatics)

The first microbial genome projects began in the 1990s and focused either on important model systems (e.g., Escherichia coli, Saccharomyces cerevisiae) or on important pathogens (e.g., Mycobacterium tuberculosis). The prevailing view for these early microbial projects was that the complete genome sequence had to be assembled piecemeal from large-insert clones such as cosmids that were […]

Genome signals and assembly (Bioinformatics)

1. Introduction Sequencing of entire genomes of various organisms has become one of the basic tools of biology. However, the quality of a genome assembly depends to a large extent on the structure of the genomic sequence, notably on signals such as repeats, polymorphisms, and nucleotide asymmetry, as well as on structural motifs such as protein motifs (see Article 28, […]

Comparative analysis for mapping and sequence assembly (Bioinformatics)

1. Comparative analysis for mapping and sequence assembly Comparisons of mammalian genomic sequences reveal extensive similarity at both the chromosome and basepair levels (Mouse Genome Sequencing Consortium, 2002; Rat Genome Sequencing Project Consortium, 2004). The increasing number of assembled reference sequences produced by ongoing genome sequencing projects thus provides information that is potentially useful for […]

Statistical signals (Bioinformatics)

1. Introduction Detection of statistical signals in protein and DNA sequences is often a powerful method for uncovering biological significance and function. Examples of sequence features that can be identified through their statistical properties include protein motifs (see Article 28, Computational motif discovery, Volume 7), gene promoters, enhancers and suppressors (see Article 19, Promoter prediction, […]

Errors in sequence assembly and corrections (Bioinformatics)

1. Introduction The major source of all these challenges is the limitation of sequencing technologies, which today allow us to routinely sequence only about 500 to 800 bases of contiguous DNA sequence. To overcome the limitation of short contiguous sequences, Frederick Sanger devised the shotgun sequencing technique and in 1982 demonstrated its potential by sequencing […]

Genome maps and their use in sequence assembly (Bioinformatics)

1. Introduction Genomes range in size from around a million base pairs to many thousands of millions, and yet a typical sequencing reaction yields less than a thousand base pairs of contiguous sequence information. From these tiny fragments of data, the complete genome sequence must be reconstructed as accurately and as completely as possible if […]

Repeatfinding (Bioinformatics)

Genomes in general, and eukaryotic genomes in particular, are rife with segments of repetitive sequence. Repetition is the most obvious pattern found in genetic information, and is usually indicative of a biologically significant motif or landmark in that particular biomolecule. Repetitive segments of DNA appear to be necessary for the structural function of centromeres and […]

Graphs and metrics (Bioinformatics)

1. Generalities Graphs and metrics are structures that allow one to specify and represent mutual relationships between pairs of objects from a given collection of objects under consideration. The most abstract form of representation consists of just a list specifying all pairs of objects that are considered to be “related” (relative to some preconceived concept of […]
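The list-of-related-pairs representation described above is what is usually called an edge list. As a minimal sketch, assuming the relationship is symmetric (an undirected graph), the code below turns such a list into adjacency sets. It then derives a metric from the graph: the shortest-path (hop-count) distance, computed by breadth-first search. On a connected undirected graph this distance satisfies the metric axioms. The function names `adjacency_from_pairs` and `graph_distances` are hypothetical. This is one standard way of linking a graph to a metric, not necessarily the construction developed in the article.

```python
from collections import deque


def adjacency_from_pairs(pairs):
    """Build undirected adjacency sets from a list of related pairs."""
    adj = {}
    for u, v in pairs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def graph_distances(adj, source):
    """Hop-count distance from `source` to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:  # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


pairs = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]
adj = adjacency_from_pairs(pairs)
print(graph_distances(adj, "a"))  # distances: a=0, b=1, e=1, c=2, d=3
```

Nodes that are unreachable from `source` are simply absent from the result. This corresponds to an infinite distance between different connected components.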