represented by the letters A, T, C and G. Over evolutionary time, mutations arise in
these strings as they are passed on from one organism to another, sometimes resulting
in an increased survival rate and a possible divergence into a new species.
Understanding how and when these mutations occurred is a major topic in comparative
genomics and informs scientists about the relatedness of species, both genomically
and functionally.
The sequence of a genome is determined from the output of a sequencing pipeline.
In this pipeline, the genomes from many cells of an organism are extracted and
chopped up into very small segments. These segments are read using a variety of
techniques in which the individual nucleotides of each segment are determined.
Using computational alignment algorithms, the sequences from the segments are
then pieced together to produce the string of letters representing each chromosome
in the sequenced genome.
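The piecing-together step can be illustrated with a toy sketch. It assumes error-free reads that are already in the right order and overlap exactly at their ends, which real sequencing data does not guarantee; the function names and the `min_overlap` parameter are hypothetical and do not come from any particular assembler.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink until one matches.
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> str:
    """Join two reads on their longest suffix/prefix overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        raise ValueError("reads do not overlap enough to merge")
    return left + right[size:]


# Three short, made-up reads that overlap by three nucleotides each.
reads = ["ATCGGA", "GGATTC", "TTCAAG"]
contig = reads[0]
for read in reads[1:]:
    contig = merge_reads(contig, read)
print(contig)
```

Production pipelines must also handle sequencing errors, repeats, unknown read order and both strands, so this is only meant to show the idea of joining segments on shared sequence.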
Using the genomic sequences from different species, or sometimes from just a single
species, algorithms look for regions that have similar genomic sequences while
taking into account models of how genomes evolve over time. These algorithms produce
pairs of conserved features, giving each pair a strength based on the amount of
similarity. Some algorithms will further group the paired features into larger regions
based on characteristics of the features like proximity and orientation (features such
as genes have an orientation along the genome). The result of these algorithms is
a multiscale list of features and regions for each chromosome that are paired with
features and regions on a different chromosome. Each of these pairs has a score that
represents the strength of the similarity.
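One way to picture this output is as a small set of record types plus a grouping pass. The sketch below is an assumption made for illustration: the class names, the `max_gap` threshold, and the choice to measure proximity only along the first genome are all hypothetical, and real algorithms use more elaborate criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    chromosome: str
    start: int
    end: int
    strand: str  # "+" or "-": the orientation along the genome


@dataclass(frozen=True)
class FeaturePair:
    first: Feature
    second: Feature
    score: float  # strength of the similarity


def same_orientation(pair: FeaturePair) -> bool:
    return pair.first.strand == pair.second.strand


def group_by_proximity(pairs: List[FeaturePair], max_gap: int) -> List[List[FeaturePair]]:
    """Group pairs that share chromosomes and relative orientation and lie close together.

    Proximity is checked only along the first genome (a simplifying assumption).
    """
    ordered = sorted(
        pairs,
        key=lambda p: (p.first.chromosome, p.second.chromosome, p.first.start),
    )
    groups: List[List[FeaturePair]] = []
    for pair in ordered:
        last = groups[-1][-1] if groups else None
        if (
            last is not None
            and last.first.chromosome == pair.first.chromosome
            and last.second.chromosome == pair.second.chromosome
            and same_orientation(last) == same_orientation(pair)
            and pair.first.start - last.first.end <= max_gap
        ):
            groups[-1].append(pair)
        else:
            groups.append([pair])
    return groups


# Made-up coordinates and scores, for illustration only.
pairs = [
    FeaturePair(Feature("chrA", 100, 200, "+"), Feature("chrX", 5000, 5100, "+"), 0.9),
    FeaturePair(Feature("chrA", 250, 300, "+"), Feature("chrX", 5200, 5260, "+"), 0.7),
    FeaturePair(Feature("chrA", 90000, 90100, "+"), Feature("chrX", 100, 200, "-"), 0.8),
]
groups = group_by_proximity(pairs, max_gap=1000)
print([len(group) for group in groups])
```

Each inner list plays the role of a larger region; the individual pairs inside it are the finer scale of the same multiscale list.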
22.2.2 Challenges for Visualization
The challenge of visualizing comparative genomics data arises on several fronts. First,
these data sets can easily contain thousands of paired features, scattered over dozens
of chromosomes. Second, the size of the features is often orders of magnitude smaller
than the size of the chromosomes. And third, it is often important to understand the
location and size of paired features in the context of their similarity scores.
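The second challenge becomes concrete with a little arithmetic. The numbers below are assumptions chosen for illustration (a 1,000 bp feature, a 100,000,000 bp chromosome, and a display 1,000 pixels wide), not values taken from a specific data set.

```python
def feature_width_pixels(
    feature_length_bp: int, chromosome_length_bp: int, display_width_px: int
) -> float:
    """Width of a feature when the whole chromosome spans the display."""
    return display_width_px * feature_length_bp / chromosome_length_bp


# Assumed sizes: 1 kb feature, 100 Mb chromosome, 1000-pixel-wide view.
width = feature_width_pixels(1_000, 100_000_000, 1_000)
print(f"{width:.2f} px")
```

Under these assumptions the feature covers about a hundredth of a pixel, so it cannot be drawn at its true size when the full chromosome is in view.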
22.2.3 Visualization for Comparative Genomics
Visualization tools for exploring comparative genomics data represent chromosomes
as distinct, 1D coordinate systems, with a set of chromosomes representing a
complete genome. These chromosomes are most often represented linearly and in a
series. Conserved features and regions are shown as subregions along the chromosomes.
These tools generally compare two genomes at a time, where the genome
of interest is considered the source and the comparison genome is considered the
destination.
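A linear, in-series layout amounts to giving each chromosome an offset along one shared axis. The following is a minimal sketch under that assumption; the function names, the `gap` parameter and the chromosome lengths are hypothetical and are not drawn from any specific tool.

```python
from typing import Dict, List, Tuple


def chromosome_offsets(lengths: List[Tuple[str, int]], gap: int = 0) -> Dict[str, int]:
    """Start offset of each chromosome when they are laid end to end."""
    offsets: Dict[str, int] = {}
    cursor = 0
    for name, length in lengths:
        offsets[name] = cursor
        cursor += length + gap  # leave optional spacing between chromosomes
    return offsets


def to_linear(offsets: Dict[str, int], chromosome: str, position: int) -> int:
    """Map a position in a chromosome's own 1D coordinates to the shared axis."""
    return offsets[chromosome] + position


# Made-up chromosome lengths, for illustration only.
genome = [("chr1", 1000), ("chr2", 800), ("chr3", 500)]
offsets = chromosome_offsets(genome, gap=100)
print(offsets)
print(to_linear(offsets, "chr2", 50))
```

A tool comparing a source and a destination genome could build one such axis per genome and place the ends of each paired feature on the corresponding axis.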
 