Chapter 2
Comparative genomics of
pathogenic Escherichia coli
Jason W. Sahl 1 , Carolyn R. Morris 2 , and David A. Rasko 2
1 Translational Genomics Research Institute, Flagstaff, AZ, USA, 2 University of Maryland School
of Medicine, Baltimore, MD, USA
INTRODUCTION
Escherichia coli is both a commensal of the human gut and a deadly human pathogen
(Kaper et al., 2004). E. coli is easily cultured from the human gut and has been
the focus of scientific study for more than one hundred years (as will be
discussed elsewhere in this topic). The availability of clinical, laboratory, and
commensal isolates, together with the associated clinical and epidemiological
data, has provided highly characterized isolates for whole genome sequencing.
The first Escherichia coli genome sequenced was that of the laboratory-adapted
isolate K12 MG1655 (Blattner et al., 1997). The single chromosome consists
of approximately 4.6 Mb of sequence encoding approximately 4300 genes.
At the time of sequencing, 38% of all coding regions had no predicted function.
The sequencing of this isolate was rapidly followed by the publication
of genomes from the O157:H7 isolates EDL933 (Perna et al., 2001) and Sakai
(Hayashi et al., 2001). These genomes were compared with the genome of K12
to determine the genetic variability between isolates of the
same species. In 2002, the genome of the uropathogenic isolate CFT073 was
completed (Welch et al., 2002). Comparisons among the three sequenced isolates
at that time demonstrated that the isolates shared only ∼39% of all coding
regions. Such low conservation of genes and coding regions within a single
species changed the existing paradigm of gene conservation. The early expectation
in genome sequencing was that 'representative isolates' could be sequenced
to stand for an entire species or, in this case, pathovar. This concept was
rapidly discarded in light of the low level of conservation in this species.
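
A figure such as the ∼39% of coding regions shared among isolates can be read as the size of the shared gene set relative to all distinct genes observed. The following is a minimal sketch of that arithmetic, not the method used in the cited studies. It assumes each genome has already been reduced to a set of gene family labels. The isolate names, family labels, and function names are hypothetical, and the data are toy values chosen only for illustration.

```python
from typing import Dict, Set, Tuple


def core_and_pan(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) for a mapping of genome name -> gene family labels.

    core: families present in every genome.
    pan:  families present in at least one genome.
    """
    if not genomes:
        return set(), set()
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan


def shared_fraction(genomes: Dict[str, Set[str]]) -> float:
    """Fraction of all observed gene families that every genome shares."""
    core, pan = core_and_pan(genomes)
    return len(core) / len(pan) if pan else 0.0


# Toy input with made-up labels (assumption: real data would come from
# clustering predicted proteins into families, which is not shown here).
toy_genomes = {
    "isolate_A": {"fam1", "fam2", "fam3", "fam4"},
    "isolate_B": {"fam1", "fam2", "fam5"},
    "isolate_C": {"fam1", "fam2", "fam3", "fam6"},
}

core, pan = core_and_pan(toy_genomes)
print(sorted(core), len(pan), round(shared_fraction(toy_genomes), 3))
```

In this toy case two of the six observed families occur in all three isolates. Adding further genomes can only shrink or preserve the shared set while the set of all observed families grows or stays the same, so the shared fraction can only fall or hold as more isolates are compared.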
In 2008, the genome of the first confirmed intestinal commensal isolate (HS) was
published (Rasko et al., 2008). A pan-genome analysis, based on peptide identity
and conservation, was conducted on 17 E. coli isolates sequenced at that time,
including eight new genomes. The results demonstrated that the conserved genomic core
of E. coli consists of ∼2200 genes. The analysis of representatives of multiple