and shown to be generally present in clinical isolates (Cha et al. 1996). This raised
the initial estimate to 227 ORFs, and other studies identified several additional
ORFs (Gibson et al. 1996; Mullberg et al. 1999; Kotenko et al. 2000). Subsequent
studies utilizing a gene-finding algorithm (Murphy et al. 2003a) and using the
sequenced CCMV genome as a comparator (Davison et al. 2003) suggested refinements
to the earlier annotations.
More recently, the sequences of clinical HCMV strains were determined.
Evaluation of these sequences led to estimates of the number of protein-coding
ORFs ranging from a maximum of 252 potentially functional ORFs that are conserved
in four different clinical isolates (Murphy et al. 2003b) to a minimum of 165
ORFs that were conserved between one HCMV clinical isolate and CCMV
(Davison et al. 2003). These two studies provide a reasonable range for the number
of HCMV coding ORFs. The higher estimate was inclusive of all ORFs that might
encode a protein. The lower estimate focused on the subset of ORFs for which a
strong case can be made for function.
We have revisited these maximal and minimal estimates of potential coding
ORFs. We did not try to discover additional ORFs; instead, we reassessed the full
set of previously annotated ORFs. Our criteria were simple. If a previously annotated
ORF was present in the genomes of five clinical isolates (FIX, TR, PH, Toledo
and Merlin), it was considered potentially functional. The subset of annotated
ORFs that were also present in the CCMV genome was considered very likely to
be functional, because they have been conserved through 4-4.5 million years of
divergent evolution. These criteria closely mimicked those used in the two earlier
studies (Davison et al. 2003; Murphy et al. 2003b), and the analysis benefited from
combining the two earlier data sets.
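The two conservation criteria above can be sketched as a simple set-membership test. This is a minimal illustration only, not the authors' procedure: the function name, the input structures and the toy ORF names in the usage note are hypothetical, and it assumes that ORF presence in each genome has already been determined.

```python
def classify_orfs(annotated, isolate_orfs, ccmv_orfs):
    """Classify previously annotated ORFs by conservation.

    annotated    -- iterable of ORF names (hypothetical input)
    isolate_orfs -- dict mapping isolate name to the set of ORF names
                    found in that clinical isolate genome
    ccmv_orfs    -- set of ORF names that also have a CCMV counterpart
    """
    result = {}
    for orf in annotated:
        # Criterion 1: the ORF must be present in every clinical isolate.
        in_all_isolates = all(orf in orfs for orfs in isolate_orfs.values())
        if not in_all_isolates:
            result[orf] = "not conserved"
        # Criterion 2: conservation in CCMV as well strengthens the case.
        elif orf in ccmv_orfs:
            result[orf] = "very likely functional"
        else:
            result[orf] = "potentially functional"
    return result
```

For example, with made-up names, an ORF found in all isolates and in CCMV is labeled "very likely functional", one found in all isolates but not in CCMV is labeled "potentially functional", and one missing from any isolate is labeled "not conserved".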
To initiate our meta-analysis, MacVector 7.2 (Accelrys, San Diego, CA, USA)
was used to identify all start-to-stop ORFs with a coding potential of 80 amino
acids or more within each of the genomes. Next, the identified ORFs (> 400 per
genome) were translated and each polypeptide was used as a query in a BlastP
analysis against a database including all previously annotated HCMV ORFs. All
ORFs with a local alignment score of 10⁻⁵ or less were considered matches, and
used to generate maps for each of the five genomes with MacVector. Finally, the
ORF maps were aligned to determine conservation among the clinical isolates.
Because BAC sequence replaces viral genes in four of the five HCMV genome
sequences (FIX, TR, PH and Toledo), the IRS1 to US12 region was compiled
from conservation between the Merlin sequence (Dolan et al. 2004) and a
BAC clone of AD169.
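The first two steps of this procedure (finding start-to-stop ORFs of at least 80 codons, then keeping BlastP hits at or below the 10⁻⁵ cutoff) can be illustrated with a short sketch. It is not a reconstruction of what MacVector does internally. It assumes that "start-to-stop" means ATG to the next in-frame stop codon on either strand, and that BLAST hits are available in the 12-column tabular layout produced by NCBI BLAST+ (`-outfmt 6`), where the 11th column is the E-value. All function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=80):
    """Scan all six reading frames for ATG-to-stop ORFs.

    Returns (strand, start, end, n_codons) tuples. Coordinates are
    0-based on the scanned strand, end includes the stop codon, and
    n_codons counts codons from the ATG up to (not including) the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_aa:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None
    return orfs


def filter_hits(tabular_lines, max_evalue=1e-5):
    """Keep (query, subject, evalue) for hits with evalue <= max_evalue.

    Assumes tab-separated BLAST+ outfmt 6 lines (E-value in column 11).
    """
    kept = []
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        evalue = float(fields[10])
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept
```

The scanner reports the longest ORF for each stop codon (it starts at the first in-frame ATG), which is one common convention; a tool that reports every internal ATG would return more candidates.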
A master map was generated containing all ORFs that met the above criteria and
that were conserved in all five clinical isolates (Fig. 2). It contains a total of 232
potentially functional ORFs. Color-coding is used to distinguish ORFs that are
known to be essential (red), augmenting, i.e., required for an optimal yield (yellow),
or nonessential for replication in fibroblasts (green) (Dunn et al. 2003; Yu et al.
2003). The 15 gray ORFs have not been tested for a role in replication. The 173 red,
yellow, green and gray ORFs are present in all five HCMV genomes and the
CCMV genome. The 59 ORFs shown in white are present in the five clinical