isolates, but are not found in the CCMV genome. Finally, 20 microRNAs (miRNAs),
predicted to be encoded by HCMV (Dunn et al. 2005; Grey et al. 2005;
Pfeffer et al. 2005), are identified as orange pins. So far, expression of 14 of the
miRNAs has been demonstrated (see the chapter by P.J.F. Rider et al., this
volume).
Further details of the full set of 232 ORFs are presented in Table 2, which
includes previously annotated ORFs that did not pass our filters for inclusion on the
map. As is evident in Table 2, UL147a and UL148a are each missing only from the
PH clinical isolate and are present in CCMV. Thus, they are likely bona fide ORFs.
The map in Fig. 2 has numerous uncertainties. Several relate to the filters used
previously to qualify an ORF as a potential protein-coding sequence and, therefore,
for inclusion in the database from which we selected ORFs. First, the majority
of previously annotated ORFs were required to encode polypeptides meeting a
minimum size standard, often 80 amino acids or more, as is evident in Table 2. This is
an arbitrary cutoff, adopted for practical reasons, but, of course, there is no reason
to assume that HCMV does not encode smaller polypeptides. As a case in point,
analysis of the proteins associated with HCMV virions (Varnum et al. 2004) raises
the possibility that the virus encodes some very small polypeptides. In that study,
mass spectrometry was employed to identify proteins in preparations of purified
virus particles. The analysis identified 12 tryptic-digestion products corresponding
to polypeptides encoded by ORFs that were not previously recognized. Several of
the ORFs encode polypeptides of fewer than 80 amino acids, and one has a coding
potential of 22 amino acids. This polypeptide might be the result of spurious
transcription/translation late after infection, or the polypeptide or a portion of it could
be appended to a larger protein as a consequence of splicing. Although it is not
possible to conclude from this data set that the virus encodes a 22-amino-acid
polypeptide, the observation nevertheless reinforces the strong possibility
that the virus encodes small polypeptides that have been overlooked.
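The effect of a minimum-size filter can be illustrated with a small sketch. The function below is not the annotation pipeline used for Table 2; it is a hypothetical, simplified ORF scan (forward strand only, ATG start, standard stop codons), and the toy sequence and the names find_orfs, min_aa, and toy are assumptions introduced only for illustration. It shows how a 22-codon ORF becomes invisible once an 80-amino-acid cutoff is applied.

```python
# Hypothetical sketch: how a minimum-size cutoff hides short ORFs.
# Forward strand only; ATG starts and the three standard stop codons are assumed.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=80):
    """Return sorted (start, end, n_aa) tuples for ATG-initiated ORFs.

    start/end are 0-based and end-exclusive; end includes the stop codon.
    n_aa counts codons from the ATG up to, but not including, the stop.
    Only the first ATG of each open stretch is reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa >= min_aa:
                    orfs.append((start, i + 3, n_aa))
                start = None
    return sorted(orfs)


# Toy sequence (made up): a 100-codon ORF, a 2-nt spacer, then a 22-codon ORF.
long_orf = "ATG" + "GCT" * 99 + "TGA"
short_orf = "ATG" + "GCT" * 21 + "TAA"
toy = long_orf + "CC" + short_orf

with_cutoff = find_orfs(toy, min_aa=80)     # only the 100-codon ORF passes
without_cutoff = find_orfs(toy, min_aa=20)  # the 22-codon ORF is also reported
```

Lowering min_aa recovers the short ORF, but on a real genome it would also admit many spurious open stretches, which is the practical reason a cutoff is used at all.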
A second uncertainty comes from overlap restrictions placed on the pool of previously
annotated ORFs. An ORF on one strand can potentially bias the sequence
of the opposing strand (Silke 1997; Cebrat et al. 1998), and the high G+C content
of HCMV (57%) potentially favors the presence of spurious ORFs since stop
codons are A+U-rich. In past annotations, the shorter of two overlapping ORFs
has been arbitrarily excluded when the overlap reached 60% or more, or 25% or more,
of its length, or the overlap has been limited to 396 bp, the longest overlap
documented for two HCMV ORFs known to encode proteins (UL76 and UL77). It is
certainly possible that, in
some instances, functional ORFs have evolved with longer overlaps.
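The link between G+C content and spurious ORFs can be made concrete with a rough calculation. The sketch below assumes a deliberately simple model that is not taken from the cited studies: bases are drawn independently, with G and C equally frequent and A and T equally frequent. The function names are hypothetical. Under that assumption, a random codon is a stop codon (TAA, TAG, or TGA in DNA terms) with probability of about 3.6% at 57% G+C, versus about 4.7% at 50% G+C, so long open stretches arise by chance more often in the G+C-rich sequence.

```python
# Hypothetical sketch under an independent-base model (an assumption, not the
# method of the cited papers): how G+C content changes the chance of a stop codon.

def stop_codon_probability(gc):
    """Probability that a random codon is TAA, TAG, or TGA for G+C fraction gc."""
    at = (1.0 - gc) / 2.0  # P(A) = P(T)
    g = gc / 2.0           # P(G) = P(C)
    p_taa = at * at * at
    p_tag = at * at * g
    p_tga = at * g * at
    return p_taa + p_tag + p_tga


def prob_open_run(gc, n_codons):
    """Probability that n_codons consecutive random codons contain no stop."""
    return (1.0 - stop_codon_probability(gc)) ** n_codons


p_balanced = stop_codon_probability(0.50)  # 3/64 for equal base frequencies
p_hcmv = stop_codon_probability(0.57)      # G+C content quoted for HCMV

# Chance that a random 80-codon stretch stays open in each case.
open_balanced = prob_open_run(0.50, 80)
open_hcmv = prob_open_run(0.57, 80)
```

In this model, an 80-codon stretch free of stop codons is more than twice as likely at 57% G+C as at 50% G+C. Real genomes violate the independence assumption, so the numbers indicate the direction and rough size of the effect rather than a prediction for HCMV.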
Another significant uncertainty in the map in Fig. 2 is our incomplete understanding
of HCMV splicing. It is not possible to predict splice donors and acceptors
with certainty. A variety of spliced mRNAs have been successfully identified (e.g.,
Stenberg et al. 1984; Rawlinson and Barrell 1993; Scott et al. 2002; Adair et al.
2003), but so far there has been no exhaustive experimental search for spliced
HCMV mRNAs. Splicing can, of course, combine ORFs originally assumed to be
separate, or utilize small coding regions as constituents of a larger mRNA.
The majority of the 173 ORFs that are present in all HCMV clinical isolates and
in CCMV are extremely likely to encode proteins. Indeed, 130 of these ORFs have