Some human genes encode very large proteins. For example, the cardiac titin
(TTN; 2q31) cDNA is 82 kb in length and predicts a 27 000 amino acid protein
with a molecular weight of nearly 3000 kDa (Labeit and Kolmerer, 1995). Large
mRNAs are also generated from the dystrophin (DMD; Xp21.2-21.3; 14 kb),
apolipoprotein B (APOB; 2p23-24; 14 kb), and mucin (membrane-associated
glycoprotein) genes MUC2, MUC5AC, MUC5B, MUC6 (11p15.5; 15-18 kb), MUC3
(7q22; 17 kb), and MUC4 (3q24; 18-24 kb) (Debailleul et al., 1998).
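The titin figures quoted above can be checked against one another with simple arithmetic. The sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value taken from the text, and three nucleotides per codon plus one stop codon. The function and constant names are illustrative only.

```python
# Back-of-envelope check of the titin numbers quoted in the text.
# Assumption (not from the source): ~110 Da average mass per amino acid residue.
AVG_RESIDUE_MASS_DA = 110
NT_PER_CODON = 3


def coding_length_kb(n_residues: int) -> float:
    """Length of the coding sequence in kb, counting one stop codon."""
    return (n_residues + 1) * NT_PER_CODON / 1000


def approx_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa from the residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000


titin_residues = 27_000  # residue count quoted in the text
print(coding_length_kb(titin_residues))  # coding portion of the 82 kb cDNA
print(approx_mass_kda(titin_residues))   # compare with "nearly 3000 kDa"
```

Under these assumptions the coding region comes to roughly 81 kb, leaving about 1 kb of the 82 kb cDNA for untranslated sequence, and the estimated mass of roughly 2970 kDa agrees with the quoted value of nearly 3000 kDa.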
Only ~10% of human genes encode proteins of known function. The sequences of
many of these genes can be found in GenBank (http://www.ncbi.nlm.nih.gov/). In
addition, hundreds of thousands of 'expressed sequence tags' (ESTs; Gerhold and
Caskey, 1996) have been characterized, which together represent a nonredundant
set of >45 000 human genes (UniGene;
http://www.ncbi.nlm.nih.gov/UniGene/index.html).
Precisely what constitutes a gene is somewhat contentious (Epp, 1997), but a
crude working definition might be: a transcription unit plus associated
regulatory sequences which together serve to specify both the sequence and the
expression pattern of a protein product. The term 'gene' cannot, however, be
restricted to protein-coding sequences, since some genes (e.g. snRNA, rRNA,
tRNA, XIST, H19, IPW) encode RNA molecules that have a variety of biological
functions and are not translated into protein. A simple, universally
applicable definition of a gene is difficult to derive owing to the existence
of exceptions to almost any rule that one might devise. Thus, some
transcription units may encode multiple unrelated proteins with different
functions as a result of alternative splicing (see section 1.1.2, Sequence
motifs involved in mRNA splicing and processing; Figure 3.1 in Chapter 3). As
a consequence, the notion of a gene becomes somewhat elastic. Some genes occur
within the introns of other genes, for example OMG, EVI2A, and EVI2B within
the neurofibromatosis type 1 (NF1; 17q11.2; Viskochil et al., 1991) gene, F8A
within the factor VIII (F8C; Xq28) gene (Levinson et al., 1990), and U21
within the L5 genes of chickens and mammals (Qu et al., 1994). The genes of
most known vertebrate small nucleolar RNAs (snoRNAs) are located within the
introns of other genes (Maxwell and Fournier, 1995), two human examples being
the RNE1 and RNE2 genes, which are located within the mitotic regulator (CHC1;
1p36.1) and the 67 kDa laminin receptor (LAMR1; 3p21.3) genes, respectively.
The realization that some genes reside within the introns of other genes makes
the concept of the gene that much more diffuse.
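The idea of one gene lying entirely within an intron of another can be made concrete with a small interval model. The following is a minimal sketch under stated assumptions: the `Gene` class, the `nested_in_intron` helper, the gene names HOST, INNER, and OTHER, and all coordinates are hypothetical and do not correspond to any real locus mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Toy gene model with 1-based inclusive coordinates (hypothetical)."""
    name: str
    start: int
    end: int
    strand: str        # "+" or "-"
    exons: tuple = ()  # sorted (start, end) pairs; empty if not modelled

    def introns(self):
        """Return the gaps between consecutive exons as (start, end) pairs."""
        return [
            (prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(self.exons, self.exons[1:])
        ]


def nested_in_intron(inner: Gene, host: Gene) -> bool:
    """True if `inner` lies wholly inside a single intron of `host`."""
    return any(s <= inner.start and inner.end <= e for s, e in host.introns())


# Hypothetical host gene with three exons and therefore two introns.
host = Gene("HOST", 1000, 9000, "+", ((1000, 1500), (4000, 4500), (8000, 9000)))
inner = Gene("INNER", 2000, 3000, "-")  # sits inside the first intron
other = Gene("OTHER", 4400, 5000, "-")  # overlaps the second exon instead

print(nested_in_intron(inner, host))  # True
print(nested_in_intron(other, host))  # False
```

The check deliberately ignores strand, since an intronic gene may in principle lie on either strand relative to its host.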
Human genes can also overlap in a number of different ways. Thus, two genes
encoding erbA homologues, ear-1 (THRAL) and ear-7 (THRA), located at the same
locus on chromosome 17, possess overlapping exons but are transcribed from
opposite DNA strands (Miyajima et al., 1989). Similarly, the tenascin-X (TNXA;
6p21.3) gene overlaps with the last exon of the cytochrome P450 (CYP21; 6p21.3)
gene on the opposite DNA strand (Speek et al., 1996). Other examples of
overlapping human genes transcribed from opposite DNA strands are provided by
the PMS2 (7p22) gene and a gene encoding a 34.5 kDa polypeptide (Nicolaides et
al., 1995), the CD3ζ (CD3Z) and Oct1 transcription factor (POU2F1) genes on
1q22-q23 (Lerner et al., 1993), and the cytochrome c oxidase subunit X (COX10)
gene and a partially characterized cDNA (C17ORF1) on chromosome 17p12-p11.2
(Kennerson et al., 1997). An example of overlapping