different gene sequences (Fields et al., 1994). Thus, the smallest human chromosome,
21, may contain as many as 2000 genes.
It is now known that most genes in higher organisms are not contiguous but
rather are a complex mosaic of protein-coding (exon) and intervening non-coding
(intron) sequences (Figure 1.1). The exons represent that portion of the gene which
encodes the amino acid sequence of the protein product plus the 5' and 3' noncoding
regions. Initially, both exons and introns are transcribed into a primary transcript
(pre-mRNA), but the intronic portion is ultimately removed during mRNA maturation
by a process known as splicing (see section 1.1.2, Sequence motifs involved in mRNA
splicing and processing). The mature mRNA is then translated into the amino acid sequence of a
protein on the ribosome. Although the central dogma of molecular biology was
therefore once summarized as 'DNA makes RNA makes protein,' the reverse flow
of genetic information is also possible by reverse transcription of RNA into DNA
(copy DNA or cDNA).
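The splicing step described above can be illustrated with a minimal Python sketch that joins the exon segments of a toy primary transcript. The sequence, the exon coordinates and the function name splice are all invented for illustration. Splice-site recognition itself (the motifs covered in section 1.1.2) is not modelled; the exon boundaries are simply assumed to be known.

```python
def splice(pre_mrna, exons):
    """Return the mature mRNA obtained by joining the exon segments.

    `exons` is a list of (start, end) pairs, 0-based and half-open,
    listed in 5'-to-3' order. These coordinates are an assumption of
    this sketch; they are supplied by hand, not predicted.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical toy transcript: exon 1, one intron (GU...AG), exon 2.
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"   # begins with GU and ends with AG
exon_2 = "UGGUAA"
pre_mrna = exon_1 + intron + exon_2

exons = [(0, 6), (18, 24)]
mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)
```

In this toy case the intron is removed and the two exons are joined into a single open reading frame that starts with AUG and ends with the stop codon UAA.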
Each individual gene differs not only with respect to its DNA sequence specifying
the amino acid sequence of the protein it encodes, but also with respect to
its structure. A few human genes are devoid of introns (e.g. thrombomodulin
(THBD), which spans 3.7 kb), whereas others may possess a considerable number,
for example 79 in the 2.4 Mb dystrophin (DMD) gene and as many as 118 in the
α1(VII) collagen (COL7A1) gene (Christiano et al., 1994). Introns may be classified
according to whether they interrupt the reading frame of the encoded protein.
Thus phase 0 denotes that the intron lies between two codons, phase 1
between the first and second nucleotides of a codon, and phase 2 between the
second and third nucleotides of a codon.
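The phase definition above reduces to simple arithmetic: the phase equals the number of coding nucleotides that precede the intron, taken modulo 3. The short Python sketch below shows this. The function name intron_phase and its argument are hypothetical, and the count is assumed to start at the first nucleotide of the initiation codon and to exclude any 5' untranslated sequence.

```python
def intron_phase(coding_bases_before_intron):
    """Return the phase (0, 1 or 2) of an intron.

    `coding_bases_before_intron` is the number of protein-coding
    nucleotides upstream of the intron, counted from the first base
    of the initiation codon (an assumption of this sketch).
    """
    if coding_bases_before_intron < 0:
        raise ValueError("nucleotide count must be non-negative")
    # 0: intron lies between two codons
    # 1: between the first and second nucleotides of a codon
    # 2: between the second and third nucleotides of a codon
    return coding_bases_before_intron % 3


for n in (6, 7, 8):
    print(n, intron_phase(n))
```

For example, an intron preceded by 6 coding nucleotides falls between two complete codons (phase 0), whereas one preceded by 7 splits a codon after its first nucleotide (phase 1).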
Some introns may be huge, as in the case of the first intron of the human
COL5A1 gene (~600 kb; Takahara et al., 1995). The average length of a vertebrate
intron has been estimated to be ~620 bp (Hawkins, 1988) but introns separating
exons preceding the coding ones are often rather longer, with an average length of
>1800 bp (Hawkins, 1988). This suggests that evolution may sometimes have had
to trawl quite far upstream of a gene to recruit appropriate DNA sequence motifs
to act as promoter/regulatory elements within the 5' untranslated region. The
average length of an internal exon is ~140 bp (Hawkins, 1988) but this average
figure conceals some very large exons, for example in the human factor VIII (F8C;
Xq28) [3106 bp], apolipoprotein B (APOB; 2p23-p24) [7572 bp] and mucin 5B
(MUC5B; 11p15.5) [10 690 bp] genes.
[Figure 1.1 diagram; labels shown: 5' flanking region; GC boxes; CAAT box; TATA box; transcriptional initiation site; 5' UTR; initiation codon (ATG); Exon 1; Intron I (GT ... AG); Exon 2; Intron II (GT ... AG); Exon 3; stop codon (TAA); 3' UTR; AATAAA; poly(A)-addition site; 3' flanking region.]
Figure 1.1 Schematic structure of an archetypal human protein-coding gene. UTR,
untranslated region.
 