TABLE 10.1 Glycine Codons in the Triple-Helical Regions of the α1(I) and α2(I) Collagen Chains

Glycine Codon | Number and Frequency in α1(I) | Number and Frequency in α2(I) | Variant at First G | Variant at Second G
GGA | 60 (0.18) | 72 (0.21) | AGA→Arg, CGA→Arg, TGA→Stop | GAA→Glu, GCA→Ala, GTA→Val
GGC | 94 (0.28) | 70 (0.21) | AGC→Ser, CGC→Arg, TGC→Cys | GAC→Asp, GCC→Ala, GTC→Val
GGG | 10 (0.03) | 16 (0.05) | AGG→Arg, CGG→Arg, TGG→Trp | GAG→Glu, GCG→Ala, GTG→Val
GGT | 174 (0.51) | 180 (0.53) | AGT→Ser, CGT→Arg, TGT→Cys | GAT→Asp, GCT→Ala, GTT→Val

CODON AND EXON NUMBERING, REFERENCE SEQUENCES AND DATABASE ENTRIES

Before discussion of the sequence variants which result in OI, it is necessary to deal with the related issues of the numbering of the α-chain amino acids and of the exons of the encoding genes as well as the reference sequences for the genes, the mRNAs and the proteins, and also the database of OI sequence variants.

Prior to the introduction of guidelines for the numbering of amino acids when reporting genetic variants, [3] the first glycine of the triple-helical region was designated amino acid 1, ignoring the presence of the signal peptide, the N-propeptide and the N-telopeptide in each chain. Although the use of this numbering system is now deprecated for the reporting of genetic variants, it usefully serves the purpose of aligning the numbering of corresponding amino acids in the triple-helical regions of the α1(I)- and α2(I)-chains. The first triple-helix glycines of these α-chains are amino acids 179 and 91 of the respective primary translation products.

The exons which encode the triple-helical domains of the α-chains of COL1A1 and COL1A2 are precise multiples of 9 bp and are commonly, though not exclusively, 54 or 108 bp in length. [4,5] A 54 bp exon encodes precisely six Gly-Xaa-Yaa amino acid repeats. Once the COL1A1 and COL1A2 genes had been cloned and their sequences aligned, it was apparent that the exon structure of the two genes was virtually identical, but the former comprised 51 exons and the latter 52. COL1A1 has a single 108 bp exon designated “33/34” or “33,34” which corresponds to the individual 54 bp exons 33 and 34 in COL1A2. [6] Otherwise, the exons of the two genes are numbered sequentially, beginning at 1 from the 5′ ends of the genes. It should also be noted that the exons of both genes were, for a short period of time, numbered sequentially from the 3′ end of the gene until the precise number of exons was established. This exon numbering scheme was used in very early publications describing disease-causing sequence variants [7-9] but is not encountered nowadays.

Variants in COL1A1 and COL1A2 are recorded in the Osteogenesis Imperfecta Variant Database and comprise data published in journals as well as data submitted directly from research and diagnostic laboratories to the database by named individuals (https://oi.gene.le.ac.uk/). [10,11] Most of the directly submitted variants are unpublished and some are non-public at the request of the submitters, but summary-level data are included, where appropriate, in the analyses presented here. Submissions which do not record a specific OI have been excluded from consideration in the analyses which follow. The variants which are referred to in this chapter are included in the database as it existed in July 2012. The recommended GenBank [12] and Locus Reference Genomic (LRG) [13] reference sequences for the COL1A1 and COL1A2 genes, mRNAs and proteins are presented in Table 10.2. The GenBank sequence records are those currently used in the database though these are directly equivalent to the corresponding LRG reference records.

TABLE 10.2 Reference sequences for COL1A1 and COL1A2

Gene | GenBank RefSeqGene Genomic | GenBank RefSeq mRNA | GenBank RefSeq Protein | Locus Reference Genomic (LRG)
COL1A1 | NG_007400.1 | NM_000088.3 | NP_000079.2 | LRG_1
COL1A2 | NG_007405.1 | NM_000089.3 | NP_000080.2 | LRG_2

There are individual GenBank [12] reference sequences for the genomic DNA, mRNA and protein for each gene. The LRG [13] record for each gene comprises a single record with identical sequence information to that in the three individual GenBank records. In addition, the LRG records contain mappings for alternate “legacy” exon and amino acid numbering schemes.

It is not intended that all sequence variants in the OI database will be reviewed here as the list is ever increasing. Instead, specific variants which illustrate key issues in the genotype/phenotype relationship will be highlighted.
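
The relationship between the legacy triple-helix numbering and numbering from the primary translation product, and between exon length and the number of Gly-Xaa-Yaa repeats, can be illustrated with a short sketch. It assumes only the figures given in this section (first triple-helix glycine at residue 179 in the α1(I) chain and residue 91 in the α2(I) chain; triple-helical exons that are multiples of 9 bp). The function and dictionary names are hypothetical and do not correspond to any database or library interface, and no check is made that a position falls within the end of the triple helix.

```python
# Position of the first triple-helix glycine (legacy position 1) in the
# primary translation product, as stated in the text. Keys are gene names.
FIRST_HELIX_GLYCINE = {"COL1A1": 179, "COL1A2": 91}


def legacy_to_full(gene: str, helix_pos: int) -> int:
    """Convert a legacy triple-helix position to a position counted
    from the start of the primary translation product."""
    if helix_pos < 1:
        raise ValueError("legacy triple-helix positions start at 1")
    return helix_pos + FIRST_HELIX_GLYCINE[gene] - 1


def full_to_legacy(gene: str, full_pos: int) -> int:
    """Convert a position in the primary translation product to the
    legacy triple-helix position (upper helix boundary not checked)."""
    first = FIRST_HELIX_GLYCINE[gene]
    if full_pos < first:
        raise ValueError("position lies before the triple-helical region")
    return full_pos - first + 1


def gly_xaa_yaa_repeats(exon_bp: int) -> int:
    """Number of Gly-Xaa-Yaa repeats encoded by a triple-helical exon.
    Each repeat is three codons, i.e. 9 bp."""
    if exon_bp <= 0 or exon_bp % 9 != 0:
        raise ValueError("triple-helical exons are multiples of 9 bp")
    return exon_bp // 9


if __name__ == "__main__":
    for gene in ("COL1A1", "COL1A2"):
        print(gene, "legacy Gly1 ->", legacy_to_full(gene, 1))
    for bp in (54, 108):
        print(bp, "bp exon ->", gly_xaa_yaa_repeats(bp), "repeats")
```

With these offsets, legacy position 1 maps to residue 179 for COL1A1 and residue 91 for COL1A2, and a 54 bp exon yields six repeats, in agreement with the text. Because the legacy scheme counts from the same structural landmark in both chains, a given legacy position refers to corresponding residues in the α1(I)- and α2(I)-chains, which differ by 88 in the full-length numbering.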
DELETIONS INVOLVING ONE OR MORE EXONS
There are relatively few reported instances of large deletions that eliminate either the whole of COL1A1 or COL1A2 or a substantial portion of either gene. Deletion of an entire gene would be predicted to reduce synthesis of the protein encoded by the deleted allele and would result in haploinsufficiency [14] if a 50% reduction in expression was insufficient to maintain biological function. Complete deletions in the close vicinity
of COL1A1 have been reported in six unrelated patients,
 