NaOH and load the sample into an Illumina Genome Analyzer flowcell. Hybridize
the complementary end of each template to a flowcell primer. Perform the first
extension by Taq polymerase to generate a reverse complementary copy that is
tethered to the flowcell surface. Remove the untethered original template by
flushing with NaOH. Hybridize the free end of the complementary copy strand to
another flowcell primer and perform extension with Bst polymerase, generating a
double-stranded product. Denature the double-stranded product with formamide so
that the free end of each of the two single strands can anneal to another flowcell
primer. Repeat cycles of denaturation, annealing, and extension to
generate a cluster of 1000 double-stranded products. Cleave one of the flowcell
primers and remove one strand selectively, resulting in clusters of single-stranded
templates. This allows more efficient hybridization of the sequencing primer and
ensures that the sequencing occurs only in one direction. Perform the sequencing by
synthesis, 36 cycles of single-base extension, in the Illumina Genome Analyzer
according to the manufacturer's instructions, using a modified DNA polymerase
and a mixture of four dNTPs that are each labeled with a different fluorophore and
are also 3′-blocked. In every cycle, the fluorescence signal corresponding to the
identity of the incorporated nucleotide is imaged, and then the fluorophore and the
3′-blocking moiety are cleaved for the next cycle. Export the raw sequence data.
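The per-cycle readout described above can be illustrated with a minimal sketch. It assumes that each cycle yields one intensity value per fluorophore and that the brightest channel identifies the incorporated nucleotide. The function name, the channel order, and the toy intensities are hypothetical. The instrument's own base caller also corrects for effects such as channel cross-talk, which this sketch ignores.

```python
from typing import Dict, List

# Hypothetical channel order: one fluorophore per nucleotide.
CHANNELS = ("A", "C", "G", "T")


def call_bases(intensities: List[Dict[str, float]]) -> str:
    """Call one base per cycle by picking the brightest channel.

    Simplified illustration only: no cross-talk or phasing correction.
    """
    read = []
    for cycle in intensities:
        # The fluorophore with the highest signal is taken as the
        # nucleotide incorporated in this cycle.
        base = max(CHANNELS, key=lambda b: cycle[b])
        read.append(base)
    return "".join(read)


# Made-up intensities for four cycles (a real run has 36 cycles).
example_cycles = [
    {"A": 0.91, "C": 0.05, "G": 0.02, "T": 0.02},
    {"A": 0.03, "C": 0.04, "G": 0.88, "T": 0.05},
    {"A": 0.10, "C": 0.07, "G": 0.03, "T": 0.80},
    {"A": 0.02, "C": 0.93, "G": 0.03, "T": 0.02},
]

print(call_bases(example_cycles))
```

Because the fluorophore and the 3′-block are cleaved after imaging, each cycle contributes exactly one position, so the read length equals the number of cycles.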
K. Computational Data Analysis
Analysis of the raw sequencing data presents substantial computational challenges,
and the methods often change as new algorithms are proposed (Creighton et al.,
2009; Shendure and Ji, 2008). The first step is to filter out unusable reads from the
raw data. For example, unique sequence reads present in fewer than 10 copies may be
considered potential sequencing errors and discarded. Sequence reads that
match the E. coli genome database are considered contaminants and should also be
removed. When searching for small RNAs of around 17-26 nt in length, given
that the average length of an Illumina read is 36 nt, finding part of the 3′-adaptor at
the 3′-end of the read sequence can also be used as a quality control step, although
this may not apply to all noncoding RNAs.
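The three filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used. The adaptor string is a placeholder rather than the real adaptor sequence, the 6-nt minimum overlap is an arbitrary choice, and contamination is modeled as exact membership in a set of sequences, standing in for a match against the E. coli genome database. All function and variable names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set

MIN_COPIES = 10             # copy-number threshold given in the text
ADAPTOR = "GATTACAGGCCTTA"  # placeholder, NOT the real 3'-adaptor sequence


def find_adaptor_start(read: str, adaptor: str, min_overlap: int = 6) -> int:
    """Return the index where the adaptor begins in the read, or -1.

    Exact matching only; min_overlap is an assumed parameter.
    """
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:]
        # Either the read ends inside the adaptor, or the whole adaptor
        # is present and followed by further bases.
        if adaptor.startswith(tail) or tail.startswith(adaptor):
            return start
    return -1


def filter_reads(reads: Iterable[str], contaminants: Set[str],
                 adaptor: str, min_copies: int = MIN_COPIES) -> Dict[str, int]:
    """Keep unique reads that pass the copy-number, contaminant, and adaptor checks."""
    kept = {}
    for seq, copies in Counter(reads).items():
        if copies < min_copies:
            continue  # possible sequencing error
        if seq in contaminants:
            continue  # stand-in for a match to the E. coli genome
        if find_adaptor_start(seq, adaptor) < 0:
            continue  # no 3'-adaptor found at the 3'-end
        kept[seq] = copies
    return kept


# Toy 36-nt reads: a 22-nt insert followed by the 14-nt placeholder adaptor.
read_ok = "TGAGGTAGTAGGTTGTATAGTT" + ADAPTOR
read_rare = "CATCATCATCATCATCATCATC" + ADAPTOR
read_contaminant = "GGCCGGCCGGCCGGCCGGCCGG" + ADAPTOR
read_no_adaptor = "ACGT" * 9

toy_reads = ([read_ok] * 12 + [read_rare] * 3 +
             [read_contaminant] * 12 + [read_no_adaptor] * 15)

passed = filter_reads(toy_reads, {read_contaminant}, ADAPTOR)
print(passed)
```

In this toy input only the first read survives: the second has too few copies, the third is listed as a contaminant, and the fourth lacks the adaptor. As the text notes, the adaptor check suits inserts shorter than the read length and may need to be disabled for longer noncoding RNAs.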
For profiling expression of known miRNAs and 21U-RNAs, we trimmed the
3′-adaptor sequence from the reads and used an in-house alignment procedure to
find perfect matches to known sequences in reference databases [miRBase
(Griffiths-Jones et al., 2008) and previously annotated 21U-RNAs (Batista et al., 2008)].
The number of reads for each known small RNA is normalized to the total number of
reads that matched the C. elegans genome (see below), and this normalized count can
represent small RNA abundance. For alignment of large sets of short reads to genome
databases, an increasing number of software tools have been developed, many of which
also allow for mismatches and/or gaps (for review see Shendure and Ji, 2008). We loaded
the adaptor-trimmed reads into SOAP (short oligonucleotide alignment package) (Li
et al., 2008, 2009) for matching to the C. elegans genome [WormBase (Harris et al.,