Biomedical Engineering Reference
In-Depth Information
incorporates cluster generation, paired-end fluidics, sequencing-by-synthesis
chemistry, and complete data analysis. An intuitive touch screen interface makes for
simple instrument operation. Plug-and-play reagents with RFID tracking make for
added convenience. MiSeq eliminates the need for auxiliary hardware and computing
resources, saving valuable laboratory bench space. This system allows assembly
of small genomes or targeted gene panels with unmatched accuracy, especially
within homopolymer regions. It uses the shortest sample-to-data workflow among
all benchtop sequencers.
2.7 NGS Analysis Strategies
After sequencing, the NGS reads (short DNA fragments) are aligned to the reference
genome to find genetic variations (Ruffalo et al. 2011). For bioinformaticians with
Linux experience, numerous open-source aligners are available; a list of download
sources for the various software packages is given at
http://en.wikipedia.org/wiki/List_of_sequence_alignment_software. In contrast,
one-stop commercial solutions for data analysis have emerged (SoftGenetics NextGENe),
and some of them are web-based cloud computing servers (e.g., www.dnanexus.com).
The depth of coverage is defined as the number of times each nucleotide is
independently sequenced in different reads (Bao et al. 2011). Generally, the completed
NGS analysis is expected to yield a large number of variants, that is, differences
between an individual's sequence and the reference sequence.
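The depth-of-coverage definition above can be illustrated with a minimal Python sketch. It assumes each aligned read has already been reduced to a (start, end) interval on the reference, 0-based and half-open; that convention, the function names, and the toy reads are illustrative assumptions, not part of any particular aligner's output format. Duplicate reads are not removed here, so every interval is treated as an independent read.

```python
def depth_of_coverage(reads, ref_length):
    """Return a list whose entry i is the number of reads covering position i.

    `reads` is an iterable of (start, end) intervals on the reference,
    0-based and half-open (an assumed convention for this sketch).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in reads:
        start = max(start, 0)          # clip reads hanging off the left edge
        end = min(end, ref_length)     # clip reads hanging off the right edge
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives the per-base depth.
    depth = []
    running = 0
    for i in range(ref_length):
        running += diff[i]
        depth.append(running)
    return depth


def mean_depth(depth):
    """Average depth over all reference positions (0.0 for an empty list)."""
    return sum(depth) / len(depth) if depth else 0.0


# Toy example: three overlapping reads on a 10-base reference.
reads = [(0, 5), (2, 8), (4, 10)]
depth = depth_of_coverage(reads, 10)
print(depth)              # [1, 1, 2, 2, 3, 2, 2, 2, 1, 1]
print(mean_depth(depth))  # 1.7
```

Position 4 is covered by all three reads, so its depth is 3, while the first two and last two positions are covered by only one read each. In practice the intervals would come from an alignment file rather than a hand-written list.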
The next step of the analysis aims to distinguish potential disease-causing
variants from benign SNPs by using a combined filtering approach based on mutation
type, previously identified mutations, predictions of pathogenicity, knowledge
database searches, inheritance patterns, and phenotype (a detailed description is
in Chap. 8). Generally, certain assumptions are made to identify pathogenic
mutations using NGS data: disease-causing mutations are assumed to have a higher
penetrance, and mutations that directly affect protein structure are assumed to have
functional consequences that can be easily observed. Thus, the mutation candidates
are nonsynonymous mutations, insertions/deletions, and splice-site mutations. In
addition, common SNPs found in healthy individuals are filtered out of the variant
set. To aid in mutation identification, a number of databases and software packages
are used to determine the meaning of variants discovered by NGS (Bao et al. 2011).
The Human Gene Mutation Database (HGMD; http://www.hgmd.cf.ac.uk/ac/index.php) and
dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/) are queried to determine whether
the variants have previously been detected or reported in the literature for a
particular disease. SIFT (http://sift.jcvi.org/) and PolyPhen
(http://genetics.bwh.harvard.edu/pph2/) are typically used to predict whether a
coding region change will have a deleterious effect on protein structure and
function. Due to the complexity and amount of biological information, there are a
number of knowledge-based databases that aid in the interpretation of NGS variants,
namely, Online Mendelian