Target resequencing strategies
For some applications it is not necessary to sequence the whole genome, only a specific region or set of regions. This is the case when studying: i) a disease phenotype previously mapped to a specific region of the genome, ii) candidate genes involved in a pathology or pathway, or iii) the whole exome. These goals require combining targeted capture methods with massively parallel sequencing. Methods for capturing the regions of interest are commercially available, but because this field is evolving rapidly, the latest approaches should be reviewed before designing any experiment in order to choose the most cost-effective strategy for each project (Turner et al., 2009; Mamanova et al., 2010). Despite the differences among capture strategies, the workflow for targeted resequencing is very similar for candidate genes and for the exome. Genomic DNA is used to construct a library, which consists of small DNA fragments flanked by adaptors. Depending on the method used, capture of the regions of interest occurs before or after library construction. Once the captured library has been created, it is clonally amplified and then subjected to massively parallel sequencing.
Samples can be barcoded during capture and library preparation. Barcoding allows multiple samples to be pooled in a single sequencing run, taking advantage of the high throughput of NGS platforms.
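As an illustration of how pooled, barcoded reads can be assigned back to their samples after sequencing, the following minimal Python sketch splits reads by an exact match on a leading barcode. The function name, the barcode sequences, the sample names and the reads are all hypothetical; real pipelines typically also tolerate sequencing errors in the barcode, which this sketch does not attempt.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact match on the leading barcode.

    reads: iterable of sequence strings whose first bases are the barcode.
    barcode_to_sample: dict mapping barcode sequence -> sample name.
    Assumption of this sketch: all barcodes have the same length.
    """
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes barcodes of equal length")
    barcode_length = lengths.pop()

    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        # Reads whose barcode is not recognized are kept apart, not discarded.
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
reads = ["ACGTGGGCCC", "TGCAAATTTT", "ACGTTTTAAA", "NNNNCCCCGG"]
pools = demultiplex(reads, barcodes)
```

Here `pools` maps each sample name to the list of its reads with the barcode removed, and the read starting with `NNNN` ends up under `"unassigned"`.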
Capture strategies can be broadly divided into two groups: those based on PCR and those based on hybridization probes (Table 3).
1. PCR approaches:
When a specific region has been previously mapped, long-range PCR with high-fidelity polymerases can be used to analyze large, kilobase-sized contiguous intervals (Yeager et al., 2008).
Several strategies for simultaneously amplifying hundreds of DNA fragments have been developed in recent years. The Access Array System (Fluidigm) uses a microfluidic chip with nanoliter-scale chambers in which 48 different fragments are amplified simultaneously in 48 samples. Because the adaptor sequences are incorporated into the primer design, the amplicon product can go directly into clonal amplification (Voelkerding et al., 2010).
Microdroplet PCR technology, developed by RainDance, uses emulsion PCR in a microfluidic device, creating droplets of primers in an oil solution. The primer droplets, which target different regions of the genome, merge with separate droplets that contain fragmented genomic DNA and PCR reagents. These mixed droplets are thermal cycled in a single tube. Encapsulating the PCR reactions in microdroplets prevents interactions between primer pairs, allowing efficient simultaneous amplification of up to 20,000 targeted sequences (Tewhey et al., 2009). Illumina and Life Technologies have followed similar strategies to capture regions for the MiSeq and the Ion PGM Sequencer, respectively. Illumina has launched the TruSeq Custom Amplicon Kit for multiplex amplification of up to 384 amplicons per sample, and Life Technologies has recently developed a single-tube multiplex PCR of up to 480 amplicons, known as the Ion AmpliSeq Cancer Panel. Currently only the cancer panel is available, but the company has announced that custom panels will be available soon.
Halo Genomics has developed two strategies based on amplification methods, Selector and HaloPlex. The first, the Selector Target Enrichment system, is based on multiple displacement amplification. It produces circular DNA that is amplified in a whole-genome amplification reaction, and the resulting high-molecular-weight DNA product is compatible with all next-generation sequencing library preparation protocols. The DNA sample is first fragmented with restriction enzymes; the probe library is then added and the probes hybridize to the targeted fragments. Each probe is an oligonucleotide designed to hybridize to both ends of a targeted DNA restriction fragment, thereby guiding the targeted fragments to form circular DNA molecules. The circular molecules are closed by ligation and then amplified. The next step is library preparation (Johansson et al., 2010).
With HaloPlex technology, the PCR products are ready for pooling and direct sequencing. No library needs to be created after capture, because the probes also contain a specific sequencing motif that is incorporated during circularization. This motif allows specific adaptors and barcodes to be incorporated during amplification. Currently, this product is optimized for Illumina.
2. Hybridization
The other strategy is capture by hybridization with specific probes complementary to the regions of interest. The first hybridization approaches were based on on-array capture (Albert et al., 2007; Hodges et al., 2007; Ng et al., 2009). To avoid the disadvantages of working with microarrays, current methods are based on in-solution capture. Fragment libraries are hybridized to biotinylated probes in solution, recovered with streptavidin magnetic beads, amplified, and sequenced on the platform of choice (Gnirke et al., 2009; Bamshad, 2011).
All the vendors (Agilent, NimbleGen, Illumina and Life Technologies) offer kits that are either predesigned for a specific application, such as exome sequencing or cancer, or custom panels designed by the user (Table 3). Kits are available for regions of interest ranging from less than 100 kb to up to 60 Mb.
a Custom (1), specific gene panel (e.g. cancer panel) (2), exome panel (3).
b 454 (1), Illumina (2), SOLiD (3), Ion PGM Sequencer (4).
c Custom panels available soon.
Table 3. Capture methods for targeted resequencing.
Candidate gene resequencing
When dealing with arrhythmogenic diseases that carry a risk of sudden cardiac death, we can analyze the genes previously associated with these pathologies, which explain a high percentage of cases, although that percentage varies with the pathology (Hedley et al., 2009a, 2009b; Kapplinger et al., 2009).
Therefore, as has already been done for SCD-associated cardiomyopathies (Meder et al., 2011), the strategy for the arrhythmogenic diseases could be to capture the 21 genes listed in Table 1. As shown in Table 3, a great variety of strategies is available. In addition, all commercially available kits come with tools for designing specific primers or probes to capture the regions of interest.
Many factors have to be evaluated when selecting both the capture method and the NGS platform: the size of the region of interest, the coverage and accuracy needed, the number of samples, barcode availability, and the DNA requirement. No single method is ideal for all situations.
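The interplay between region size, sequencing throughput and the number of pooled samples can be made concrete with a back-of-the-envelope depth estimate. The sketch below assumes that reads are split evenly among samples and that a fixed fraction of reads falls on target; both are simplifications, and the function name and all numbers are hypothetical rather than taken from any kit or platform specification.

```python
def mean_target_depth(total_reads, read_length, on_target_fraction,
                      target_size_bp, n_samples=1):
    """Approximate mean per-sample depth over the captured region.

    depth = (reads * read length * on-target fraction)
            / (target size * number of pooled samples)

    Assumes an even split of reads across samples and uniform coverage,
    which real capture experiments only approximate.
    """
    if target_size_bp <= 0 or n_samples <= 0:
        raise ValueError("target size and sample count must be positive")
    if not 0 <= on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be between 0 and 1")
    on_target_bases = total_reads * read_length * on_target_fraction
    return on_target_bases / (target_size_bp * n_samples)


# Hypothetical run: 1 million reads of 100 bp, half of them on target,
# a 100 kb region of interest, and 10 barcoded samples in the pool.
depth = mean_target_depth(1_000_000, 100, 0.5, 100_000, n_samples=10)
```

With these illustrative inputs the estimate is a mean depth of 50 per sample; doubling the number of pooled samples or the size of the region halves it.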
Whole exome resequencing
The targeted resequencing of the protein-coding subset of the genome is known as exome sequencing. This strategy has been a powerful approach for identifying either genes involved in Mendelian disorders or rare variants underlying the heritability of complex traits (Bamshad, 2011). Therefore, arrhythmogenic diseases such as LQTS, SQTS, CPVT and BrS, all genetic diseases with Mendelian inheritance, are appropriate candidates for this type of study.
All the vendors of in-solution hybridization methods (Agilent, Illumina, Life Technologies, NimbleGen) have developed commercial kits for capturing the whole exome (Table 3). Because of the throughput needed to obtain enough coverage for variant calling, the platforms of choice for this application are the Illumina GAII or superior and the SOLiD 5500. Since 2009 this approach has been used successfully to identify the genes involved in at least 29 diseases (Bamshad, 2011).
Genetic variant versus mutation
It should be kept in mind that this kind of genetic test identifies the presence of a probable or possible arrhythmogenic disease-causing mutation, for which the probability of pathogenicity, and even the likelihood of sudden cardiac death, is influenced by many factors, including rarity, conservation, topological location, co-segregation, functional studies, and so forth. According to Kapplinger et al. (2009), fewer than 25% of the previously published LQTS mutations have been characterized by heterologous expression studies to demonstrate the anticipated loss of function (LQT1 and LQT2) or gain of function (LQT3) conferred by the mutation. To be classified as a pathogenic mutation, a new genetic variant detected in an affected individual must meet the following criteria:
a. The variant must disrupt either the open reading frame (i.e., missense, nonsense, insertion/deletion, or frameshift mutations) or the splice site (poly-pyrimidine tract, splice acceptor or splice donor recognition sequences). The acceptor splice site is defined as the 3 intronic nucleotides preceding an exon (designated IVS-1, -2, or -3) and the donor splice site as the first 5 intronic nucleotides after an exon (designated IVS+1, +2, +3, +4, or +5) (Rogan et al., 2003).
b. The variant must be absent from a representative cohort of healthy unrelated individuals of a common population origin, comprising at least 200 individuals (400 alleles).
c. The variant must be absent from all published databases listing the common polymorphisms in the studied genes and from previously published reports or compendia of rare control variants.
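The three criteria can be read as a simple conjunction of checks. The following Python sketch is one possible encoding, not an established classification tool: the `Variant` record, its field names and the effect labels are hypothetical, and the poly-pyrimidine tract mentioned in criterion a is not modeled, only the IVS-1 to IVS-3 and IVS+1 to IVS+5 positions.

```python
from dataclasses import dataclass
from typing import Optional

# Effect labels treated as disrupting the open reading frame (criterion a).
CODING_EFFECTS = {"missense", "nonsense", "insertion", "deletion", "frameshift"}


@dataclass
class Variant:
    """Hypothetical record holding the evidence needed for criteria a to c."""
    effect: str                             # e.g. "missense", "intronic"
    intronic_offset: Optional[int] = None   # -2 for IVS-2, 5 for IVS+5
    control_alleles_screened: int = 0
    control_alleles_carrying: int = 0
    in_polymorphism_databases: bool = False


def disrupts_gene(variant):
    """Criterion a: open reading frame or canonical splice-site disruption."""
    if variant.effect in CODING_EFFECTS:
        return True
    offset = variant.intronic_offset
    # Acceptor site: IVS-1 to IVS-3; donor site: IVS+1 to IVS+5.
    return offset is not None and (-3 <= offset <= -1 or 1 <= offset <= 5)


def meets_criteria(variant, min_control_alleles=400):
    """Return True only if criteria a, b and c are all satisfied."""
    criterion_a = disrupts_gene(variant)
    criterion_b = (variant.control_alleles_screened >= min_control_alleles
                   and variant.control_alleles_carrying == 0)
    criterion_c = not variant.in_polymorphism_databases
    return criterion_a and criterion_b and criterion_c


# Illustrative variants only.
candidate = Variant(effect="missense", control_alleles_screened=400)
deep_intronic = Variant(effect="intronic", intronic_offset=6,
                        control_alleles_screened=400)
```

In this example `meets_criteria(candidate)` is true, while `deep_intronic` fails criterion a because position IVS+6 lies outside the donor site as defined above.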
Many new genetic variants, even if they meet the requirements listed above, may not have any pathogenic effect, and the only real way to check is through functional studies that demonstrate such an effect. Because these studies are difficult to perform for many of the proteins involved, several "in silico" tools have been created in recent years to infer the probability that a genetic variant is pathogenic. Unfortunately, different prediction algorithms use different information, and each has its own strengths and weaknesses. Since it has been suggested that investigators should use predictions from multiple algorithms instead of relying on a single one, Liu et al. (2011) developed dbNSFP (database for nonsynonymous SNPs' functional predictions). It compiles prediction scores from four algorithms (SIFT, PolyPhen2, LRT, and MutationTaster), along with a conservation score (PhyloP) and other related information, for every potential nonsynonymous variant in the human genome.
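One simple way to use several predictors together is to count how many of them call a variant deleterious. The sketch below illustrates that idea only; it is not the dbNSFP interface or file format. It assumes each tool's output has already been converted to True (deleterious), False (tolerated) or None (no prediction), and the majority threshold of 0.5 is an arbitrary, hypothetical choice.

```python
def consensus_call(calls, min_fraction=0.5):
    """Summarize per-tool deleteriousness calls for one variant.

    calls: dict mapping tool name -> True, False, or None (no prediction).
    min_fraction: hypothetical threshold; the variant is flagged when at
    least this fraction of the available tools call it deleterious.
    """
    available = [call for call in calls.values() if call is not None]
    if not available:
        # No tool produced a prediction, so nothing can be flagged.
        return {"n_deleterious": 0, "n_available": 0, "flagged": False}
    n_deleterious = sum(1 for call in available if call)
    return {
        "n_deleterious": n_deleterious,
        "n_available": len(available),
        "flagged": n_deleterious / len(available) >= min_fraction,
    }


# Illustrative calls for a single nonsynonymous variant.
summary = consensus_call(
    {"SIFT": True, "PolyPhen2": True, "LRT": False, "MutationTaster": None}
)
```

For these illustrative inputs, two of the three available tools agree, so the variant is flagged. A vote of this kind only prioritizes variants for follow-up; it does not replace functional studies.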
Despite progress in understanding the mechanisms, risk factors, and management of SCD, it remains a major public health problem. One of the challenges is accurately identifying the persons at risk, especially among younger people, in whom sudden death is most often the first manifestation of the disease. Multimarker SCD risk scores that include demographic, clinical and genetic variables should improve the identification of persons at risk (Adabag et al., 2010).
Although other processes affect cardiac electrical systole, the pathologies considered here are familial diseases with clear genetic inheritance, in which genetic diagnosis is highly relevant.
Capture strategies followed by NGS allow arrhythmogenic disease-causing mutations to be detected accurately, quickly and cost-efficiently, in a manner suitable for routine clinical genetic testing. Nevertheless, additional strategies are still needed to prove that these mutations cause disease.
Additional benefits of great value in these genetically and phenotypically heterogeneous diseases are: 1) the ability to detect both known and novel mutations, 2) the possibility of screening either selected gene exons or all exons in the human genome, and 3) the ability to detect individuals with multiple mutations.