Biomedical Engineering Reference
In-Depth Information
quantifying mRNA expression in cancer represents the first application of
high-throughput, genome-wide DNA-microarray technology in biomedical
research [27, 75, 86, 87]. With the introduction of microarray technology, it is now
possible to monitor genome-wide gene expression changes within a sample.
Microarrays are also being used to genotype SNPs by hybridizing the DNA of
individuals to arrays of oligonucleotides representing different polymorphic alleles.
The genome-wide SNP microarray has accelerated genome-wide association studies
(GWAS) over the last 5 years, and many loci that are associated with diseases have
been discovered and validated [37, 96]. Another type of microarray, known as array
comparative genomic hybridization (aCGH), is being used to detect genomic
structural variations in different cancer genomes [61]. The large-scale, systematic
sequencing studies conducted by applying massively parallel, next-generation
sequencing technologies open up new research avenues in cancer genomics.
Applications of these massively parallel sequencing platforms have led to the
identification of the full range of somatically acquired genetic alterations in cancer
via whole-genome or exome sequencing [2, 72, 77, 81, 82, 94]. These include
genome-wide point mutations, insertions and deletions, copy number changes,
and genomic rearrangements in various cancers [2, 72, 77, 81, 82, 94].
New computational and statistical tools are required to analyze and interpret
these large-scale data sets. The goal of microarray data analysis is to find the
connections between gene expression patterns within cancer cells and their
different phenotypes. There are two approaches for analyzing microarray gene
expression data: (1) the data-driven approach and (2) the knowledge-driven approach.
Data-driven approach. The most straightforward method of analyzing microarray
gene expression data is the data-driven approach, in which the goal is to correlate
gene expression patterns with cancer phenotypes. The gene expression data
analyses can be accomplished by either unsupervised (clustering) [20] or
supervised (classification) [85] algorithms. In the unsupervised approaches,
computational algorithms are used to identify substructures of gene expression
patterns underlying the data. Some of the most commonly used methods include
hierarchical clustering, k-means, self-organizing maps, and their variants, often
alongside dimensionality reduction by principal component analysis [14, 63, 102].
Clustering of gene expression
data has identified the intrinsic subtypes of breast cancer, which are currently
being used to classify breast cancer patients in the clinic and tailor their treatments
[75]. In the supervised approaches, statistical and machine learning algorithms
[45, 46, 55] are employed to identify gene features that can distinguish
between cancer phenotypes, such as diagnostic, predictive, or prognostic
gene markers.
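The two data-driven strategies can be illustrated with a minimal sketch. It is not a method taken from the cited studies: the expression matrix, the gene names G1 to G4, the phenotype labels, and the helper functions kmeans, welch_t, and rank_genes are all hypothetical and chosen only for illustration. The unsupervised step groups samples by k-means without using the labels. The supervised step ranks genes by a Welch t-statistic between the two labeled groups, which stands in here for the statistical and machine learning methods mentioned above.

```python
import math
import statistics

# Hypothetical toy data: rows are samples, columns are genes G1..G4.
GENES = ["G1", "G2", "G3", "G4"]
EXPRESSION = [
    [8.0, 7.5, 2.0, 5.0],
    [8.2, 7.9, 2.1, 5.1],
    [7.8, 7.7, 1.9, 4.9],
    [2.0, 2.5, 8.0, 5.0],
    [2.2, 2.1, 8.3, 5.2],
    [1.9, 2.4, 7.9, 4.8],
]
# Hypothetical phenotype labels, used only by the supervised step.
PHENOTYPE = ["A", "A", "A", "B", "B", "B"]


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(samples, k, n_iter=50):
    """Unsupervised: assign each sample to one of k clusters."""
    # Deterministic farthest-first initialisation (an illustrative choice).
    centroids = [list(samples[0])]
    while len(centroids) < k:
        farthest = max(
            samples, key=lambda s: min(euclidean(s, c) for c in centroids)
        )
        centroids.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every sample.
        new_labels = [
            min(range(k), key=lambda c: euclidean(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [statistics.mean(col) for col in zip(*members)]
    return labels


def welch_t(group_a, group_b):
    """Welch t-statistic for one gene measured in two sample groups."""
    se = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    if se == 0:
        return 0.0
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se


def rank_genes(samples, phenotypes, genes, label_a, label_b):
    """Supervised: rank genes by |t| between two labeled phenotypes."""
    scores = {}
    for j, gene in enumerate(genes):
        a = [s[j] for s, p in zip(samples, phenotypes) if p == label_a]
        b = [s[j] for s, p in zip(samples, phenotypes) if p == label_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


cluster_labels = kmeans(EXPRESSION, k=2)
ranked = rank_genes(EXPRESSION, PHENOTYPE, GENES, "A", "B")
print(cluster_labels)
print([gene for gene, _ in ranked])
```

In this toy example the clustering recovers the two sample groups without seeing the labels, while the ranking places the gene with no group difference (G4) last. Real analyses would add normalization, multiple-testing correction, and validation on independent samples, none of which is shown here.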
Knowledge-driven approach. With the increasing knowledge of cancers and their
underlying biological pathways, several computational methods have improved the
ability to identify candidate genes that are correlated with a disease state by exploiting
the idea that gene expression alterations might be revealed at the level of biological
pathways or co-regulated gene sets, rather than at the level of individual genes
[52, 62, 68, 69, 78]. Such approaches are more objective and robust in their ability to