Once data have been prepared for analysis, the data mining discipline offers a range of techniques and algorithms for the automatic recognition of patterns in data. Depending on the goals of the study and the nature of the data, these techniques must be applied differently. Data mining techniques provide a robust means of evaluating how well extracted patterns generalize to unseen data, although the patterns must still be validated and interpreted by a domain expert [236,237].
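One common way to estimate such generalization power is k-fold cross-validation. The sketch below is a minimal, hypothetical illustration, assuming scikit-learn is available; the random matrix stands in for a real expression data set and the labels carry no true signal.

```python
# Minimal cross-validation sketch (synthetic data; scikit-learn assumed).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))   # 60 samples x 500 genes (synthetic stand-in)
y = rng.integers(0, 2, size=60)  # toxic / non-toxic labels (synthetic)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)  # accuracy on 5 held-out folds
print(f"mean held-out accuracy: {scores.mean():.2f}")
```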
of many data mining applications [237,238] . It is essentially a set of computer programs that
make use of sampled data or past experience information to provide solutions to a given
problem. The most broadly applied machine learning types are supervised learning and
unsupervised learning [239-241] . Supervised classification, also known as class prediction,
is a key topic in the machine learning discipline. Its goal is to construct a function (or model)
to accurately predict the target output of future cases whose output value is unknown.
Supervised classification starts with a set of training data that consists of pair of input cases
and desired outputs to derive a predictive model and the model then can be subsequently
used to predict the outcome of unknown samples [242] . Supervised classification techniques
have been shown capable of obtaining satisfactory results in toxicogenomics [243-245] . On
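As a concrete illustration of this train-then-predict workflow, the sketch below assumes scikit-learn; the data and the choice of a linear support vector machine are placeholders, not the method of any cited study.

```python
# Supervised classification sketch: derive a model from (input, output)
# training pairs, then predict samples whose labels are treated as unknown.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 200))   # hypothetical expression profiles
y = rng.integers(0, 2, size=80)  # hypothetical class labels

X_train, X_unknown, y_train, y_unknown = train_test_split(
    X, y, test_size=0.25, random_state=1)

model = SVC(kernel="linear").fit(X_train, y_train)  # learn from training pairs
predicted = model.predict(X_unknown)                # apply to unseen samples
print(f"accuracy on held-back samples: {(predicted == y_unknown).mean():.2f}")
```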
the other hand, unsupervised learning normally involves clustering, that is, the partitioning
of samples into subsets (clusters) so that the data in each cluster shows a high level of prox-
imity [246] . A wide range of machine learning methods have been proposed by the data min-
ing community in recent decades. Excellent reviews in the literature are available outlining
their use in high-throughput genomic data analysis [101] and [247-251] .
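A minimal clustering sketch in the same spirit follows; k-means is one arbitrary choice (hierarchical clustering is at least as common for expression data), and the data are again synthetic.

```python
# Unsupervised sketch: partition samples into clusters of mutually similar data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 200))  # hypothetical expression profiles

labels = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(X)
for k in range(3):
    print(f"cluster {k}: {np.sum(labels == k)} samples")
```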
The bioinformatics community has 'borrowed', customized and developed a large number of applications and resources in recent years to deal with the data produced by high-throughput technologies. These include complex statistical methods and integrated analysis approaches for identifying and classifying patterns of gene expression change, some of which have been employed extensively to correlate gene expression profiles with toxicity [252-255].
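One representative statistical step in such pipelines is a per-gene comparison of treated versus control samples with multiple-testing correction. The sketch below is a deliberate simplification (SciPy and statsmodels assumed; production analyses typically rely on dedicated packages such as limma):

```python
# Per-gene t-test with Benjamini-Hochberg FDR correction (synthetic data).
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
treated = rng.normal(size=(10, 1000))  # 10 treated samples x 1000 genes
control = rng.normal(size=(10, 1000))  # 10 control samples (hypothetical)

_, pvals = stats.ttest_ind(treated, control, axis=0)  # one test per gene
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"{reject.sum()} genes flagged as differentially expressed")
```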
Although it is still difficult to find a single off-the-shelf software package that meets every need of toxicogenomics, many free and commercial software packages and bioinformatics tools are well suited to uncovering biologically meaningful knowledge, and readers can easily find a rich collection of resources on how to conduct microarray data analysis [256-258]. As drug development companies incorporate high-throughput technologies into safety assessment, the FDA is likewise developing its own tools for review purposes. For example, the FDA has developed ArrayTrack™, a comprehensive microarray data management, analysis and interpretation system (http://www.fda.gov/ScienceResearch/BioinformaticsTools/Arraytrack/default.htm; last accessed on Feb. 25, 2013) to support the FDA's Voluntary Genomics Data Submission (VGDS) program.
6.8 TOXICOGENOMICS IN REGULATORY APPLICATION
Recognizing the impact of genomics and other 'omics' technologies on drug development, and eventually on the regulatory process, the FDA coined the concept of a 'safe harbor' in early 2000 to describe a novel way of sharing information between the FDA and external scientists (e.g., industry scientists, academic researchers) [259]. The concept evolved into the VGDS mechanism and is defined in the FDA's Guidance for Industry: Pharmacogenomic Data Submissions.