Once data have been prepared for analysis, the data mining discipline offers a range of techniques and algorithms for the automatic recognition of patterns in data. Depending on the goals of the study and the nature of the data, these techniques must be applied differently. Data mining techniques provide a robust means of evaluating how well extracted patterns generalize to unseen data, although the patterns must still be validated and interpreted by a domain expert [236,237].
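One common way to estimate such generalization power is k-fold cross-validation. The sketch below is a minimal, hypothetical illustration, assuming scikit-learn is available; the random matrix stands in for a real expression data set and the labels carry no true signal.

```python
# Minimal cross-validation sketch (synthetic data; scikit-learn assumed).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))   # 60 samples x 500 genes (synthetic stand-in)
y = rng.integers(0, 2, size=60)  # toxic / non-toxic labels (synthetic)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)  # accuracy on 5 held-out folds
print(f"mean held-out accuracy: {scores.mean():.2f}")
```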
of many data mining applications [237,238] . It is essentially a set of computer programs that
make use of sampled data or past experience information to provide solutions to a given
problem. The most broadly applied machine learning types are supervised learning and
unsupervised learning [239-241] . Supervised classification, also known as class prediction,
is a key topic in the machine learning discipline. Its goal is to construct a function (or model)
to accurately predict the target output of future cases whose output value is unknown.
Supervised classification starts with a set of training data that consists of pair of input cases
and desired outputs to derive a predictive model and the model then can be subsequently
used to predict the outcome of unknown samples [242] . Supervised classification techniques
have been shown capable of obtaining satisfactory results in toxicogenomics [243-245] . On
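As a concrete illustration of this train-then-predict workflow, the sketch below assumes scikit-learn; the data and the choice of a linear support vector machine are placeholders, not the method of any cited study.

```python
# Supervised classification sketch: derive a model from (input, output)
# training pairs, then predict samples whose labels are treated as unknown.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 200))   # hypothetical expression profiles
y = rng.integers(0, 2, size=80)  # hypothetical class labels

X_train, X_unknown, y_train, y_unknown = train_test_split(
    X, y, test_size=0.25, random_state=1)

model = SVC(kernel="linear").fit(X_train, y_train)  # learn from training pairs
predicted = model.predict(X_unknown)                # apply to unseen samples
print(f"accuracy on held-back samples: {(predicted == y_unknown).mean():.2f}")
```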
the other hand, unsupervised learning normally involves clustering, that is, the partitioning
of samples into subsets (clusters) so that the data in each cluster shows a high level of prox-
imity [246] . A wide range of machine learning methods have been proposed by the data min-
ing community in recent decades. Excellent reviews in the literature are available outlining
their use in high-throughput genomic data analysis [101] and [247-251] .
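A minimal clustering sketch in the same spirit follows; k-means is one arbitrary choice (hierarchical clustering is at least as common for expression data), and the data are again synthetic.

```python
# Unsupervised sketch: partition samples into clusters of mutually similar data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 200))  # hypothetical expression profiles

labels = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(X)
for k in range(3):
    print(f"cluster {k}: {np.sum(labels == k)} samples")
```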
The bioinformatics community has 'borrowed', customized and developed a large number of applications and resources in recent years to deal with the data produced by high-throughput technologies. These include complex statistical methods and integrated analysis approaches for identifying and classifying patterns of gene expression change, some of which have been employed extensively to correlate gene expression profiles with toxicity [252-255].
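One representative statistical step in such pipelines is a per-gene comparison of treated versus control samples with multiple-testing correction. The sketch below is a deliberate simplification (SciPy and statsmodels assumed; production analyses typically rely on dedicated packages such as limma):

```python
# Per-gene t-test with Benjamini-Hochberg FDR correction (synthetic data).
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
treated = rng.normal(size=(10, 1000))  # 10 treated samples x 1000 genes
control = rng.normal(size=(10, 1000))  # 10 control samples (hypothetical)

_, pvals = stats.ttest_ind(treated, control, axis=0)  # one test per gene
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"{reject.sum()} genes flagged as differentially expressed")
```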
Although it is still difficult to find a single off-the-shelf software package that meets every need of toxicogenomics, many free and commercial software packages and bioinformatics tools are well suited to uncovering biologically meaningful knowledge, and readers can easily find a rich collection of resources on how to conduct microarray data analysis [256-258]. As drug development companies incorporate high-throughput technologies into safety assessment, the FDA is likewise developing its own tools for review purposes. For example, the FDA has developed ArrayTrack™, a comprehensive microarray data management, analysis and interpretation system (http://www.fda.gov/ScienceResearch/BioinformaticsTools/Arraytrack/default.htm; last accessed on Feb. 25, 2013) to support the FDA's Voluntary Genomics Data Submission (VGDS) program.
6.8 TOXICOGENOMICS IN REGULATORY APPLICATION
Recognizing the impact of genomics and other 'omics' technologies on drug development, and eventually on the regulatory process, the FDA coined the concept of a 'safe harbor' in early 2000 to describe a novel way of sharing information between the FDA and external scientists (e.g., industry scientists, academic researchers) [259]. The concept evolved into the VGDS mechanism and is defined in the FDA's Guidance for Industry: Pharmacogenomic Data Submissions.