Similarity measures are first computed between observations, and between clusters once observations begin to be grouped into clusters. Several metrics, such as Euclidean and Manhattan distance, correlation, or mutual information, can be used to compute similarity. Additionally, several merging strategies that lead to different clustering patterns are possible. Clustering results are therefore somewhat subjective, as they greatly depend on the user's choices. Traditional cluster analysis is usually performed to group either observations or variables separately, but simultaneous co-clustering (or biclustering) of the rows and the columns of the data matrix also constitutes a suitable alternative in the search for biomarkers.77

As it uses a hierarchical configuration, a tree called a dendrogram, to structure the data, hierarchical cluster analysis (HCA) is an intuitive way to perform data clustering when the number of clusters is unknown a priori. Each leaf corresponds to an observation and the branching reflects the relations between clusters. Two distinct algorithms can be applied, agglomerative (grouping observations) or divisive (dividing the data set), but in practice the agglomerative approach is more widely used. In this case, a linkage function defines the criteria for evaluating distances between observations and clusters. At each iteration, the closest objects are grouped to form a new cluster.

Alternatives to HCA often necessitate defining the number of clusters a priori. The K-means algorithm (or K-medoids, depending on the statistic applied) is an iterative method that starts with k randomly chosen cluster centers. All observations are then assigned to the closest cluster center, and new centers are computed as the mean of the observations of a given cluster. The observations are regrouped with respect to the new centers iteratively until convergence; that is, until no difference occurs in the next iteration.78 The fuzzy c-means algorithm was introduced to allow the association of an observation with more than one cluster, with a probability of belonging to each cluster.79

Regression and Classification with Supervised Methods

Unlike the aforementioned approaches, supervised learning takes advantage of prior information for the analysis of a set of observations. An outcome, the response, can be observed or measured, and the modeling process aims at its prediction. This response can be quantitative in the case of regression or qualitative in the context of classification. A training set is used to build a model, encapsulating general hypotheses, that depicts the relations between a set of measured independent variables X and one or more dependent responses Y. Several techniques have been developed for that purpose, originating from statistical, chemometric, or machine learning backgrounds. Outputs of some classical unsupervised and supervised modeling methods are shown in Figure 5.
FIGURE 5 Typical data modeling outputs.
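The similarity metrics and the agglomerative procedure described above can be sketched in plain Python. This is a minimal illustration under our own assumptions (single linkage, small 2-D points, and the function names `euclidean`, `manhattan`, and `agglomerate` are ours), not a reference implementation; in practice a library routine would be used.

```python
def euclidean(p, q):
    # Euclidean distance between two points given as coordinate tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    # Manhattan (city-block) distance: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))

def single_linkage(c1, c2, metric):
    # One possible linkage function: distance between the two closest members.
    return min(metric(p, q) for p in c1 for q in c2)

def agglomerate(points, k, metric=euclidean, linkage=single_linkage):
    """Agglomerative clustering sketch: start with one cluster per
    observation and repeatedly merge the two closest clusters
    (according to the linkage function) until only k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]], metric),
        )
        clusters[i] += clusters.pop(j)
    return clusters
```

Swapping `metric` or `linkage` changes the merging strategy and hence the clustering pattern, which is the source of the subjectivity noted above. For example, `agglomerate([(0, 0), (0, 1), (5, 5), (5, 6)], 2)` merges the two nearby pairs into two clusters.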
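The K-means iteration described above (assign to the closest center, recompute centers as cluster means, stop when nothing changes) can likewise be sketched in a few lines. This is a hedged, one-dimensional toy version: the text specifies randomly chosen initial centers, but here they are passed in explicitly so the example is reproducible, and the function name `kmeans` is our own.

```python
def kmeans(points, centers, max_iter=100):
    """Minimal 1-D K-means sketch: assign each observation to the
    closest center, recompute each center as the mean of its cluster,
    and repeat until the assignments no longer change (convergence)."""
    centers = list(centers)  # work on a copy of the initial centers
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center (squared distance).
        new_assignment = [
            min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            for p in points
        ]
        if new_assignment == assignment:
            break  # no difference in the next iteration: converged
        assignment = new_assignment
        # Update step: each center becomes the mean of its cluster.
        for j in range(len(centers)):
            cluster = [p for p, a in zip(points, assignment) if a == j]
            if cluster:
                centers[j] = sum(cluster) / len(cluster)
    return assignment, centers
```

On two well-separated groups, e.g. `kmeans([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], [0.0, 8.0])`, the assignments stabilize after one update and the centers converge to the cluster means (about 1.0 and 5.0). Fuzzy c-means differs only in replacing the hard assignment step with a membership probability per cluster.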
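To make the supervised setting concrete, a classifier can be fit on a training set of independent variables X with a qualitative response y, then used to predict the response for new observations. The section names no particular method, so the nearest-centroid rule below is purely our own illustrative choice (as are the names `train_nearest_centroid` and `predict` and the "control"/"case" labels).

```python
def train_nearest_centroid(X, y):
    """Training step: compute one centroid per class label from the
    training set (X holds the observations, y the known responses)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids

def predict(centroids, x):
    """Prediction step: the qualitative response is the class whose
    centroid is closest (squared Euclidean distance) to observation x."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))
```

For a regression task, the same train/predict split applies but the response is quantitative, e.g. predicted as a weighted average instead of a class label.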