Open source software for mass spectrometry and metabolomics - Open Source Software in Life Science Research

are spiked into the solvent used to dissolve freeze-dried plant material. The

reason for this is to ensure there is a constant amount of standard in each

sample so that instrumental response may be normalised. This corrects for

amplitude-based errors such as instrument drift, sample dilution or

concentration. A KNIME workflow (Figure 4.14) can be created that

identifies internal standards, averages the values and then divides each row

in the original data by the internal standard. Although KNIME has many

nodes for data manipulation, as yet there are none that allow mathematical

functions to be applied to rows or columns within a data set, so a custom

R node (labelled 'R-Snippet') can be used to do the division.

In cases where internal standards are not available, several other

methods are possible, one of the most common being total signal

normalisation, where each observation is divided by the total signal for

that observation. In this way dilution effects may be eliminated. As with

all normalisation methods, it is helpful to study replicate or pooled

samples to see the effect of normalisation. If correctly normalised, these

samples should cluster into a tight group.
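
Total signal normalisation can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the matrix, its feature and sample names and all of its values are invented, and samples are assumed to sit in columns, as in the R-Snippet node. The three toy samples are the same mixture at different dilutions, so they should become identical once each column is divided by its own total.

```r
# Toy intensity matrix (invented values): rows are features, columns are samples.
# s2_x2 and s3_x4 are the s1 profile at 2x and 4x the overall signal.
intensities <- matrix(
  c(10,  20,  40,
    30,  60, 120,
    60, 120, 240),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("feat_a", "feat_b", "feat_c"),
                  c("s1", "s2_x2", "s3_x4"))
)

# Total signal for each sample (column sums)
total_signal <- colSums(intensities)

# Divide every column by its own total signal
tsn <- sweep(intensities, 2, total_signal, "/")
```

After this step every column of tsn sums to 1, and the differences between the toy samples that were due only to dilution are gone.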

The R code for the Internal Standard node is shown below.

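intstd <- R[28, 8:35]  # internal standard row (row 28), sample columns only

mdata <- R[, 8:35]  # numerical part of the data frame

# divide each sample column by its internal standard value
normalised <- sweep(as.matrix(mdata), 2, as.matrix(intstd), '/')

R <- cbind(R[1:7], normalised)  # recombine ID columns with data and output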
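
The node code above reads the internal standard from a fixed row. Where several internal standards are present, the identify, average and divide steps described earlier might look like the following sketch. It is not the workflow's actual code: the toy data frame, the "int_std" naming pattern and the column positions are all hypothetical.

```r
# Hypothetical toy table: one ID column followed by three sample columns.
R <- data.frame(
  name = c("met_1", "int_std_1", "met_2", "int_std_2"),
  s1 = c(10, 2, 30, 4),
  s2 = c(20, 4, 60, 8),
  s3 = c(5, 1, 15, 2),
  stringsAsFactors = FALSE
)

id_cols <- 1      # assumed position of the ID column
data_cols <- 2:4  # assumed positions of the sample columns

# Identify internal standard rows by name rather than by a fixed row index
is_std <- grepl("^int_std", R$name)

# Average the internal standards within each sample (column)
std_mean <- colMeans(R[is_std, data_cols, drop = FALSE])

# Divide each sample column by its averaged internal standard
normalised <- sweep(as.matrix(R[, data_cols]), 2, std_mean, "/")

# Recombine the ID column with the normalised data
R_norm <- cbind(R[id_cols], normalised)
```

Selecting rows by name keeps the snippet valid if rows are added or reordered, at the cost of relying on a consistent naming convention for the standards.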

4.7 Open source software for multivariate analysis

Metabolomics data consist of very large numbers of variables and

relatively few observations. Such data are inherently co-linear, which leads

to the use of chemometric techniques that can handle highly correlated

data by using latent variable methods [34]. These methods [35] include

principal components analysis (PCA), principal components regression

(PCR), projection to latent structures (PLS), PLS discriminant analysis

(PLS-DA), orthogonal PLS (OPLS®) [36, 43, 44], orthogonal PLS

discriminant analysis (OPLS-DA®) [37] and kernel OPLS (K-OPLS) [38].

Once the data have been formatted and normalised, they are commonly

analysed interactively in a commercial multivariate analysis package.

However, the world of open source does offer some multivariate

tools, mainly in the R language. There are several chemometrics packages

for R.
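
As one illustration of a latent variable method, PCA can be run with prcomp() from R's built-in stats package, without any add-on chemometrics package. The sketch below uses simulated data, not data from this chapter: the sample and feature counts, the single hidden factor and the noise level are all assumptions chosen to mimic a data set with few observations and many collinear variables.

```r
set.seed(1)  # reproducible toy data

n_samples <- 6    # few observations
n_features <- 50  # many variables

# One hidden factor drives every feature, giving strongly collinear columns
latent <- rnorm(n_samples)
loadings <- runif(n_features, 0.5, 1.5)
noise <- matrix(rnorm(n_samples * n_features, sd = 0.05),
                nrow = n_samples, ncol = n_features)
X <- outer(latent, loadings) + noise
rownames(X) <- paste0("sample_", seq_len(n_samples))

# PCA on mean-centred, unit-variance-scaled data
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Fraction of total variance captured by each component
explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scores on the first two components: coordinates for a scores plot
scores <- pca$x[, 1:2]
```

Because the simulated features share a single underlying factor, the first component should carry most of the variance. With real data, plotting the scores is a quick way to check whether replicate or pooled samples group together after normalisation.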
