Open source software for mass spectrometry and metabolomics - Open Source Software in Life Science Research

the peaks are aligned using the RANSAC [28] Peak aligner, which is a method to join all the separate peak lists into one master list, accounting for both linear and non-linear deviations in retention time. An alternative is the simple Join aligner, which uses m/z and RT windows. Because many small, possibly spurious, peaks may be detected in single runs, the combined table can be constrained to entries where there are a minimum of, say, 20 occurrences of that peak. Figure 4.10(c) shows how this is configured in mzMine. Finally (Figure 4.10(d)), in order to identify the peaks, a custom database of accurate mass/retention times measured on standard compounds is used. This library is simply a comma-separated value file (.CSV) listing the m/z, RT, molecular formula and name of each metabolite. The retention times are determined by previously injecting standard samples onto the system. There are also options to search online databases such as ChemSpider, KEGG, METLIN, etc., but the hits are often rather promiscuous, returning many research chemicals, drugs and mammalian metabolites. These may be irrelevant and misleading when the experiment concerns a limited, defined space, such as plant metabolites.
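The filtering and identification steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not mzMine's implementation: the column names (mz, rt, formula, name), the function names, the tolerances (10 ppm, 0.2 min) and all numeric values are assumptions chosen for the example, and the aligned table is a toy with three samples rather than a real data set.

```python
import csv
import io

# Hypothetical library file contents in the layout the text describes.
# The m/z and RT values are illustrative only, not measured standards.
LIBRARY_CSV = """mz,rt,formula,name
181.0707,1.20,C6H12O6,Hexose
166.0863,3.45,C9H11NO2,Phenylalanine
"""


def load_library(text):
    """Parse the CSV library into a list of dicts with numeric m/z and RT."""
    entries = []
    for record in csv.DictReader(io.StringIO(text)):
        entries.append({
            "mz": float(record["mz"]),
            "rt": float(record["rt"]),
            "formula": record["formula"],
            "name": record["name"],
        })
    return entries


def filter_min_occurrences(rows, min_count):
    """Keep aligned rows detected in at least min_count samples (None = missing)."""
    return [
        row for row in rows
        if sum(area is not None for area in row["areas"]) >= min_count
    ]


def annotate(row, library, ppm_tol=10.0, rt_tol=0.2):
    """Return names of library entries within the m/z (ppm) and RT (min) tolerances."""
    names = []
    for entry in library:
        ppm_error = abs(row["mz"] - entry["mz"]) / entry["mz"] * 1e6
        if ppm_error <= ppm_tol and abs(row["rt"] - entry["rt"]) <= rt_tol:
            names.append(entry["name"])
    return names


# Toy aligned peak table: one row per aligned peak, one area per sample.
aligned = [
    {"mz": 181.0709, "rt": 1.25, "areas": [1200.0, 1100.0, None]},
    {"mz": 166.0861, "rt": 3.40, "areas": [800.0, None, None]},
    {"mz": 250.1000, "rt": 5.00, "areas": [50.0, 60.0, 70.0]},
]

library = load_library(LIBRARY_CSV)
kept = filter_min_occurrences(aligned, min_count=2)
for row in kept:
    print(row["mz"], row["rt"], annotate(row, library))
```

Requiring a match on both m/z and RT is what makes a small in-house library more selective than a mass-only search of a large online database: a row with no library entry inside both windows is simply left unannotated.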

Once all the stages are configured satisfactorily it is possible to run the operations in batch mode. This can take some time, and a multicore processor is useful as mzMine is multithreaded. For the small example data set illustrated, this operation took approximately 5 minutes (PC = HP Xeon Z600 8-core 2.4 GHz workstation with 8 GB RAM running Windows 7, 64 bit). It is not uncommon in our laboratory to run analyses that take many hours of overnight operation for a typical metabolomic study. The final step in the workflow is an Export to CSV option, which writes out the final spreadsheet for downstream analysis.

The end result of the data processing workflow is shown in Figure 4.11 'RANSAC Aligned min 20 peaks'. Peaks that are missing are shown as red spots in the table (shown boxed in the figure). As missing data is undesirable, mzMine can be configured to fill missing peaks using the regions defined in the peak table. This ensures a reading of real data, which is preferable for later statistical analysis. mzMine has two main gap-filling options: 'Peak finder' and 'm/z and RT range gap filler'. The former looks for undetected peaks in the same region as other scans, whereas the m/z and RT gap filler simply finds the highest data point within the defined range.
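The simpler of the two gap-filling strategies, taking the highest data point inside an m/z and RT window, can be sketched as follows. This is a minimal illustration under stated assumptions rather than mzMine's own code: the function names, the (m/z, RT, intensity) tuple layout, the window limits and the data values are all invented for the example.

```python
def highest_point_in_window(data_points, mz_range, rt_range):
    """Return the highest intensity inside the m/z and RT window, or None if empty."""
    mz_lo, mz_hi = mz_range
    rt_lo, rt_hi = rt_range
    in_window = [
        intensity
        for mz, rt, intensity in data_points
        if mz_lo <= mz <= mz_hi and rt_lo <= rt <= rt_hi
    ]
    return max(in_window, default=None)


def fill_gaps(row_values, raw_data_by_sample, mz_range, rt_range):
    """Replace missing entries (None) with the window maximum from each sample's raw data."""
    filled = []
    for value, data_points in zip(row_values, raw_data_by_sample):
        if value is None:
            value = highest_point_in_window(data_points, mz_range, rt_range)
        filled.append(value)
    return filled


# Toy raw data: (m/z, RT in minutes, intensity) tuples per sample, invented for illustration.
sample_a = [(181.0706, 1.19, 900.0), (181.0708, 1.21, 1200.0)]
sample_b = [(181.0705, 1.18, 40.0), (181.0710, 1.22, 95.0), (181.0708, 1.60, 300.0)]

row = [1200.0, None]  # peak detected in sample A, missing in sample B
filled_row = fill_gaps(row, [sample_a, sample_b], (181.069, 181.073), (1.1, 1.3))
print(filled_row)
```

In this sketch the intense point at RT 1.60 in sample B is ignored because it lies outside the RT window, so the gap is filled with the largest reading that falls inside the region defined by the peak table.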

At this stage it is most likely that the data will be processed further in a commercial data analysis package, but there are a few basic data visualisation tools included in mzMine. Analysis options include: coefficient of variation (CV) analysis, log ratio analysis, principal component analysis, curvilinear distance analysis, Sammon's projection
