Biomedical Engineering Reference
9.2.2 The bioinformatics landscape for handling 'omics data
The basic pipeline for analyzing and interpreting 'omics data starts with
raw data files and ends with some kind of knowledge. In between come
data QC, data reduction, analytics, annotation and interpretation,
and perhaps a whole host of other smaller steps. This results in a large
landscape of bioinformatics applications, which cater to the various steps
in the data pipeline. We can break down the users of these applications
into three crude bins. The first are the computational experts who are
familiar with programming and can script highly custom code to process
complex data sets. These users usually favor open source solutions,
although they are also well versed in the available vendor tools. Second,
there are the power users, who include computational biologists who do
not know how to program and also computationally adept scientists who
are willing to learn complex software. These users tend to work on this
kind of data a lot, and use tools such as Genespring and Expressionist
routinely for project data analysis and support. They tend to work on
one data set at a time, and are less concerned with mining across a large
compendium of data. Hence, the solutions in this space tend to be very
project-focused and are great at getting useful knowledge from one data
set. The last category of users comprises the average bench scientists and
managers. They do not have time to learn a lot of complex software, and
need to know the answers to simple questions. They want to make queries
on data that have already been generated and analyzed either inside or
outside the company. They want to verify what those data mean, and be
presented with the data in a form that can be easily interpreted such as a
graphical output. These user communities are summarized in Figure 9.2,
along with some typical applications that cater for each community.
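
The pipeline stages named at the start of this section (data QC, data reduction, annotation) can be illustrated with a minimal Python sketch. It is not taken from the text or from any of the tools mentioned: the probe names, intensity values, QC threshold, annotation table and function names are all hypothetical, chosen only to show how the stages chain together.

```python
from statistics import mean

# Hypothetical toy input: probe -> replicate intensities for one sample group.
RAW = {
    "probe_1": [10.0, 12.0, 11.0],
    "probe_2": [0.0, 0.0, 0.0],      # no signal; expected to fail QC
    "probe_3": [100.0, 90.0, 110.0],
}

# Hypothetical annotation table mapping probes to gene symbols.
ANNOTATION = {"probe_1": "GENE_A", "probe_3": "GENE_B"}


def quality_control(raw, min_signal=1.0):
    """Data QC: drop probes whose mean intensity is below a threshold."""
    return {p: v for p, v in raw.items() if mean(v) >= min_signal}


def reduce_data(data):
    """Data reduction: collapse replicates to one summary value per probe."""
    return {p: mean(v) for p, v in data.items()}


def annotate(summary, annotation):
    """Annotation: attach a gene symbol to each probe, where one is known."""
    return [
        {"probe": p, "gene": annotation.get(p, "unknown"), "mean_intensity": m}
        for p, m in sorted(summary.items())
    ]


def run_pipeline(raw, annotation):
    """Chain the stages: raw data -> QC -> reduction -> annotation."""
    return annotate(reduce_data(quality_control(raw)), annotation)


if __name__ == "__main__":
    for row in run_pipeline(RAW, ANNOTATION):
        print(row)
```

Keeping each stage as a separate function mirrors the way the landscape is described above: different applications can cater to different steps, and a stage can be swapped out without touching the others.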
9.2.3 Requirements for an 'omics portal
The EBI Gene Expression Atlas was constructed as a large-scale portal
for publicly available functional genomics data. Public domain 'omics
data sets originate from a variety of sources; the ArrayExpress Archive of
Functional Genomics contains data sets generated on more than 5000
different platforms coming from hundreds of different laboratories.
Naturally, raw data come in a great variety of formats, whereas
metadata, that is, the representation of the experiment description, can be
highly idiosyncratic, often incomplete or even missing. Significant efforts
 