clinicians supporting oncology research. Around 50 studies of interest
were selected based on the key cancer types, treatments or systems of
interest to the research unit. A number of these studies were already
available in the public Atlas; the remaining studies for which expression
data and meta-data were available from public sources were curated
in-house.
The Atlas includes an administrative utility page that allows experiments
to be loaded in MAGE-TAB format, the preferred format for data submission
to the EBI Array Express database. The MAGE-TAB Sample and Data
Relationship Format (SDRF) file allows specification of characteristics
describing sample properties. These typically include general properties
that may hold for all samples in the experiment (such as tissue type) as
well as the factors under study, such as treatment or clinical outcome. In
addition, the Experimental Factor fields identify the primary variables of
interest in the study, and these variables drive the analytics in
Expression Atlas.
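To make the distinction between sample characteristics and experimental factors concrete, the sketch below reads only the header line of a tab-delimited SDRF and groups the bracketed columns by type. It assumes the usual MAGE-TAB header convention of `Characteristics[...]`, `Factor Value[...]` and `Comment[...]` columns; the function name `classify_sdrf_columns` and the example rows are hypothetical, and this is not the Atlas loader's own parser.

```python
import csv
import io
import re

# Bracketed SDRF headers such as "Characteristics[organism part]" or
# "Factor Value[compound]"; "FactorValue" without a space is also accepted.
BRACKETED = re.compile(r"^\s*(Characteristics|Factor\s*Value|Comment)\s*\[\s*(.+?)\s*\]\s*$")


def classify_sdrf_columns(sdrf_text):
    """Group SDRF header columns into characteristics, factor values and comments."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)  # only the header line is inspected
    groups = {"Characteristics": [], "FactorValue": [], "Comment": []}
    for column in header:
        match = BRACKETED.match(column)
        if match:
            kind = re.sub(r"\s+", "", match.group(1))  # "Factor Value" -> "FactorValue"
            groups[kind].append(match.group(2))
    return groups


# Hypothetical two-line SDRF: one sample, two characteristics, one factor.
example = (
    "Source Name\tCharacteristics[organism part]\tCharacteristics[disease state]\t"
    "Factor Value[compound]\tArray Data File\n"
    "patient_01\tbreast\tcarcinoma\ttamoxifen\tp01.CEL\n"
)
print(classify_sdrf_columns(example))
# {'Characteristics': ['organism part', 'disease state'], 'FactorValue': ['compound'], 'Comment': []}
```

In this toy file, tissue type and disease state describe every sample, while only `compound` is flagged as a variable under study, which is the column the analytics would key on.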
Many of the public studies submitted to Array Express specify the key
sample and experiment meta-data through Comment or Characteristics
fields in the SDRF file. For these studies, curation can therefore be
achieved by reformatting or transforming the key variables of interest
into Experimental Factors in the SDRF file. Once these files are parsed
and validated by the Atlas loader, the analytics workflow is automatically
triggered to identify genes significantly associated with the identified
variables. Some public studies include only part of the meta-data for
the experiment, or even none at all! In these cases, the original
publications or other online resources have to be consulted, and data
curation becomes more time-consuming or problematic.
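The curation step described above, turning an existing Comment or Characteristics column into an Experimental Factor, can be sketched as copying that column into a new `Factor Value[...]` column. This is a minimal illustration under stated assumptions: the SDRF is tab-delimited, its first column holds the sample name, and `promote_to_factor` is a hypothetical helper. A complete MAGE-TAB submission would also declare the factor in the IDF file (its `Experimental Factor Name` and `Experimental Factor Type` rows), which is not shown. Samples with no value are reported rather than silently loaded, mirroring the case where meta-data must be recovered from publications.

```python
import csv
import io


def promote_to_factor(sdrf_text, source_column, factor_name):
    """Copy an existing SDRF column into a new 'Factor Value[...]' column.

    Returns the rewritten SDRF as tab-delimited text. Raises KeyError if the
    source column is absent and ValueError if any sample lacks a value.
    """
    rows = [r for r in csv.reader(io.StringIO(sdrf_text), delimiter="\t") if r]
    header, samples = rows[0], rows[1:]
    if source_column not in header:
        raise KeyError(f"column not found: {source_column}")
    index = header.index(source_column)
    # Assumes column 0 is the sample name (e.g. "Source Name").
    missing = [row[0] for row in samples if len(row) <= index or not row[index].strip()]
    if missing:
        # Incomplete meta-data: these samples need manual curation first.
        raise ValueError(f"no value for {source_column} in samples: {missing}")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header + [f"Factor Value[{factor_name}]"])
    for row in samples:
        writer.writerow(row + [row[index]])
    return out.getvalue()


# Hypothetical study whose treatment is recorded only as a Comment column.
sdrf = (
    "Source Name\tComment[treatment]\tArray Data File\n"
    "s1\ttamoxifen\ts1.CEL\n"
    "s2\tuntreated\ts2.CEL\n"
)
print(promote_to_factor(sdrf, "Comment[treatment]", "compound"))
```

Keeping the original Comment column alongside the new factor column leaves the submitter's annotation intact, so the change is easy to audit.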
One approach to loading internal data is to export local expression
studies of interest in MAGE-TAB format [7] for loading by the sample
loader. This has proven feasible, although some manual curation is still
required to identify the key experimental factors in a given study and to
map the meta-data terms used on local systems to the Experimental Factor
Ontology used in the Expression Atlas. If all data for a given public or
internal study are available in electronic format, curation of a typical
experiment with 20-50 samples can generally be accomplished in an hour or
two. Studies requiring re-entry of data from publications or other
non-electronic sources take considerably longer to curate properly, if
the meta-data can be obtained at all. More general quality considerations
must also be taken into account when entering internal and external
studies, including the experimental design and the quality of the samples
and data files.
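Mapping local meta-data terms to the Experimental Factor Ontology (EFO) is often handled with a curator-maintained lookup table, with unmapped terms flagged for review. The sketch below assumes such a table; the local terms, the `LOCAL_TO_EFO` dictionary and the `map_terms` function are hypothetical, and the ontology labels are illustrative only and would need checking against the EFO release used by a given Atlas installation.

```python
from typing import Dict, List, Tuple

# Hypothetical in-house vocabulary -> EFO label lookup, maintained by curators.
# Labels are illustrative; verify against the EFO release actually in use.
LOCAL_TO_EFO: Dict[str, str] = {
    "brca tumour": "breast carcinoma",
    "crc": "colorectal carcinoma",
    "nsclc": "non-small cell lung carcinoma",
}


def map_terms(values: List[str]) -> Tuple[List[str], List[str]]:
    """Return (mapped values, distinct unmapped terms needing manual curation)."""
    mapped: List[str] = []
    unmapped: List[str] = []
    for value in values:
        key = value.strip().lower()  # case- and whitespace-insensitive lookup
        if key in LOCAL_TO_EFO:
            mapped.append(LOCAL_TO_EFO[key])
        else:
            mapped.append(value)  # keep the original so no sample loses its annotation
            if value not in unmapped:
                unmapped.append(value)
    return mapped, unmapped


mapped, unmapped = map_terms(["BRCA tumour", "CRC", "Mesothelioma"])
print(mapped)    # ['breast carcinoma', 'colorectal carcinoma', 'Mesothelioma']
print(unmapped)  # ['Mesothelioma']
```

Returning the unmapped terms separately, instead of dropping them, keeps the manual part of curation visible: each leftover term either gets a new table entry or is resolved by a curator.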