(ii) descriptive information regarding the biological source material, and the protocols performed in the experiment (e.g. organism, age, treatments...)
A Series record links together a group of related Samples and provides a focal point and description of the study as a whole.
Since it is not possible to check automatically whether the MINiML information is accurate, the annotation process cannot be completely automated. Indeed, the information contained in the Series and Platform records can be extracted automatically to annotate the study as a whole, since these parts are usually filled in clearly by biologists, but the information contained in the descriptive part of the Sample records is not easy to process automatically, because biologists follow no standard syntax.
Thus, GEOnto-based annotation relies on two sub-processes, an automatic one and a semi-automatic one, described below (see Figure 12):
- Automatic annotation: general information on the study as a whole is extracted automatically from the MINiML file using XPath (a language for addressing parts of an XML document). These data, such as the title, contributors, PubMed ID, type of array, keywords and associated samples/experimental conditions, are used to generate an RDF annotation based on GEOnto automatically.
- Semi-automatic annotation: the information describing each sample record, such as subjects (type, age, disease, allergy...) and treatments (inductor, delivery method, delivery time...), is usually not well structured in the MINiML file. For example, the treatment protocol (describing the treatments applied to the biological material prior to extract preparation) is most of the time a single global description repeated for all the sample records. It therefore seems impossible to extract automatically the terms describing each treatment protocol and assign them to the corresponding sample records. The solution we chose was to design an annotation editor that:
  - Proposes, for each concept (e.g. inductor, disease, allergy, subject, age, cell...), all candidate instances extracted from the MINiML file (e.g. 'nickel' for the concept 'inductor', 'contact dermatitis' for 'disease', 'nickel allergy' for 'allergy'...). The extraction is automatic: briefly, it compares the words in the text with the concept labels in the GEOnto ontology and with the instances in existing annotations.
  - Allows the biologist to complete the description of each sample record (experimental condition), by adding relevant terms that were not extracted, and to structure it. For example, a biologist will associate the use of the inductor 'nickel' with the sample 'GSM144435', while the sample 'GSM144362', which uses an empty patch test, will not be associated with an inductor.
  - Builds RDF annotations automatically.
The RDF annotations obtained from these two sub-processes give a relevant and structured description, based on GEOnto, of what is important to know about the experiment.
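The two sub-processes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the MINiML fragment, the `http://example.org/geonto#` namespace, the property names (`title`, `hasSample`, `hasInductor`), the lexicon and all helper functions are hypothetical, and ElementTree's limited XPath subset stands in for full XPath. `annotate_series` plays the role of the automatic sub-process; `propose_candidates` approximates the editor's candidate proposal by matching known instance labels in the text; a hard-coded `validated` dictionary stands in for the biologist's choices in the editor; `annotate_sample` then builds RDF annotations, serialized as N-Triples.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespace: the actual GEOnto URIs and property names are not given in the text.
GEO = "http://example.org/geonto#"

# Hypothetical, heavily simplified MINiML-like input. Real MINiML files declare an XML
# namespace and nest protocols inside <Channel>; the Series ID and texts are invented.
# As described in the text, one global treatment protocol is repeated for every sample.
MINIML = """<MINiML>
  <Series iid="GSE-EXAMPLE">
    <Title>Placeholder study title</Title>
    <Sample-Ref ref="GSM144435"/>
    <Sample-Ref ref="GSM144362"/>
  </Series>
  <Sample iid="GSM144435">
    <Treatment-Protocol>Subjects with nickel allergy received a nickel patch test or an empty patch test.</Treatment-Protocol>
  </Sample>
  <Sample iid="GSM144362">
    <Treatment-Protocol>Subjects with nickel allergy received a nickel patch test or an empty patch test.</Treatment-Protocol>
  </Sample>
</MINiML>"""

# Hypothetical lexicon: concept -> known instance labels (e.g. taken from existing annotations).
LEXICON = {
    "inductor": ["nickel"],
    "disease": ["contact dermatitis"],
    "allergy": ["nickel allergy"],
}


def uri(local: str) -> str:
    """N-Triples IRI in the placeholder namespace."""
    return f"<{GEO}{local}>"


def literal(text: str) -> str:
    """N-Triples plain literal with backslashes, quotes and line breaks escaped."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def annotate_series(root: ET.Element) -> list[tuple[str, str, str]]:
    """Automatic sub-process: extract study-level fields with ElementTree's XPath subset."""
    triples = []
    for series in root.findall("./Series"):
        s = uri(series.get("iid"))
        triples.append((s, uri("title"), literal(series.findtext("Title", default=""))))
        for ref in series.findall("./Sample-Ref"):
            triples.append((s, uri("hasSample"), uri(ref.get("ref"))))
    return triples


def propose_candidates(text: str, lexicon: dict[str, list[str]]) -> dict[str, list[str]]:
    """Semi-automatic sub-process, first step: propose candidate instances per concept by
    whole-word, case-insensitive matching of known labels (a simplification)."""
    found = {}
    for concept, labels in lexicon.items():
        hits = [lab for lab in labels
                if re.search(r"\b" + re.escape(lab) + r"\b", text, re.IGNORECASE)]
        if hits:
            found[concept] = hits
    return found


def annotate_sample(sample_id: str, validated: dict[str, str]) -> list[tuple[str, str, str]]:
    """Semi-automatic sub-process, last step: turn the biologist-validated description into triples."""
    s = uri(sample_id)
    return [(s, uri("has" + concept.capitalize()), literal(value))
            for concept, value in validated.items()]


if __name__ == "__main__":
    root = ET.fromstring(MINIML)
    triples = annotate_series(root)
    for sample in root.findall("./Sample"):
        text = sample.findtext("Treatment-Protocol", default="")
        # The same global protocol yields the same candidates for every sample.
        print(sample.get("iid"), propose_candidates(text, LEXICON))
    # Stand-in for the editor step (hypothetical choices): only GSM144435 used nickel;
    # GSM144362 used an empty patch test, so it gets no inductor.
    validated = {"GSM144435": {"inductor": "nickel"}, "GSM144362": {}}
    for sample_id, description in validated.items():
        triples.extend(annotate_sample(sample_id, description))
    for s, p, o in triples:
        print(s, p, o, ".")
```

Because the treatment protocol is a single text repeated for both samples, the sketch proposes 'nickel' as an inductor candidate for GSM144362 as well; only the (here simulated) biologist's validation removes it, which is why this part of the annotation stays semi-automatic.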
Annotation of Statistic Analyses and Data Mining Results
GMineAnnot takes a PMML file as input and generates semantic annotations, based on GMineOnto, about statistical analysis and data mining result models. These annotations focus on the