Biomedical Engineering Reference
In-Depth Information
Assertions that are confirmed by the community via frequent citation or by
recognized biomedical and chemical resources should be given the highest
evidence score. At the other end of the evidence spectrum are inferred
assertions, such as computer predictions based on reasoning algorithms. New
nanopublications internally generated by inferencing algorithms over the updated
triple stores or through other forms of server-side reasoning will have a lower
evidence score and will need human verification. For massive distributed triple
inference, the Large Knowledge Collider (LarKC) consortium uses the Massive
RDF Versatile Inference Network (MaRVIN) [26], which emphasizes scalability
through parallelization of the execution of an open set of software
components. The system works as a scalable workflow engine for reasoning
tasks. In each workflow, there are several components (plug-ins), which are
responsible for diverse processing tasks, for example, identifying relevant data,
transforming data, selecting data, and reasoning over data. The execution of
the workflow is overseen by a decider plug-in [27]. New complex semantic
relationships can be queried and discovered through traversing a sequence of
links among the entities of interest. For the OPS goals, it will be necessary to
include the integration of weightings within the inference rules, to reflect the
reliability of the source data. In this way, both false-positive and false-negative
relationships can be mitigated by considering only assertions with higher
confidence or multiple layers of evidence. A mechanism will translate the internal reasoning into
some unifying representation language. The calculated reliabilities are kept in
a separate store; scientists will be free to use the automatically calculated
evidence scores or to calculate their own evidence scores with measures more
suitable for their purposes. As well as the necessity for trustworthiness, the
system must be able to react quickly as triples may be undergoing constant
change through daily or even hourly updates. When new evidence arrives for
any assertion, all linked assertions must be re-evaluated against the existing
knowledge to interpret the latest findings. The ability to continuously compare
and revisit hypotheses is crucial. The final result will allow exploratory
querying, supporting investigations where one does not initially know precisely what
one is looking for but rather uses approaches that permit discovery.
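
The weighting of inference rules by source reliability described above can be illustrated with a minimal Python sketch. It is not the MaRVIN or OPS implementation. The evidence categories, their base scores, the noisy-OR combination of multiple evidence layers, and the rule that a path's confidence is the product of its link scores are all assumptions chosen for illustration, as are the names `Assertion`, `score_triples`, and `infer_links` and the example entities.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical base scores per evidence type; the values are illustrative only.
BASE_SCORE = {"curated": 0.9, "cited": 0.8, "inferred": 0.4}


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str
    evidence: str  # key into BASE_SCORE


def combine(scores):
    """Noisy-OR combination: independent evidence layers reinforce each other."""
    doubt = 1.0
    for score in scores:
        doubt *= 1.0 - score
    return 1.0 - doubt


def score_triples(assertions):
    """Return a separate store mapping each unique triple to its evidence score."""
    grouped = defaultdict(list)
    for a in assertions:
        grouped[(a.subject, a.predicate, a.obj)].append(BASE_SCORE[a.evidence])
    return {triple: combine(scores) for triple, scores in grouped.items()}


def infer_links(scored, start, max_hops=2, min_confidence=0.5):
    """Traverse links from `start`, keeping the best path to each entity.

    A path's confidence is the product of its link scores (an assumed rule);
    paths falling below `min_confidence` are pruned.
    """
    outgoing = defaultdict(list)
    for (subject, _predicate, obj), score in scored.items():
        outgoing[subject].append((obj, score))

    best = {}
    frontier = [(start, 1.0, (start,))]
    for _ in range(max_hops):
        next_frontier = []
        for node, confidence, path in frontier:
            for target, score in outgoing.get(node, []):
                new_confidence = confidence * score
                if target in path or new_confidence < min_confidence:
                    continue  # skip cycles and low-confidence paths
                new_path = path + (target,)
                next_frontier.append((target, new_confidence, new_path))
                if target not in best or new_confidence > best[target][0]:
                    best[target] = (new_confidence, new_path)
        frontier = next_frontier
    return best


# Made-up example data: one triple is supported by two layers of evidence.
assertions = [
    Assertion("drugX", "inhibits", "proteinY", "curated"),
    Assertion("drugX", "inhibits", "proteinY", "cited"),
    Assertion("proteinY", "involved_in", "diseaseZ", "cited"),
    Assertion("proteinY", "involved_in", "diseaseW", "inferred"),
]
scored = score_triples(assertions)
links = infer_links(scored, "drugX")
for entity, (confidence, path) in links.items():
    print(f"{entity}: {confidence:.3f} via {' -> '.join(path)}")
```

In this sketch the scores live in their own dictionary, separate from the assertions, so a user could substitute a different scoring function without touching the triples. With the default threshold of 0.5, the indirect link from drugX to diseaseW is discarded because it rests on a single inferred assertion, whereas the link to diseaseZ survives.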
Early estimates by the Open PHACTS consortium members, based on the
experience of LarKC and the current size of the Linked Life Data store [18],
are that the current number of nanopublications in candidate OPS resources
is of the order of 10^14, while the removal of redundancy may reduce this amount
to roughly 1-200 billion unique assertions. With these numbers, the benefit of
a massive reasoning system is clear: because our conceptualizations
of biology have grown in size and complexity, even experts cannot have a wide
enough overview of known relationships to be able to make inferences over
potentially different disciplines without an automated system. Since the OPS
will include an extraction service and in-text semantic support to generate new
content in nanopublication format, while guarding the provenance data to
enable proper citation and linking back to the original source, the added value
of nanopublications generated from traditional texts and database records in