Medical Centre [4]. This tool is used to identify and normalise the terms
so that they can all be referenced as one. For instance, a gene may be
known by various terms, including:
name - e.g. 'Vascular Endothelial Growth Factor A';
official symbol - e.g. 'VEGFA';
EntrezGene ID - 'EntrezGene:7422';
other - e.g. 'VEGF-A', 'vascular permeability factor', 'vegf wt allele'.
Each of these synonyms will be mapped to the same entity. In addition,
attempts are made to make sure that a synonym does not accidentally
match another word. For instance, the 'Catalase' gene is also known as
'CAT', but this also matches many other things (including cat the animal,
cat the company that makes diggers, cat cabling, and cat as in Cambridge
Antibody Technology, a company acquired by AstraZeneca in 2006).
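The mapping described above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the table layout, the 'GENE:catalase' identifier, the function names and the rule of matching ambiguous symbols case-sensitively are all assumptions made for illustration. A case-sensitive match still cannot tell the 'CAT' gene symbol from 'CAT' the company, so a real system would also need contextual checks.

```python
import re

# Synonym table keyed by entity identifier. The VEGFA terms come from the text;
# 'GENE:catalase' is a hypothetical placeholder identifier.
SYNONYMS = {
    "EntrezGene:7422": [
        "Vascular Endothelial Growth Factor A",
        "VEGFA",
        "VEGF-A",
        "vascular permeability factor",
        "vegf wt allele",
    ],
    "GENE:catalase": ["Catalase", "CAT"],
}

# Assumed rule: symbols that collide with everyday words are only matched
# case-sensitively, so 'cat' and 'Cat' are ignored but 'CAT' is accepted.
AMBIGUOUS = {"CAT"}


def build_patterns(synonyms, ambiguous):
    """Compile one whole-term regex per synonym, paired with its entity id."""
    patterns = []
    for entity_id, terms in synonyms.items():
        for term in terms:
            flags = 0 if term in ambiguous else re.IGNORECASE
            regex = re.compile(
                r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])",
                flags,
            )
            patterns.append((entity_id, regex))
    return patterns


def normalise(text, patterns):
    """Return the set of entity ids whose synonyms occur in the text."""
    return {entity_id for entity_id, regex in patterns if regex.search(text)}


patterns = build_patterns(SYNONYMS, AMBIGUOUS)
entities = normalise("VEGF-A and vascular permeability factor were raised", patterns)
```

Here both 'VEGF-A' and 'vascular permeability factor' resolve to the single entity 'EntrezGene:7422', while a sentence about a cat or a Cat digger yields no gene match.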
14.4.4 Creating SOLR index
The SOLR index is normally created in an overnight process. Although,
technically, the SOLR system can cope with dynamic updates, it was found
that indexing is best performed while the index is offline. In future, it is
envisaged that a dedicated SOLR server will be used to index new data,
and the indexes swapped over at night, to improve up-time and increase the
number of documents that can be processed per day.
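One possible way to swap indexes is the SWAP action of the SOLR CoreAdmin API, which exchanges the names of two cores on the same server. The sketch below only illustrates that idea; the text does not say how the swap would be done, and the base URL and the core names 'live' and 'staging' are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def swap_cores_url(base_url, live_core, staging_core):
    """Build the CoreAdmin request URL that swaps two cores."""
    query = urlencode({"action": "SWAP", "core": live_core, "other": staging_core})
    return base_url.rstrip("/") + "/admin/cores?" + query


def swap_cores(base_url, live_core, staging_core, timeout=30):
    """Send the swap request and return the HTTP status code (needs a running server)."""
    with urlopen(swap_cores_url(base_url, live_core, staging_core), timeout=timeout) as response:
        return response.status


# Hypothetical nightly step: the freshly built 'staging' core becomes 'live'.
url = swap_cores_url("http://localhost:8983/solr", "live", "staging")
```

With this arrangement, searches would continue against the old index while the new one is built, and the swap itself would be a single request.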
14.4.5 Enhanced meta-data
In addition to in-pipeline annotations, further data enhancement
processes have been developed to allow extra annotations to be performed
on the data, using the power of the SOLR index. For instance, a set of
SOLR queries was developed to identify whether any of the documents
mentioned any standard chemical reactions (e.g. 'Baeyer-Villiger
Oxidation', 'Schotten-Baumann Reaction'), and the relevant documents
were tagged with that information. This tagging process takes approximately
five minutes for about 100 different reactions. Unfortunately, as this
updating is performed on the SOLR index directly, any update from the
original source results in these annotations being over-written. For this
reason, a possible enhancement to the system being considered is to tie the
index into a database to hold the extra annotations.
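The tagging step, the over-writing problem and the proposed fix can be sketched as follows. This is an in-memory illustration only: dictionaries stand in for the SOLR index and for the annotation database, a case-insensitive substring test stands in for running one query per reaction, and the field name 'reactions' and all function names are hypothetical.

```python
REACTIONS = ["Baeyer-Villiger Oxidation", "Schotten-Baumann Reaction"]


def phrase_query(field, phrase):
    """Build a quoted phrase query in Lucene/SOLR query syntax."""
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def find_reaction_tags(docs, reactions):
    """Stand-in for one query per reaction: return {doc_id: [reaction, ...]}."""
    tags = {}
    for reaction in reactions:
        needle = reaction.lower()
        for doc_id, text in docs.items():
            if needle in text.lower():
                tags.setdefault(doc_id, []).append(reaction)
    return tags


def reindex_from_source(index, source_docs):
    """Rebuild index entries from the source, discarding any extra annotations."""
    for doc_id, text in source_docs.items():
        index[doc_id] = {"text": text}


def apply_tags(index, tag_store):
    """Copy annotations held in the separate store back onto the index."""
    for doc_id, reactions in tag_store.items():
        if doc_id in index:
            index[doc_id]["reactions"] = list(reactions)


source = {
    "d1": "A Baeyer-Villiger oxidation converted the ketone to an ester.",
    "d2": "No named reaction is mentioned here.",
}
index = {}
reindex_from_source(index, source)
tag_store = find_reaction_tags(source, REACTIONS)  # kept outside the index
apply_tags(index, tag_store)

reindex_from_source(index, source)  # a source update wipes the tags
apply_tags(index, tag_store)        # the store lets them be restored
```

Holding the tags outside the index means a re-index from the source no longer loses them, at the cost of an extra step to re-apply them.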
 