Biomedical Engineering Reference
In-Depth Information
any tags or keywords that have already been attributed to the
document. For instance, information delivered by the MedLine feed
included 'MeSH' terms, which can be used to identify the key concepts
that the document addresses.
14.4.3 Process documents
Once in a standardised form, the documents are converted into an XML
form that can be loaded into the SOLR index. In addition to this, the
documents are further processed to extract more information that may
be of interest. For example:
Institute disambiguation
The authors of the documents are normally associated with a particular
institute (e.g. university, hospital, company). However, the same institute
is often presented in many ways. For example, an author from the
Department of Cell Biology, Institute of Anatomy, University of Aarhus,
Denmark may publish under a number of different institute names,
including:
Department of Cell Biology, Institute of Anatomy, University of
Aarhus;
Department of Cell Biology, University of Aarhus;
Institute of Anatomy, University of Aarhus;
Aarhus University Hospital;
University of Aarhus;
Aarhus Universitet.
Each of these institute names is linked to the 'University of Aarhus',
meaning that a single search will find all documents from this university,
despite the multiple original institute terms. Significant investment was
made initially to compile a comprehensive set of 'stem' terms for each
pharmaceutical company, the top biotechnology companies and the top 500
world universities. These stem terms were used to gather a significant set
of synonyms across multiple sources. All of this information was used to
identify consistent synonyms for each company or academic institute.
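The linking of name variants to a single institute can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the CANONICAL_STEMS table, the normalise helper and the canonical_institute function are hypothetical names, and simple substring matching on normalised text is assumed here in place of whatever matching the real pipeline used.

```python
import re
import unicodedata

# Hypothetical synonym table: each canonical institute name maps to the
# 'stem' terms that identify it. A real table would be far larger.
CANONICAL_STEMS = {
    "University of Aarhus": [
        "university of aarhus",
        "aarhus university",
        "aarhus universitet",
    ],
}


def normalise(name):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_only.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def canonical_institute(affiliation):
    """Return the canonical institute whose stem occurs in the affiliation.

    Returns None when no stem matches, so unknown institutes are left
    unlinked rather than guessed.
    """
    text = normalise(affiliation)
    for canonical, stems in CANONICAL_STEMS.items():
        if any(stem in text for stem in stems):
            return canonical
    return None


if __name__ == "__main__":
    variants = [
        "Department of Cell Biology, Institute of Anatomy, University of Aarhus",
        "Aarhus University Hospital",
        "Aarhus Universitet",
    ]
    for variant in variants:
        print(variant, "->", canonical_institute(variant))
```

Indexing the canonical name alongside the original affiliation would let a single search term retrieve every variant, while the original text remains available for display.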
In addition, some institutes are hard to disambiguate because multiple
institutes share the same or similar details. For instance, 'Birmingham
University' could either be 'Birmingham University, England' or
 