Information Technology Reference
corresponds to the first letter of each word in its entirety. Schwartz proposes
an algorithm using these patterns to recognize abbreviations from biomedical
publications. We improved upon Schwartz's algorithm and used it to find
abbreviations of diseases in abstracts.
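As an illustration of this kind of pattern matching, the sketch below assumes that the algorithm referred to is the right-to-left character matching published by Schwartz and Hearst. It covers only the step that aligns a short form with a candidate long form; it is not the improved variant used in this work, whose details are not given here. The function name and the example strings are chosen for illustration.

```python
from typing import Optional


def find_best_long_form(short_form: str, candidate: str) -> Optional[str]:
    """Return the shortest suffix of `candidate` that matches `short_form`.

    Characters of the short form are matched from right to left against the
    candidate text. The first character of the short form must match the
    first character of a word in the candidate.
    """
    s = len(short_form) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():
            # Skip punctuation inside the short form.
            s -= 1
            continue
        # Move left until the character matches; for the first short-form
        # character, also require that the match starts a word.
        while (l >= 0 and candidate[l].lower() != ch) or (
            s == 0 and l > 0 and candidate[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None  # no valid alignment found
        l -= 1
        s -= 1
    # Start the long form at the beginning of the word that was matched last.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


if __name__ == "__main__":
    print(find_best_long_form("RA", "patients with rheumatoid arthritis"))
```

In practice the candidate text would be the words preceding a parenthesized short form, such as "rheumatoid arthritis (RA)".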
Information Extraction in HaDextract System
The overall system architecture is shown in Fig. 2. The 'Import Abstract
Component' downloaded abstract XML files from PubMed. The 'Tokenizing
Component' split the abstract text into words. The 'POS Tagging Component'
assigned a POS tag to each word using the fnTBL POS Tagger offered by
Ratnaparkhi. fnTBL was trained on GENIA Corpus 3.0 so that it would assign POS
tags suited to the biomedical domain. The 'Entity Recognizing Component'
identified entity names using regular expressions and MeSH keywords, and the
'Syntactic Parsing Component' created a parse tree using the Collins Parser.
The 'Semantic Interpret Component' extracted HLA-disease interaction
information using the extracted entities, parse trees, relation keywords, and
filtering keywords.
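The component chain described above can be sketched as a sequence of stages applied to one abstract. This is a minimal sketch under stated assumptions, not the HaDextract implementation: the regular expression for HLA names, the small disease keyword set standing in for MeSH, and all function and class names are hypothetical. POS tagging and syntactic parsing are omitted because they depend on the external fnTBL and Collins Parser tools.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern for HLA names such as "HLA-B27" or "HLA-DRB1*0401".
HLA_PATTERN = re.compile(r"\bHLA-[A-Z]+[0-9]*(?:\*[0-9]+(?::[0-9]+)*)?")

# Hypothetical stand-in for a list of MeSH disease keywords.
DISEASE_KEYWORDS = {"ankylosing spondylitis", "rheumatoid arthritis"}


@dataclass
class Document:
    text: str
    tokens: list = field(default_factory=list)
    entities: list = field(default_factory=list)


def tokenize(doc: Document) -> Document:
    # Keep names like "HLA-DRB1*0401" as one token; split off punctuation.
    doc.tokens = re.findall(r"[A-Za-z0-9*:-]+|[^\sA-Za-z0-9]", doc.text)
    return doc


def recognize_entities(doc: Document) -> Document:
    # HLA names come from the regular expression.
    for match in HLA_PATTERN.finditer(doc.text):
        doc.entities.append(("HLA", match.group()))
    # Disease names come from a case-insensitive keyword lookup.
    lowered = doc.text.lower()
    for keyword in sorted(DISEASE_KEYWORDS):
        if keyword in lowered:
            doc.entities.append(("DISEASE", keyword))
    return doc


# Stages run in order; a tagger and a parser would slot in between these two.
PIPELINE = [tokenize, recognize_entities]


def run_pipeline(text: str) -> Document:
    doc = Document(text)
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline(
        "HLA-B27 is strongly associated with ankylosing spondylitis."
    )
    print(result.tokens)
    print(result.entities)
```

Keeping each component as a function that takes and returns the same document object lets stages be added or replaced without changing the others.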
3.1 Relation and Filtering Keyword
To find HLA-disease relation information, we developed sets of relation and
filtering keywords, which domain experts determined from 309 HLA publications.
Fig. 2. HaDextract System Architecture