Automatic Extraction of HLA-Disease Interaction Information from Biomedical Literature - Advances in Computational Science and Engineering


corresponds to the first letter of each word in its entirety. Schwartz [10] proposes an algorithm that uses these patterns to recognize abbreviations in biomedical publications. We improved upon Schwartz's algorithm to find abbreviations of disease names in abstracts.
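The published Schwartz and Hearst algorithm resolves each parenthesized short form by matching its characters right to left against the words that precede the parenthesis, requiring the first character to start a word. The Python sketch below implements only that baseline under simplified candidate filters. It does not reproduce the authors' disease-specific improvements, which are not described here, and the function names `find_best_long_form` and `extract_pairs` are illustrative.

```python
import re
from typing import List, Optional, Tuple


def find_best_long_form(short_form: str, long_form: str) -> Optional[str]:
    """Match short-form characters right-to-left inside the candidate long
    form; the first short-form character must begin a word. None on failure."""
    s_index = len(short_form) - 1
    l_index = len(long_form) - 1
    while s_index >= 0:
        curr_char = short_form[s_index].lower()
        if not curr_char.isalnum():
            s_index -= 1  # skip punctuation inside the short form
            continue
        # Move left until the character matches; for the first short-form
        # character, also require that the match sits at a word start.
        while (l_index >= 0 and long_form[l_index].lower() != curr_char) or (
            s_index == 0 and l_index > 0 and long_form[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
        s_index -= 1
    # Extend back to the beginning of the word holding the last match.
    start = long_form.rfind(" ", 0, l_index + 1) + 1
    return long_form[start:]


def extract_pairs(sentence: str) -> List[Tuple[str, str]]:
    """Find 'long form (SF)' patterns and resolve each short form."""
    pairs = []
    for match in re.finditer(r"\(([^()]+)\)", sentence):
        short_form = match.group(1).strip()
        # Simplified short-form filters: 2-10 characters, at most two words,
        # starts with a letter or digit, contains at least one letter.
        if not 2 <= len(short_form) <= 10 or len(short_form.split()) > 2:
            continue
        if not short_form[0].isalnum() or not any(c.isalpha() for c in short_form):
            continue
        # Candidate long form: at most min(|SF| + 5, 2 * |SF|) preceding words.
        words = sentence[: match.start()].split()
        max_words = min(len(short_form) + 5, 2 * len(short_form))
        candidate = " ".join(words[-max_words:])
        long_form = find_best_long_form(short_form, candidate)
        if long_form and len(long_form) > len(short_form):
            pairs.append((short_form, long_form))
    return pairs


print(extract_pairs("Patients with insulin-dependent diabetes mellitus (IDDM) carry HLA-DR3."))
# [('IDDM', 'insulin-dependent diabetes mellitus')]
```

Matching right to left lets the last short-form letters anchor on the nearest words, so extra leading words such as "Patients with" are trimmed automatically.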

3 Information Extraction in HaDextract System

The overall system architecture is shown in Fig. 2. The 'Import Abstract Component' downloaded abstract XML files from PubMed. The 'Tokenizing Component' split the abstract text into words. The 'POS Tagging Component' determined the part of speech (POS) of each word using the fnTBL POS tagger offered by Ratnaparkhi [11]; fnTBL was trained on the GENIA Corpus 3.0 so that it assigns POS tags suited to the biomedical domain. The 'Entity Recognizing Component' found entity names using regular expressions and MeSH keywords, and the 'Syntactic Parsing Component' created parse trees using the Collins parser [12]. Finally, the 'Semantic Interpret Component' extracted HLA-disease interaction information from the extracted entities, parse trees, relation keywords, and filtering keywords.
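The entity recognition step can be illustrated with a small Python sketch that pairs a regular expression for HLA mentions with a dictionary lookup of disease terms. The HaDextract regular expressions and MeSH keyword set are not given in the text, so `HLA_PATTERN` and the four-entry `MESH_DISEASES` sample below are hypothetical stand-ins; a real system would load the full MeSH disease vocabulary.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern for mentions such as "HLA-B27", "HLA-DRB1*0401"
# or "HLA-B*27:05"; not the system's actual regular expression.
HLA_PATTERN = re.compile(r"\bHLA-[A-Z]+[0-9]*(?:\*[0-9]{2,4}(?::[0-9]{2,3})*)?")

# Illustrative sample of MeSH entry terms mapped to their headings.
MESH_DISEASES: Dict[str, str] = {
    "ankylosing spondylitis": "Spondylitis, Ankylosing",
    "rheumatoid arthritis": "Arthritis, Rheumatoid",
    "insulin-dependent diabetes mellitus": "Diabetes Mellitus, Type 1",
    "type 1 diabetes": "Diabetes Mellitus, Type 1",
}

# Tokens keep internal hyphens, e.g. "insulin-dependent".
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Pre-tokenize each entry term so lookups compare token tuples.
TERM_INDEX: Dict[Tuple[str, ...], str] = {
    tuple(TOKEN.findall(term)): heading for term, heading in MESH_DISEASES.items()
}
MAX_TERM_LEN = max(len(key) for key in TERM_INDEX)


def find_hla_mentions(text: str) -> List[str]:
    """Return every HLA mention matched by the hypothetical pattern."""
    return HLA_PATTERN.findall(text)


def find_diseases(text: str) -> List[str]:
    """Greedy longest-match lookup of entry terms; returns MeSH headings."""
    tokens = TOKEN.findall(text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest possible term first, then shorter ones.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            heading = TERM_INDEX.get(tuple(tokens[i : i + n]))
            if heading is not None:
                found.append(heading)
                i += n
                break
        else:
            i += 1  # no term starts at this token; advance by one
    return found
```

For example, on "HLA-B27 is strongly associated with ankylosing spondylitis, and HLA-DRB1*0401 with rheumatoid arthritis." the sketch returns the mentions `HLA-B27` and `HLA-DRB1*0401` and the headings "Spondylitis, Ankylosing" and "Arthritis, Rheumatoid". Normalizing entry terms to headings means synonyms such as "type 1 diabetes" and "insulin-dependent diabetes mellitus" map to the same disease.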

3.1 Relation and Filtering Keywords

To find HLA-disease relation information, we developed relation and filtering keywords, which domain experts determined from 309 HLA publications.
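One way relation and filtering keywords can narrow the search is a sentence-level filter: keep a sentence only if it mentions an HLA antigen, a disease, and a relation keyword, and contains no filtering keyword. The Python sketch below assumes that filtering keywords mark sentences to discard (here, negations), and its keyword and disease lists are hypothetical, since the expert-curated lists are not reproduced in the text. It is a simplification: the system's 'Semantic Interpret Component' also uses parse trees, which this sketch ignores.

```python
import re
from typing import List, Tuple

# Hypothetical stand-ins for the expert-curated keyword lists.
RELATION_KEYWORDS = {"associated", "association", "susceptibility", "risk", "linked"}
FILTERING_KEYWORDS = {"not", "no", "lack"}  # assumed: mark sentences to discard
DISEASES = ["ankylosing spondylitis", "rheumatoid arthritis", "celiac disease"]
HLA_PATTERN = re.compile(r"\bHLA-[A-Z]+[0-9]*(?:\*[0-9]{2,4}(?::[0-9]{2,3})*)?")


def split_sentences(abstract: str) -> List[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def candidate_relations(abstract: str) -> List[Tuple[str, str, str]]:
    """Return (HLA mention, disease, sentence) triples from sentences that
    contain a relation keyword and no filtering keyword."""
    results = []
    for sentence in split_sentences(abstract):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if (words & FILTERING_KEYWORDS) or not (words & RELATION_KEYWORDS):
            continue
        lowered = sentence.lower()
        diseases = [d for d in DISEASES if d in lowered]
        # Pair every HLA mention with every disease in the kept sentence.
        for hla in HLA_PATTERN.findall(sentence):
            for disease in diseases:
                results.append((hla, disease, sentence))
    return results
```

On an abstract containing "HLA-B27 is strongly associated with ankylosing spondylitis." and "HLA-DR3 was not associated with rheumatoid arthritis in this cohort.", only the first sentence yields a candidate, because the second contains the filtering keyword "not".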

Fig. 2. HaDextract System Architecture
