HLA plays an important role in human immunity, and each person carries a
particular pair of alleles for each HLA gene. Knowledge of the alleles of the
six main HLA genes (HLA-A, -B, -C, -DRB1, -DQB1, -DPB1) is continually
developing: 1,729 alleles had been reported by 2004, and this number has been
increasing rapidly every year. A person's allelic makeup can influence their
response to disease: people infected with the same microorganism may show
responses ranging from self-healing to serious disease. Because HLA allele
frequency differs according to geographic location, a considerable number of
studies have been carried out on the relationship between HLA allele frequency
and disease, but still little is known. Relations between HLA and disease can
be found through text mining techniques such as Named Entity Recognition (NER)
and Information Extraction (IE).
There have been various attempts to efficiently find entities within the
biomedical literature. Hanisch[1] found protein names that appear in biomedical
text using search terms of protein names. Hatzivassiloglou[2] and Kazama[3] used
machine learning approaches with features such as word formation patterns, POS
information, semantic information, prefixes, and suffixes. The performance of
these methods is about 60-80%.
There have also been numerous attempts to find interactions between entities
mentioned in the literature. Friedman[4] and Temkin[5] extracted protein-protein
interactions from biomedical abstracts using keywords and grammars built by
domain experts. Leroy[6] used Finite State Automata (FSA) with closed-class
words and demonstrated that FSA can extract information from the literature.
McDonald[7] generated potential parse trees using their own parser and filtered
out parse trees with little information; a filtering algorithm selects, from
among the potential parse trees, the informative ones that carry valid
interaction information. This method has the advantage that a grammar is not
necessary to extract information. Horn[8] extracted interaction information
between proteins and point mutations rather than between proteins.
Novichkova[9] introduced a general biomedical domain-oriented system that can
extract various kinds of biomedical information.
In this paper, to deal with variants of HLA names, we build a regular
expression for HLA and use the MeSH ontology. We aim to extract interaction
information between HLA and disease using text mining methods, and to find
these interactions we make use of the structural information of sentences. The
structural information of a sentence is derived by building a parse tree from
the dependency relationships among the keywords in the sentence. Whereas the
system of McDonald[7] uses potential parse trees produced by their parser, our
system uses a parse tree built from the dependency relationships between the
keywords. This method analyzes complex sentences more effectively and extracts
more accurate relation information between entities in sentences that contain
coordinating conjunctions such as 'and' and 'or'.
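To illustrate how a regular expression can absorb HLA name variants, the
following is a minimal sketch in Python. It is not the expression used in our
system: the pattern, the list of loci, and the names HLA_PATTERN and
find_hla_mentions are assumptions made only for this example. It accepts an
optional hyphen or space after "HLA", a locus name, and either an allele
designation (such as *0401 or *02:01) or a serological number (such as B27),
and it normalizes each match to a single canonical form.

```python
import re

# Hypothetical pattern for illustration only; the locus list is an assumption
# and does not cover the full HLA nomenclature.
HLA_PATTERN = re.compile(
    r"""
    \bHLA[-\s]?                                   # 'HLA' plus optional hyphen or space
    (?P<gene>DRB[1-5]|DQA1|DQB1|DPA1|DPB1|DR|DQ|DP|Cw|A|B|C)   # locus name
    (?:
        \s?\*\s?(?P<allele>\d{2,4}(?::\d{2,3})*)  # allele form: *0401 or *02:01
      | (?P<serotype>\d{1,3})                     # serological form: B27, DR4
    )?
    (?![A-Za-z0-9])                               # reject longer words, e.g. 'HLA Antigens'
    """,
    re.VERBOSE,
)


def find_hla_mentions(text):
    """Return the HLA names found in text, normalized to 'HLA-<locus>...' form."""
    mentions = []
    for match in HLA_PATTERN.finditer(text):
        name = "HLA-" + match.group("gene")
        if match.group("allele"):
            name += "*" + match.group("allele")
        elif match.group("serotype"):
            name += match.group("serotype")
        mentions.append(name)
    return mentions


if __name__ == "__main__":
    sentence = "HLA-B27 and HLA DRB1*0401 were studied; HLA class II was not."
    print(find_hla_mentions(sentence))
```

The pattern is case-sensitive on purpose, so that generic phrases such as
"HLA class II" are not mistaken for a locus name. Variants that the expression
cannot cover would still need a dictionary or ontology lookup.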
Our system is divided into five sub-processes: Tokenizing, POS Tagging, Entity
Recognizing, Syntactic Analysis, and Semantic Analysis. While the HaDextract
system incorporates all five sub-processes, including hidden relations, other
data mining systems
 