Automatic Extraction of HLA-Disease
Interaction Information from Biomedical
Literature
JeongMin Chae 1, JiEun Chae 2, Taemin Lee 1, YoungHee Jung 1,
HeungBum Oh 3, and SoonYoung Jung 1
1 Department of Computer Science Education, Korea University, Korea
bluesky@comedu.korea.ac.kr
2 Department of Computer and Information Science, University of Pennsylvania,
USA
3 Department of Laboratory Medicine, Asan Medical Center and University of Ulsan
College of Medicine, Korea
Abstract. The HLA system controls a variety of functions involved in the
immune response and influences susceptibility to over 40 diseases. It is
important to find out how HLA causes a disease or modifies susceptibility
to it or its course. In this paper, we develop an automatic HLA-disease
information extraction procedure that uses biomedical publications. First,
HLA and diseases are recognized in the literature using built-in regular
languages and the disease categories of MeSH. Second, we generate parse
trees for each sentence in PubMed using the Collins parser. Third, we build
our own information extraction algorithm, which searches the parse trees
and extracts relation information from the sentences. We automatically
collected 10,184 sentences from 66,785 PubMed abstracts using HaDextract.
The precision of the extracted relations was 89.6% on 144 randomly
selected sentences.
Keywords: HLA, text mining, disease, interaction information.
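The first step described in the abstract, recognizing HLA and disease mentions in a sentence, can be illustrated with a minimal sketch. It assumes that the "regular languages" can be approximated by a single regular expression and that a short term list can stand in for the MeSH disease categories. The pattern, the term list, and the function name are hypothetical and are not taken from HaDextract.

```python
import re

# Hypothetical pattern (an assumption, not the authors' actual one). It matches
# allele-style names such as HLA-DRB1*0401 or HLA-A*02:01 first, and falls back
# to serological names such as HLA-B27 or HLA-DR4.
HLA_PATTERN = re.compile(
    r"\bHLA-(?:[A-Z]+[0-9]?\*[0-9]+(?::[0-9]+)*|[A-Z]+[0-9]*)"
)

# Hypothetical stand-in for the MeSH disease categories (lowercase terms).
DISEASE_TERMS = (
    "ankylosing spondylitis",
    "rheumatoid arthritis",
    "celiac disease",
)


def find_candidate_sentences(sentences):
    """Return (sentence, hla_names, disease_terms) for every sentence that
    mentions at least one HLA name and at least one disease term."""
    results = []
    for sentence in sentences:
        hla_names = HLA_PATTERN.findall(sentence)
        lowered = sentence.lower()
        diseases = [term for term in DISEASE_TERMS if term in lowered]
        if hla_names and diseases:
            results.append((sentence, hla_names, diseases))
    return results


# Made-up example sentences, used only to exercise the sketch.
EXAMPLE_SENTENCES = [
    "HLA-B27 is strongly associated with ankylosing spondylitis.",
    "HLA-DRB1*0401 increases susceptibility to rheumatoid arthritis.",
    "The patients were followed for two years.",
]

if __name__ == "__main__":
    for sentence, hla_names, diseases in find_candidate_sentences(EXAMPLE_SENTENCES):
        print(hla_names, diseases)
```

Only sentences in which both kinds of entity co-occur are kept. In the procedure described above, such sentences would then be parsed and searched for relation information, which this sketch does not attempt.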
1 Introduction
The Human Leukocyte Antigen (HLA) system is the name of the Major
Histocompatibility Complex (MHC) in humans. The HLA system controls a
variety of functions involved in the immune response and influences
susceptibility to over 40 diseases. It is important to find out how HLA
causes a disease or modifies susceptibility to it or its course, because
this knowledge will help in the treatment of these diseases. Over the
years, a number of methods have been developed for the experimental
determination or the prediction of HLA-binding peptides from an antigenic
sequence. The experimental methods for recognizing these HLA are both
time-consuming and cost-intensive. Computational methods thus provide a
cost-effective way to identify these HLA. However, it is difficult to
identify HLA because it has the most complicated genetic structure in the
human body.
 