Table 2. Regular expressions for finding antigens and alleles

Class     Regular Expression
Antigen   /\b((|HLA-)(A|B|Bw|Cw|DPw|DQ|DR|DRw|Dw))(\d+)\b/
Allele    /\b((|HLA-)([ABCEFGHJKLNPSTUVWXYZ]|DRA|DRB\d|DQA[12]|DQB[123]|DO[AB]|DM[AB]|DPA[123]|DPB[12])|TAP[12]|PSMB[89]|MIC[ABCDE])\*(\d+)([LNSCAQ]?)\b/
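As a rough illustration of how the two patterns in Table 2 might be applied, the sketch below ports them to Python's re module. It assumes that the spaces inside the printed expressions are typesetting artifacts and that the minus sign and asterisk stand for the ASCII characters "-" and "*". The names ANTIGEN_RE, ALLELE_RE, and find_hla_entities are hypothetical and do not come from the original system.

```python
import re

# Patterns from Table 2, with the typesetting spaces removed (an assumption).
ANTIGEN_RE = re.compile(
    r"\b((|HLA-)(A|B|Bw|Cw|DPw|DQ|DR|DRw|Dw))(\d+)\b"
)
ALLELE_RE = re.compile(
    r"\b((|HLA-)([ABCEFGHJKLNPSTUVWXYZ]|DRA|DRB\d|DQA[12]|DQB[123]"
    r"|DO[AB]|DM[AB]|DPA[123]|DPB[12])|TAP[12]|PSMB[89]|MIC[ABCDE])"
    r"\*(\d+)([LNSCAQ]?)\b"
)


def find_hla_entities(text):
    """Return the antigen and allele mentions found in text, in order."""
    antigens = [m.group(0) for m in ANTIGEN_RE.finditer(text)]
    alleles = [m.group(0) for m in ALLELE_RE.finditer(text)]
    return {"antigen": antigens, "allele": alleles}


if __name__ == "__main__":
    sample = ("HLA-DRB1*0401 and HLA-B27 were associated; "
              "B*2705 and DR4 were also typed.")
    print(find_hla_entities(sample))
```

The asterisk separates the two classes in this sketch: an allele mention requires "*" between the locus and the digits, whereas an antigen mention has the digits directly after the locus name.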
2.2 Disease and Geographic Location Entities
HLA entities activate disease in specific human types, and human type normally
depends on geographic location. Therefore, recognizing geographic location
entities is a key factor in analyzing HLA-disease interaction information.
We recognize the disease and geographic location entities in abstracts by using
the disease and geographic location categories of MeSH. MeSH is an ontology
that provides disease and geographic location entities, including 23,000
terms. Abbreviations and synonyms provided by MeSH enable the system to
find variations in terminology. The geographic location entity category is
displayed in Fig. 1.
MeSH's geographic location information about cities and countries all over
the world enables us to summarize HLA-disease interactions by location, since
the relationship between HLA and disease differs according to geographic location.
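The dictionary-based recognition described above could be sketched roughly as follows. The term table here is a small hand-written stand-in, not an export of MeSH, and the names TOY_TERMS, build_matcher, and find_entities are hypothetical. The sketch only shows the general idea of mapping synonyms and abbreviations back to one canonical entry with a category.

```python
import re

# Toy stand-in for a MeSH-like vocabulary (illustrative entries only):
# canonical term -> (category, surface forms to look for in text).
TOY_TERMS = {
    "Arthritis, Rheumatoid": ("disease", ["rheumatoid arthritis"]),
    "Diabetes Mellitus, Type 1": ("disease", ["type 1 diabetes", "IDDM"]),
    "Japan": ("location", ["Japan"]),
    "Korea": ("location", ["Korea"]),
}


def build_matcher(terms):
    """Build a lookup table and one regex covering every surface form."""
    lookup = {}
    for canonical, (category, surfaces) in terms.items():
        for surface in surfaces:
            lookup[surface.lower()] = (canonical, category)
    # Longest forms first so a longer term wins over a shorter one it contains.
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(s) for s in ordered) + r")\b",
        re.IGNORECASE,
    )
    return lookup, pattern


def find_entities(text, terms=TOY_TERMS):
    """Return (matched text, canonical term, category) for each mention."""
    lookup, pattern = build_matcher(terms)
    results = []
    for match in pattern.finditer(text):
        canonical, category = lookup[match.group(0).lower()]
        results.append((match.group(0), canonical, category))
    return results


if __name__ == "__main__":
    abstract = ("HLA-DR4 is linked to rheumatoid arthritis in Japan, "
                "and to IDDM in Korea.")
    for mention in find_entities(abstract):
        print(mention)
```

Because every surface form maps to a canonical term, mentions tagged as locations can later serve as grouping keys when interactions are summarized by location.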
2.3 Abbreviation of Disease Entity
Variations in the abbreviations of diseases are a challenge for automatic
information extraction. Even the same abbreviation can denote different
concepts when used by different authors. The recognition of abbreviations
therefore influences the performance of extraction systems.
We recognize abbreviations in the literature by using abbreviation formation
patterns. Most abbreviations consist of capital letters and are enclosed in
parentheses. They follow a predictable pattern, in which one letter in abbreviations
Fig. 1. Geographic Location Entity Category