Automatic Extraction of HLA-Disease Interaction Information from Biomedical Literature - Advances in Computational Science and Engineering


In the first part, we efficiently recognized entities by establishing regular expressions and using the MeSH ontology. In the second part, we extracted HLA-disease interaction information from sentences with complex structure by searching parse trees. We extracted relation information from 909 PubMed abstracts and made it available on our web site. Then, we tested the algorithm with 144 randomly selected sentences. Summarized over these sentences, the evaluation reported a precision of 89.6% and a second rate of 57.4%. Our algorithm may be extended to other medical fields, such as mental disease and asthma, where the relationship between genes and diseases is also important. We will continue to investigate an automatic filtering method that uses machine learning to filter out sentences expressing no relation between entities, without relying on relation and filtering keywords.
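The entity recognition step summarized above (regular expressions for HLA names, a vocabulary lookup for diseases) can be illustrated with a minimal sketch. The pattern `HLA_PATTERN`, the toy set `DISEASE_TERMS`, and the function names below are assumptions made for illustration; they are not the authors' actual expressions or the real MeSH ontology, and the parse-tree search used for relation extraction is not reproduced here.

```python
import re

# Hypothetical pattern for HLA names such as "HLA-B27", "HLA-DR4",
# "HLA-DRB1*0401" or "HLA-DRB1*04:01". This is an illustrative assumption,
# not the set of regular expressions used in the chapter.
HLA_PATTERN = re.compile(
    r"\bHLA-[A-Z]{1,3}[0-9]*(?:\*[0-9]{2,4}(?::[0-9]{2,3})*)?"
)

# Toy stand-in for a disease vocabulary; a real system would load
# disease terms from the MeSH ontology instead.
DISEASE_TERMS = {
    "ankylosing spondylitis",
    "rheumatoid arthritis",
    "celiac disease",
}


def find_hla_entities(sentence):
    """Return HLA mentions in the order they appear in the sentence."""
    return [match.group(0) for match in HLA_PATTERN.finditer(sentence)]


def find_disease_entities(sentence, terms=DISEASE_TERMS):
    """Return vocabulary terms found in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return sorted(term for term in terms if term in lowered)


def candidate_pairs(sentence):
    """Pair every HLA mention with every disease mention in one sentence.

    These are only candidates: deciding whether a pair is actually
    related is the job of the later relation extraction step.
    """
    diseases = find_disease_entities(sentence)
    return [(hla, disease)
            for hla in find_hla_entities(sentence)
            for disease in diseases]


if __name__ == "__main__":
    text = "HLA-B27 is strongly associated with ankylosing spondylitis."
    print(candidate_pairs(text))
```

A sentence that yields no candidate pair can be discarded before any parsing, which keeps the more expensive relation extraction step limited to sentences that mention both entity types.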

