Language Independent Extraction of Key Terms: An Extensive Comparison of Metrics - Agents and Artificial Intelligence

In future work, instead of using fixed-length character prefixes of words, we will pre-process our document collection to automatically extract real word radicals using existing language-independent morphology learners such as Linguistica [23] and Morfessor [24].
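To make the limitation of fixed-length prefixes concrete, the following is a minimal sketch of grouping words by a shared character prefix. The function name `group_by_prefix` and the prefix length of 4 are illustrative assumptions, not values taken from our experiments.

```python
from collections import defaultdict


def group_by_prefix(words, prefix_len=4):
    """Group words sharing the same fixed-length character prefix.

    Hypothetical helper: prefix_len=4 is an assumed value for illustration.
    Words shorter than prefix_len are keyed by the whole word.
    """
    groups = defaultdict(set)
    for word in words:
        key = word.lower()[:prefix_len]
        groups[key].add(word)
    return dict(groups)


if __name__ == "__main__":
    sample = ["extract", "extraction", "extractor", "term", "terms", "terminal"]
    for prefix, members in sorted(group_by_prefix(sample).items()):
        print(prefix, sorted(members))
```

In this sample, "terminal" falls into the same group as "term" and "terms" although it is a different word family, which is the kind of conflation that a learned radical is expected to avoid.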

For Asian languages such as Chinese or Japanese, we will apply the extractor [18,17] to characters instead of words, extract multi-character strings (2-grams and 3-grams), and rank the resulting single- and multi-character strings using the proposed metrics.
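As a rough illustration of working on characters rather than words, the sketch below enumerates all character 1-grams, 2-grams and 3-grams of an unsegmented string and counts them. It is not the extractor of [18,17], which selects multi-character units rather than enumerating every n-gram, and raw frequency is used here only as a placeholder for the ranking metrics; the names `char_ngrams` and `count_char_strings` are assumptions introduced for this example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def count_char_strings(text, sizes=(1, 2, 3)):
    """Count single- and multi-character strings of the given sizes.

    Assumption: plain frequency stands in for a real ranking metric.
    """
    counts = Counter()
    for n in sizes:
        counts.update(char_ngrams(text, n))
    return counts


if __name__ == "__main__":
    # Unsegmented Chinese text: no whitespace marks word boundaries.
    text = "信息检索与信息抽取"
    counts = count_char_strings(text)
    for string, freq in counts.most_common(5):
        print(string, freq)
```

Candidate strings produced this way would then be filtered by the extractor and scored with the metrics under comparison instead of raw counts.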

For German, the language-independent morphology learners mentioned above, together with words and multi-words extracted in the same way as for Portuguese, Czech or English, will enable us to extend our methodology to a larger set of languages.

Acknowledgements. This work was supported by the Portuguese Foundation for Science and Technology (FCT/MCTES) through the funded research project ISTRION (ref. PTDC/EIA-EIA/114521/2009).

References

1. da Silva, J.F., Lopes, G.P.: A Document Descriptor Extractor Based on Relevant

Expressions. In: Lopes, L.S., Lau, N., Mariano, P., Rocha, L.M. (eds.) EPIA 2009. LNCS

(LNAI), vol. 5816, pp. 646-657. Springer, Heidelberg (2009)

2. da Silva, J.F., Lopes, G.P.: Towards Automatic Building of Document Keywords. In:

COLING 2010 - The 23rd International Conference on Computational Linguistics, Poster

Volume, Beijing, pp. 1149-1157 (2010)

3. Teixeira, L., Lopes, G., Ribeiro, R.A.: Automatic Extraction of Document Topics. In:

Camarinha-Matos, L.M. (ed.) DoCEIS 2011. IFIP AICT, vol. 349, pp. 101-108. Springer,

Heidelberg (2011)

4. Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Computing

Surveys 34(1), 1-47 (2002)

5. da Silva, J.F., Lopes, G.P.: A Local Maxima Method and a Fair Dispersion Normalization

for Extracting Multiword Units. In: Proceedings of the 6th Meeting on the Mathematics of

Language, Orlando, pp. 369-381 (1999)

6. Jacquemin, C.: Spotting and discovering terms through natural language processing. MIT

Press (2001)

7. Hulth, A.: Improved Automatic Keyword Extraction Given More Linguistic Knowledge.

In: EMNLP 2003 Proceedings of the Conference on Empirical Methods in Natural

Language Processing, pp. 216-223. Association for Computational Linguistics,

Stroudsburg (2003)

8. Ngonga Ngomo, A.-C.: Knowledge-Free Discovery of Domain-Specific Multiword Units.

In: Proceedings of the 2008 ACM Symposium on Applied Computing, SAC 2008, pp.

1561-1565. ACM, Fortaleza (2008),

doi: http://doi.acm.org/10.1145/1363686.1364053

9. Martínez-Fernández, J.L., García-Serrano, A., Martínez, P., Villena, J.: Automatic

Keyword Extraction for News Finder. In: Nürnberger, A., Detyniecki, M. (eds.) AMR

2003. LNCS, vol. 3094, pp. 99-119. Springer, Heidelberg (2004)
