In future work, instead of using fixed-length character prefixes of words, we will
pre-process our document collection to automatically extract real word radicals using
existing language-independent morphology learners such as Linguistica [23]
and Morfessor [24].
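To make the contrast concrete, the fixed-length prefix approach can be sketched as follows. This is only an illustration: the function names and the prefix length k=5 are assumptions, not values taken from this work, and no morphology learner is involved.

```python
from collections import defaultdict


def prefix_key(word, k=5):
    """Return the first k characters of the lowercased word (k is an assumed value)."""
    return word.lower()[:k]


def group_by_prefix(words, k=5):
    """Group words that share the same fixed-length character prefix."""
    groups = defaultdict(set)
    for w in words:
        groups[prefix_key(w, k)].add(w.lower())
    # Sort each group so the result is deterministic.
    return {p: sorted(ws) for p, ws in groups.items()}
```

With k=5, "extract", "extraction" and "extractor" fall into the same group, but so would "extraordinary", which is the kind of conflation that real radicals are expected to avoid.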
For Asian languages such as Chinese or Japanese, we will apply the extractor [18,17] to
characters instead of words, extract multi-character strings (2-grams and 3-grams), and use
single- and multi-character strings ranked by the proposed metrics.
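A minimal sketch of the candidate-generation step for character-based languages is given below. It only enumerates and counts character 1-, 2- and 3-grams; it does not reproduce the extractor of [18,17] or the ranking metrics proposed here, and the function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of text, in order of occurrence."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def count_char_ngrams(text, sizes=(1, 2, 3)):
    """Count single- and multi-character strings of the given sizes.

    Raw frequency is only a placeholder here; a relevance metric
    would replace it when ranking candidates.
    """
    counts = Counter()
    for n in sizes:
        counts.update(char_ngrams(text, n))
    return counts
```

For the four-character string "自然语言", this yields three 2-grams ("自然", "然语", "语言") and two 3-grams ("自然语", "然语言") as candidates alongside the four single characters.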
For German, the use of the language-independent morphology learners mentioned
above, together with words and multi-words extracted in the same way as for
Portuguese, Czech and English, will enable us to extend our methodology to a larger set
of languages.
Acknowledgements. This work was supported by the Portuguese Foundation for Science
and Technology (FCT/MCTES) through the funded research project ISTRION (ref.
PTDC/EIA-EIA/114521/2009).
References
1. da Silva, J.F., Lopes, G.P.: A Document Descriptor Extractor Based on Relevant
Expressions. In: Lopes, L.S., Lau, N., Mariano, P., Rocha, L.M. (eds.) EPIA 2009. LNCS
(LNAI), vol. 5816, pp. 646-657. Springer, Heidelberg (2009)
2. da Silva, J.F., Lopes, G.P.: Towards Automatic Building of Document Keywords. In:
COLING 2010 - The 23rd International Conference on Computational Linguistics, Poster
Volume, Beijing, pp. 1149-1157 (2010)
3. Teixeira, L., Lopes, G., Ribeiro, R.A.: Automatic Extraction of Document Topics. In:
Camarinha-Matos, L.M. (ed.) DoCEIS 2011. IFIP AICT, vol. 349, pp. 101-108. Springer,
Heidelberg (2011)
4. Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Computing
Surveys 34(1), 1-47 (2002)
5. da Silva, J.F., Lopes, G.P.: A Local Maxima Method and a Fair Dispersion Normalization
for Extracting Multiword Units. In: Proceedings of the 6th Meeting on the Mathematics of
Language, Orlando, pp. 369-381 (1999)
6. Jacquemin, C.: Spotting and Discovering Terms through Natural Language Processing. MIT
Press (2001)
7. Hulth, A.: Improved Automatic Keyword Extraction Given More Linguistic Knowledge.
In: EMNLP 2003 Proceedings of the Conference on Empirical Methods in Natural
Language Processing, pp. 216-223. Association for Computational Linguistics,
Stroudsburg (2003)
8. Ngonga Ngomo, A.-C.: Knowledge-Free Discovery of Domain-Specific Multiword Units.
In: Proceedings of the 2008 ACM Symposium on Applied Computing, SAC 2008, pp.
1561-1565. ACM, Fortaleza (2008),
doi: http://doi.acm.org/10.1145/1363686.1364053
9. Martínez-Fernández, J.L., García-Serrano, A., Martínez, P., Villena, J.: Automatic
Keyword Extraction for News Finder. In: Nürnberger, A., Detyniecki, M. (eds.) AMR
2003. LNCS, vol. 3094, pp. 99-119. Springer, Heidelberg (2004)