evaluated classification systems that have been enhanced and improved by our typographical weighting
approaches.
The main aspects of our approaches are the use of the typographical information contained in
text documents and a higher weighting of emphasized text passages, which differ from plainly
typed text. As the evaluation shows, using typographic information significantly improves the
classification of text documents. We consider this approach a useful extension to all information
retrieval processes. Using the typographic information contained in text documents also improves other
information retrieval approaches, such as text clustering or the focused crawling process
described in (Gräfe & Werner, 2004).
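
The idea of weighting emphasized passages more heavily than plain text can be illustrated with a minimal Python sketch. It assumes HTML input and accumulates a weight per term instead of a raw count. The tag set, the weight values in EMPHASIS_WEIGHTS, and the names WeightedTermCounter and weighted_term_frequencies are hypothetical choices made for illustration; they are not the weights or the implementation evaluated in this chapter.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical emphasis weights (assumption): the chapter's actual
# values are not given in this section.
EMPHASIS_WEIGHTS = {"h1": 3.0, "h2": 2.5, "b": 2.0, "strong": 2.0,
                    "em": 1.5, "i": 1.5}


class WeightedTermCounter(HTMLParser):
    """Accumulate term weights, counting emphasized text more heavily."""

    def __init__(self):
        super().__init__()
        self.open_tags = []        # stack of currently open emphasis tags
        self.weights = Counter()   # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close the innermost open occurrence of this tag.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        # Plain text weighs 1.0; nested emphasis uses the largest weight.
        weight = max((EMPHASIS_WEIGHTS[t] for t in self.open_tags),
                     default=1.0)
        for term in re.findall(r"[a-z]+", data.lower()):
            self.weights[term] += weight


def weighted_term_frequencies(html):
    """Return a dict mapping each term to its typographically weighted count."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.weights)
```

For example, calling weighted_term_frequencies("<p>plain <b>bold</b> plain</p>") gives the single bold occurrence the same total weight as the two plain occurrences. The resulting weighted frequencies could then replace raw term counts in the feature vectors passed to a classifier or clustering method.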
References
Apté, C., Damerau, F., & Weiss, S. (1994). Towards language independent automated learning of text
categorization models. In Proceedings of the 17th ACM SIGIR Conference on Research and Development
in Information Retrieval (SIGIR '94) (pp. 23-30).
Chakrabarti, S., Dom, B. E., & Indyk, P. (1998). Enhanced hypertext categorization using hyperlinks.
In L. M. Haas & A. Tiwary (Eds.), Proceedings of SIGMOD-98, ACM International Conference on
Management of Data (pp. 307-318). New York: ACM Press.
Cutler, M., Shih, Y., & Meng, W. (1997). Using the structure of HTML documents to improve retrieval.
In USENIX Symposium on Internet Technologies and Systems.
Davison, B. (2000). Topical locality in the Web. In Research and Development in Information Retrieval
(SIGIR) (pp. 272-279).
Gräfe, G. & Werner, L. (2004). Context-based information retrieval for improved information quality
in decision-making processes. In K. Tochtermann & H. Maurer (Eds.), Proceedings of the 4th International
Conference on Knowledge Management (pp. 379-387). Graz, Austria.
Hartley, J. (1986). Planning the typographical structure of instructional text. Educational Psychologist,
21(4), 315-332.
Kim, S. & Zhang, B.-T. (2000). Web-document retrieval by genetic learning of importance factors for
HTML tags. In PRICAI Workshop on Text and Web Mining (pp. 13-23).
Kwon, O.-W. & Lee, J.-H. (2000). Web page classification based on k-nearest neighbor approach. In IRAL
'00: Proceedings of the fifth international workshop on Information retrieval with Asian languages
(pp. 9-15). New York: ACM Press.
Lewis, D. (1992). An evaluation of phrasal and clustered representations on a text categorization task.
In SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and
development in information retrieval (pp. 37-50). ACM Press.
Luhn, H. (1958). The automatic creation of literature abstracts. IBM Journal of Research and Development,
2, 159-165.