Supporting Text Retrieval by Typographical Term Weighting - Distributed Artificial Intelligence, Agent Technology, and Collaborative Applications

evaluated classification systems that have been enhanced and improved by our typographical

weighting approaches.

The main aspects of our approaches are the use of the typographical information contained in

text documents and a higher weighting of emphasized text passages, which differ from plainly

typed text. As the evaluation shows, using typographic information significantly improves the

classification of text documents. We consider this approach to be a useful extension to all information

retrieval processes. Using the typographic information contained in text documents also improves other

information retrieval approaches, such as text clustering or the focused crawling process

described in (Gräfe & Werner, 2004).
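
The idea of weighting emphasized passages more heavily than plainly typed text can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this chapter: the tag set, the weight values in TAG_WEIGHTS, and the names TypographicTermCounter and weighted_term_frequencies are all assumptions chosen for illustration, and the sketch assumes HTML input in which emphasis is marked by tags.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights for emphasis tags (assumed values, not taken from the chapter).
TAG_WEIGHTS = {"title": 4.0, "h1": 3.0, "h2": 2.5,
               "b": 2.0, "strong": 2.0, "em": 1.5, "i": 1.5}
PLAIN_WEIGHT = 1.0  # weight of plainly typed text


class TypographicTermCounter(HTMLParser):
    """Accumulates term frequencies, weighted by the enclosing emphasis tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []       # emphasis tags that are currently open
        self.weights = Counter()  # term -> weighted frequency

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag; tolerates sloppy nesting.
        if tag in self.open_tags:
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        # A term inside several emphasis tags receives the largest applicable weight.
        weight = max((TAG_WEIGHTS[t] for t in self.open_tags), default=PLAIN_WEIGHT)
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.weights[term] += weight


def weighted_term_frequencies(html):
    """Return a dict mapping each term to its typographically weighted frequency."""
    parser = TypographicTermCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.weights)
```

With these assumed weights, a term that occurs once in a heading and once in bold contributes more to the document vector than a term that occurs twice in plain text. The resulting weighted frequencies could replace raw term counts before any further term weighting or classification step.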

References

Apté, C., Damerau, F., & Weiss, S. (1994). Towards language independent automated learning of text

categorization models. In Proceedings of the 17th ACM SIGIR Conference on Research and Development

in Information Retrieval (SIGIR '94) (pp. 23-30).

Chakrabarti, S., Dom, B. E., & Indyk, P. (1998). Enhanced hypertext categorization using hyperlinks.

In L. M. Haas & A. Tiwary (Eds.), Proceedings of SIGMOD-98, ACM International Conference on

Management of Data (pp. 307-318). New York: ACM Press.

Cutler, M., Shih, Y., & Meng, W. (1997). Using the structure of HTML documents to improve retrieval.

In USENIX Symposium on Internet Technologies and Systems.

Davison, B. (2000). Topical locality in the Web. In Research and Development in Information Retrieval

(SIGIR) (pp. 272-279).

Gräfe, G., & Werner, L. (2004). Context-based information retrieval for improved information quality

in decision-making processes. In K. Tochtermann & H. Maurer (Eds.), Proceedings of the 4th

International Conference on Knowledge Management (pp. 379-387). Graz, Austria.

Hartley, J. (1986). Planning the typographical structure of instructional text. Educational Psychologist,

21(4), 315-332.

Kim, S., & Zhang, B.-T. (2000). Web-document retrieval by genetic learning of importance factors for

HTML tags. In PRICAI Workshop on Text and Web Mining (pp. 13-23).

Kwon, O.-W., & Lee, J.-H. (2000). Web page classification based on k-nearest neighbor approach. In IRAL

'00: Proceedings of the fifth international workshop on Information retrieval with Asian languages

(pp. 9-15). New York: ACM Press.

Lewis, D. (1992). An evaluation of phrasal and clustered representations on a text categorization task.

In SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and

development in information retrieval (pp. 37-50). ACM Press.

Luhn, H. (1958). The automatic creation of literature abstracts. IBM Journal of Research and

Development, 2, 159-165.
