could then witness the authoring styles of hypertexts and understand their various
historical contexts.
From a social viewpoint, Wexelblat and Maes [16] demonstrated the Footprints
system that adds social context to browsed document structures by utilizing historical
data on user visits. As a result, new users could be guided to useful and popular
resources.
Ohshima et al. [12] proposed an approach for showing the changes in rivals or
peers of user-defined objects over time based on data obtained from querying online
news archives. In general, mining text streams has been studied relatively well (for
example, see [15,10,1]).
Overall, there has so far been relatively little research explicitly aimed at mining
content stored in web archives, despite the fact that such content presents great potential for
knowledge discovery. Apart from a few exceptions, most approaches have neglected the
temporal dimension of page content. Aschenbrenner and Rauber [3] surveyed work
that had been done towards mining large portions of web content with consideration
of its temporal aspect. The authors also provided a general outlook on the potential of
mining archived data. Rauber et al. [13] discussed the possibility of mining past web
data for identifying and portraying changes in web-related technologies, particularly
in such characteristics of pages as file format, language, size, etc. Arms et al. [2] have
reported on building a research library for scientists to study the evolution of content
and the structure of the web.
5 Conclusions
In this position paper we have discussed several issues related to the process of
knowledge acquisition from document archives containing historical documents. We
have compared the documents in archives to primary sources common in historical
studies and described their characteristics from the viewpoint of automatic knowledge
acquisition. We believe that historical document archives could be more useful for
society and would have more value once a wide range of applications has been
developed for effective mining of historical knowledge.
Acknowledgement. This work has been partially supported by the National Institute of
Information and Communications Technology, Japan, and by the MSR IJARC CORE6
project entitled “Mining and Searching Web for Future-related Information”.
References
1. Allan, J. (ed.): Topic detection and tracking: event-based information organization. Kluwer
Academic Publishers, Norwell (2002)
2. Arms, W.Y., Aya, S., Dmitriev, P., Kot, B.J., Mitchell, R., Walle, L.: Building a research
library for the history of the web. In: Proceedings of the Joint Conference on Digital
Libraries, pp. 95-102 (2006)
3. Aschenbrenner, A., Rauber, A.: Mining web collections. In: Masanes, J. (ed.) Web
Archiving, Springer, Heidelberg (2006)