be preserved for future generations. It has therefore become possible to collect large
amounts of real-world data and convert them into document archives with unified access
to all the individual artifacts. Such collections can then be easily accessed and
analyzed using state-of-the-art computer technologies.
However, the popularity of archives, especially web archives, and user
awareness of them remain relatively low despite the availability of traditional access methods
such as browsing and searching. This situation may raise questions about the necessity of
archiving and may hinder the archival process. In this paper we argue that
automatic knowledge acquisition from archives could boost their usefulness
and increase their value to society. We discuss various
issues related to the discovery of historical knowledge and its possible applications. Our
view is partially inspired by historical methods and the notion of historiography 1 -
the methodology of the discipline of history. Automatic temporal knowledge
acquisition from historical document archives using text mining can
be useful for computational journalism [17], education, entertainment, verifying the
accuracy of existing historical descriptions, and so on.
The remainder of this paper is structured as follows. Section 2 describes
document archive usage, with emphasis on the various types of
knowledge acquisition and their related issues. Section 3 provides a deeper discussion
of selected aspects of archived documents and of the process of historical study using
document archives. The section after that reviews related work. We conclude the
paper in the last section.
2 Archive Usage
Despite their great potential, historical document archives are still only moderately
popular within a narrow group of users, and few people seem to be aware of their
existence and availability. Based on an online questionnaire study conducted in 2008 with
1000 users in Japan [8], we found that less than 2% of web users had recently
used any web archive. This may be partly due to the lack of large online web
archives in Japan. Nevertheless, the result implies rather low user awareness
of web archives and of potential models of their usage. On the other hand,
during the course of our studies we found that many people were quite surprised
to learn about the existence of repositories preserving large portions of historical web
content. They also often expressed enthusiasm on hearing about the possibility of
using such content.
News archives such as Google News Archive Search 2 seem to be relatively well known
and frequently used, though still to a rather limited extent compared to other
popular web services. We believe that new usage models should be introduced in
order to boost the usefulness and popularity of historical document archives.
2.1 Browsing and Searching
Browsing an archive collection is one of the most fundamental ways of access. In the
case of a web archive, this access may be similar to traditional web browsing with
1 http://en.wikipedia.org/wiki/historiography
2 http://news.google.com/archivesearch