can be set according to the query. This querying approach is designed to reflect
the actual distribution of relevant documents over time within the required time frame
T. By dividing the query into many sub-queries, we reduce the effect of the temporal
aspect used in the ranking algorithm of the particular collection and can thus rely
more on the actual document relevance.
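The sub-query idea can be sketched as follows. This is only a minimal illustration under stated assumptions: the time frame T is split into equally long sub-periods, and `search` is a hypothetical callable standing in for whatever date-restricted search interface a given collection offers. Neither the equal split nor the function names come from the text above.

```python
from datetime import date, timedelta


def split_time_frame(start, end, n_parts):
    """Split the inclusive range [start, end] into n_parts contiguous sub-periods.

    Assumption: sub-periods are (nearly) equal in length; any leftover days
    are given to the earliest sub-periods.
    """
    total_days = (end - start).days + 1
    if n_parts < 1 or n_parts > total_days:
        raise ValueError("n_parts must be between 1 and the number of days in T")
    base, extra = divmod(total_days, n_parts)
    periods = []
    cursor = start
    for i in range(n_parts):
        length = base + (1 if i < extra else 0)
        last = cursor + timedelta(days=length - 1)
        periods.append((cursor, last))
        cursor = last + timedelta(days=1)
    return periods


def collect_over_time(search, query, start, end, n_parts, per_period):
    """Issue one sub-query per sub-period and concatenate the results.

    `search` is a hypothetical callable (query, start, end, limit) -> list.
    Taking the same number of results from every sub-period keeps the
    collection's own temporal ranking bias from dominating the sample.
    """
    results = []
    for sub_start, sub_end in split_time_frame(start, end, n_parts):
        results.extend(search(query, sub_start, sub_end, per_period))
    return results
```

The number of sub-periods and the per-period limit are left as parameters so that they can be chosen per query.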
As a crude distinction, the knowledge obtained from historical archives can be
divided into two broad classes:
- knowledge about a particular source or group of sources and their changes
  and evolution
- knowledge about the past outlook of the world and society, as well as
  about the evolution of a particular topic or piece of information over time
Below, we describe both classes in more detail.
2.2.1 Knowledge on Sources
The first kind of knowledge relates to a given source or information container such as
a web page or newspaper. Basic information of this kind includes the frequency of and
changes in the appearance of certain words or topics, the age of document components,
and the document change frequency or change degree.
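As a rough illustration of the last two measures, the sketch below computes a change frequency and a mean change degree from a chronologically ordered list of page versions. The definitions used here are assumptions made for the example, not definitions taken from the cited work: change degree is taken as one minus the word-level similarity ratio of two consecutive versions, and change frequency as the fraction of consecutive version pairs that differ at all.

```python
from difflib import SequenceMatcher


def change_degree(old_text, new_text):
    """Return 0.0 for identical texts and 1.0 for texts sharing no words.

    Assumption: similarity is measured on whitespace-separated words.
    """
    matcher = SequenceMatcher(None, old_text.split(), new_text.split())
    return 1.0 - matcher.ratio()


def change_profile(versions):
    """Summarize how often and how strongly a document changed.

    `versions` is a list of (timestamp, text) pairs sorted by timestamp.
    Returns (change_frequency, mean_change_degree).
    """
    degrees = [
        change_degree(older[1], newer[1])
        for older, newer in zip(versions, versions[1:])
    ]
    changed = [d for d in degrees if d > 0]
    frequency = len(changed) / len(degrees) if degrees else 0.0
    mean_degree = sum(changed) / len(changed) if changed else 0.0
    return frequency, mean_degree
```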
For news archives, this means characterizing a particular newspaper, magazine, etc.
by analyzing its past editions and contributions. Such information may be useful for
measuring the characteristics of news sources, identifying the relevant and
high-quality ones, and so on. In the case of web archives, this kind of knowledge could
supply missing information to users browsing a current page version. For example, users
could learn about common topics that were discussed on the page recently or a long
time ago. It would then be possible for them to contrast such topics with the
ones published on the present page version. This could provide context for better
understanding the current page version as well as the consistency, periodicity and
other temporal characteristics of the page [8,9].
As another kind of knowledge, users could receive information on the age of
certain components on pages to help them evaluate the freshness and
validity of those components. This information would be obtained by comparing past page
versions with the current one. For example, a page component labeled "new"
may turn out to be quite old once the current
page version is compared with older page versions [9].
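A minimal sketch of such a comparison is given below. It assumes that each page version has already been segmented into components (for example, paragraphs or blocks) and that a component counts as the same one only when its text matches exactly. Both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
from datetime import date


def component_first_seen(current_components, snapshots):
    """Map each component of the current page to the earliest snapshot date
    on which it already appeared, or None if no past version contains it.

    `snapshots` is a list of (snapshot_date, components) pairs, where
    `components` is a collection of component texts from that past version.
    """
    first_seen = {}
    for component in current_components:
        dates = [when for when, comps in snapshots if component in comps]
        first_seen[component] = min(dates) if dates else None
    return first_seen


# Example: an item still labeled "new" that has been on the page for years.
snapshots = [
    (date(2005, 3, 1), {"New! Download our catalogue", "Contact us"}),
    (date(2007, 6, 1), {"New! Download our catalogue", "Opening hours"}),
]
current = ["New! Download our catalogue", "Summer sale"]
ages = component_first_seen(current, snapshots)
```

Here `ages` maps the catalogue item to the 2005 snapshot, while the sale item, which appears in no archived version, maps to `None`.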
2.2.2 Knowledge on World and Society
The second kind of knowledge can be helpful for understanding the past as well as for
learning about the present, e.g., trends, events, and their origins and causes. There are
myriad potential kinds of such knowledge and ways in which it could be
utilized. In the simplest form, it can be extracted by applying summarization, filtering,
association and other text mining techniques to time series of features.
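One of the simplest such time series is the relative frequency of a term per year. The sketch below builds it from dated documents. The input format, the whitespace tokenization and the yearly granularity are assumptions chosen for the example.

```python
from collections import Counter


def term_time_series(documents, term):
    """Return {year: relative frequency of `term`} for dated documents.

    `documents` is an iterable of (year, text) pairs. Tokens are lowercased,
    whitespace-separated words; years with no tokens are skipped.
    """
    hits = Counter()
    totals = Counter()
    needle = term.lower()
    for year, text in documents:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(needle)
    return {
        year: hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }
```

Filtering, summarization or association mining could then operate on such series, for instance by flagging the years in which the frequency of a term rises sharply.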
The NY Times API (http://developer.nytimes.com/) is an example of a programmable
interface that offers an effective tool for mining news collections for this kind of
knowledge. For a given time period, one can find the names of objects mentioned in
the news articles, such as place or