Table 1. Concept of event characterization according to its view in primary and secondary sources

                                       | Event exists in primary data       | Event does not exist in primary data
---------------------------------------+------------------------------------+-------------------------------------
Event exists in secondary data         | Established, well-remembered event | Event discovered later
Event does not exist in secondary data | Forgotten event                    | -

Table 2. Concept of event characterization according to its sentiment view in primary and secondary sources

                                   | Event was recognized positively in the past | Event was recognized negatively in the past
-----------------------------------+---------------------------------------------+--------------------------------------------
Event is recognized positively now | Constant positive recognition of the event  | Change: event becomes positively recognized
Event is recognized negatively now | Change: event becomes negatively recognized | Constant negative recognition of the event

3.2 Data Normalization
For primary sources from the distant past (e.g., news articles from the
19th century), the wider context, such as language, culture, and social norms, has
changed. Generally, the further we move into the past, the harder it is to
understand historical sources because of the gap created by sociological and
technological changes. For example, certain words may no longer be used, or their
meaning may differ from the current one. Hence, there is a need for a kind of
“data normalization” or “translation”, so that the information extracted from distant
primary sources can be understood and directly used for knowledge acquisition in
combination with information obtained from documents created in more recent time
periods. Although the web still has a relatively short history compared to the history
of print, it has existed in a time of rapid technological and cultural
change. Therefore, the same problems can be found in web archives, though
naturally to a lesser extent.
A somewhat similar data normalization is also often needed when we compare
statistics taken at different points in time. For example, it is commonly
known that more news articles have appeared recently than in the distant past due to the
rapid increase in the rate of journalistic activity. Therefore, the counts of documents
created in unit time periods of the near and distant past are comparable only after
they are normalized with respect to the whole collection size in each period. A
simple way of approximating the growth in the number of articles over time is to
measure the hit counts obtained for a series of stop words within the different time
periods [7], provided such data is available.
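
The normalization described above can be illustrated with a minimal sketch. It assumes that the mean hit count of a few stop words serves as a proxy for the collection size in each period; the function names, the choice of the mean as the aggregate, and all numbers are hypothetical and serve only as an illustration, not as the method of [7].

```python
from statistics import mean


def collection_size_proxy(stopword_hits):
    """Approximate the relative collection size in one period.

    Assumption: the mean hit count over a set of stop words grows
    roughly in proportion to the number of documents in the period.
    """
    return mean(stopword_hits.values())


def normalize_counts(raw_counts, stopword_hits_by_period):
    """Divide each period's raw document count by its size proxy."""
    normalized = {}
    for period, count in raw_counts.items():
        size = collection_size_proxy(stopword_hits_by_period[period])
        if size <= 0:
            raise ValueError(f"no stop-word hits for period {period!r}")
        normalized[period] = count / size
    return normalized


# Illustrative, made-up numbers: documents mentioning some event.
raw_counts = {"1990s": 50, "2000s": 400}

# Illustrative, made-up stop-word hit counts per period.
stopword_hits_by_period = {
    "1990s": {"the": 900, "of": 600, "and": 300},
    "2000s": {"the": 9000, "of": 6000, "and": 3000},
}

normalized = normalize_counts(raw_counts, stopword_hits_by_period)
for period, value in normalized.items():
    print(period, round(value, 4))
```

In this made-up example the raw count grows eightfold between the two periods, while the size proxy grows tenfold, so the normalized value for the later period is lower than for the earlier one.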
3.3 Information Credibility
For both primary and secondary sources, credibility is of paramount
importance. As historians have to effectively deal with forged or corrupted