6.3.3 Computing Narrative Similarity
The similarity between narratives is computed over an explicitly defined latent
space of concepts that the researcher is interested in. Using the vector-space model
(Salton et al., 1975), an $n \times m$ matrix $A$ is defined in which each element
$a_{i,j}$ denotes the presence or absence (or the number of occurrences) of concept $i$
in narrative $j$. Each narrative is thus described by an $n$-dimensional vector, and,
as in Latent Semantic Analysis, the similarity between two narratives can be measured
as the cosine of the angle between their vectors.
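The cosine comparison just described can be sketched as follows. This is a minimal illustration, assuming the concept-by-narrative matrix $A$ is stored as a plain Python list of rows (one row per concept, one column per narrative) holding 0/1 presence flags or counts; the helper names `column` and `cosine_similarity` and the toy matrix are hypothetical, not taken from the original study.

```python
import math


def column(A, j):
    """Return narrative j as an n-dimensional vector (column j of A)."""
    return [row[j] for row in A]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors u and v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy matrix: 4 concepts (rows) x 3 narratives (columns),
# using presence/absence flags.
A = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]

print(cosine_similarity(column(A, 0), column(A, 1)))  # approximately 0.5
```

Narratives 0 and 1 share one of their two concepts each, so their vectors meet at 60 degrees and the cosine is about 0.5.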
Fig. 6.4 Four possible states of a given concept: appearing in none, one, or both of the narratives
One limitation of this measure, however, is that concepts appearing in neither
narrative are taken into account when computing the similarity between the narratives.
These concepts are likely to dominate, because only a small subset of the concepts is
likely to appear in the two narratives being compared, out of 329 narratives in total.
Figure 6.4 displays a two-dimensional configuration of all possible states for a given
concept: appearing in neither narrative ($N_{00}$), appearing in both narratives
($N_{11}$), or appearing in only one of them ($N_{01}$ and $N_{10}$). If all concepts
are plotted for a given pair of narratives, the $N_{00}$ state would dominate because
of its high frequency. These concepts, however, carry no useful information about
either narrative.
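Counting the four states of Fig. 6.4 for a pair of narratives can be sketched as below. This is a hypothetical helper, assuming binary (or count) narrative vectors in which any positive entry means "present"; the original does not say which digit of $N_{01}$ refers to which narrative, so the convention here (first digit for narrative `u`, second for `v`) is an assumption. The example uses an invented vocabulary of 1000 concepts to show how $N_{00}$ swamps the other states when the vectors are sparse.

```python
def concept_states(u, v):
    """Count the four concept states of Fig. 6.4 for two narrative vectors.

    Returns (n00, n01, n10, n11). The first digit refers to narrative u and
    the second to narrative v (1 = concept present, 0 = concept absent).
    """
    if len(u) != len(v):
        raise ValueError("vectors must cover the same set of concepts")
    n00 = n01 = n10 = n11 = 0
    for a, b in zip(u, v):
        in_u, in_v = a > 0, b > 0
        if in_u and in_v:
            n11 += 1          # concept appears in both narratives
        elif in_u:
            n10 += 1          # only in narrative u
        elif in_v:
            n01 += 1          # only in narrative v
        else:
            n00 += 1          # in neither narrative
    return n00, n01, n10, n11


# Hypothetical sparse example: 1000 concepts, each narrative mentions a few.
u = [0] * 1000
v = [0] * 1000
for i in (3, 10, 42):
    u[i] = 1
for i in (10, 42, 99, 500):
    v[i] = 1

print(concept_states(u, v))  # (995, 2, 1, 2)
```

Only 5 of the 1000 concepts carry information about this pair; the remaining 995 fall into $N_{00}$.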
An alternative measure can be defined by discarding the concepts that appear in
neither narrative. More specifically, we propose to equate the similarity $S_{i,j}$
between two narratives $i$ and $j$ with the ratio of the number of concepts that
appear in both narratives ($N_{11}$) to the sum of the numbers of concepts that appear
in only one of them ($N_{01} + N_{10}$). The distance $D_{i,j} = 1 - S_{i,j}$ then
follows as Eq. (6.7):
$$S_{i,j} = \frac{N_{11}}{N_{01} + N_{10}} \tag{6.6}$$

$$D_{i,j} = \frac{N_{01} + N_{10} - N_{11}}{N_{01} + N_{10}} \tag{6.7}$$
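Equations (6.6) and (6.7) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and raising an error when $N_{01} + N_{10} = 0$ (two narratives with identical concept sets) is an assumption, since the text does not say how that undefined case is handled. $N_{00}$ is deliberately not a parameter, reflecting the decision to discard concepts absent from both narratives.

```python
def narrative_similarity(n01, n10, n11):
    """S_ij from Eq. (6.6): shared concepts over concepts in only one narrative."""
    unshared = n01 + n10
    if unshared == 0:
        raise ValueError("S is undefined when no concept appears in only one narrative")
    return n11 / unshared


def narrative_distance(n01, n10, n11):
    """D_ij from Eq. (6.7), equivalent to 1 - S_ij."""
    unshared = n01 + n10
    if unshared == 0:
        raise ValueError("D is undefined when no concept appears in only one narrative")
    return (unshared - n11) / unshared


# Hypothetical counts: 2 shared concepts, 3 concepts found in only one narrative.
n01, n10, n11 = 2, 1, 2
print(narrative_similarity(n01, n10, n11))  # 2/3, about 0.667
print(narrative_distance(n01, n10, n11))    # 1/3, about 0.333
```

As written, Eq. (6.6) is not capped at 1: when two narratives share more concepts than they hold exclusively ($N_{11} > N_{01} + N_{10}$), $S_{i,j}$ exceeds 1 and $D_{i,j}$ becomes negative.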
Likewise, the similarity between two concepts may be computed as the ratio of the
number of narratives in which both concepts appear to the sum of the numbers of
narratives in which only one of the two concepts appears.
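The concept-to-concept variant works on the rows of $A$ instead of its columns. The sketch below is a hypothetical helper that mirrors Eq. (6.6), assuming the same list-of-rows layout for $A$ and reading "one of the concepts" as exactly one, by analogy with $N_{01} + N_{10}$; the toy matrix is invented for illustration.

```python
def concept_similarity(A, p, q):
    """Similarity between concepts p and q (rows of A), mirroring Eq. (6.6).

    Narratives (columns) containing both concepts are divided by the number
    of narratives containing exactly one of them; narratives containing
    neither concept are ignored.
    """
    both = only_one = 0
    for a, b in zip(A[p], A[q]):
        if a > 0 and b > 0:
            both += 1
        elif a > 0 or b > 0:
            only_one += 1
    if only_one == 0:
        raise ValueError("similarity is undefined when no narrative contains exactly one concept")
    return both / only_one


# Hypothetical matrix: 3 concepts (rows) x 5 narratives (columns).
A = [
    [1, 1, 0, 1, 0],  # concept 0
    [1, 1, 0, 0, 1],  # concept 1
    [0, 0, 1, 0, 0],  # concept 2
]

print(concept_similarity(A, 0, 1))  # 1.0: 2 shared narratives, 2 exclusive ones
print(concept_similarity(A, 0, 2))  # 0.0: the concepts never co-occur
```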