6.2.1.4 Computing Document Similarity
The similarity between different documents, or between different terms, may then be
computed on the reduced-dimensionality approximation of A. Matrices 6.4 and 6.5
constitute the m×m and n×n covariance matrices for the documents and terms,
respectively. The proximity matrices for the documents and terms are then derived by
transforming 6.4 and 6.5 to correlation matrices.
$A_k^{T} A_k = V_{m \times k} \, S_{k \times k}^{2} \, V_{m \times k}^{T}$    (6.4)
$A_k A_k^{T} = U_{n \times k} \, S_{k \times k}^{2} \, U_{n \times k}^{T}$    (6.5)
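As a brief worked step, the forms of 6.4 and 6.5 can be recovered from the truncated singular value decomposition. The sketch below assumes the standard LSA setup, which this section does not restate: A is an n×m term-by-document matrix, $A_k = U S V^{T}$ is its rank-k approximation, U and V have orthonormal columns, and S is diagonal.

```latex
\begin{align*}
A_k &= U_{n \times k} \, S_{k \times k} \, V_{m \times k}^{T},
      \qquad U^{T} U = V^{T} V = I_k \\
A_k^{T} A_k &= V S U^{T} U S V^{T} = V S^{2} V^{T}
      \quad (m \times m,\ \text{documents}) \\
A_k A_k^{T} &= U S V^{T} V S U^{T} = U S^{2} U^{T}
      \quad (n \times n,\ \text{terms})
\end{align*}
```

Under these assumptions the document matrix depends only on V and S, and the term matrix only on U and S.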
Each element $s_{i,j}$ represents the similarity between documents, or terms, i and j.
The proximity matrix is normalized to the range (0,1) and transformed to a distance
matrix with each element $d_{i,j} = 1 - |s_{i,j}|$.
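The steps described in this subsection can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation. It assumes that A is an n×m term-by-document matrix, that the covariance-to-correlation transformation divides each element by the square roots of the corresponding diagonal entries, and that the normalization to (0,1) is achieved by taking absolute values. The function names `to_distance` and `lsa_distance_matrices` are hypothetical.

```python
import numpy as np


def to_distance(C):
    """Turn a covariance-like matrix C into a distance matrix d = 1 - |s|.

    Assumption: s[i, j] = C[i, j] / sqrt(C[i, i] * C[j, j]).
    """
    scale = np.sqrt(np.diag(C))
    # Guard against division by zero for items with no weight in the reduced space.
    scale = np.where(scale > 0, scale, 1.0)
    S = C / np.outer(scale, scale)
    # Clip tiny floating-point overshoots so that |s| stays within [0, 1].
    S = np.clip(S, -1.0, 1.0)
    return 1.0 - np.abs(S)


def lsa_distance_matrices(A, k):
    """Return (document distances, term distances) from a rank-k approximation.

    A is assumed to be an n x m term-by-document matrix (n terms, m documents).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]        # n x k
    V_k = Vt[:k, :].T     # m x k
    S2 = np.diag(s[:k] ** 2)

    doc_cov = V_k @ S2 @ V_k.T     # m x m, as in 6.4
    term_cov = U_k @ S2 @ U_k.T    # n x n, as in 6.5
    return to_distance(doc_cov), to_distance(term_cov)
```

Calling `lsa_distance_matrices(A, k=2)` on a small term-by-document matrix returns two symmetric matrices with entries between 0 and 1, which could then be passed to a clustering or multidimensional scaling routine.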
6.2.2 Limitations of Latent-Semantic Analysis in the Context of Qualitative Content Analysis
Latent-Semantic Analysis has been shown to adequately approximate human judgments
of semantic similarity in a number of contexts (Landauer et al., 2003; Katsanos
et al., 2008; Larsen et al., 2008a). However, one may expect a number of
drawbacks when compared to traditional content analysis procedures as applied by
researchers.
First, LSA assumes homogeneity in the style of writing across documents. Under this
assumption, the extent to which different words occur in one document rather than in
another denotes a difference in content between the two documents. This assumption
has been shown to hold in contexts of formal writing such as web pages (Katsanos et al.,
2008) or abstracts of academic papers (Larsen et al., 2008a), but it is not expected to
hold for qualitative research data such as interview transcripts or self-provided essays
in diary studies, as the vocabulary and verbosity of documents may vary substantially
across participants.
Second, LSA computes the similarity between documents based on the co-occurrence
of all terms that appear in the pool of documents. In the analysis of qualitative
data, however, one is typically interested only in the small set of words that refer
to the phenomenon under study. As a result, words that are of minimal interest to
the researchers may overshadow the semantic relations that they seek to identify.
Third, LSA lacks an essential part of qualitative research, that of interpretation.
As different participants may use different terms or even phrases to refer to the same
latent concept, an objectivist approach that relies purely on semantics will evidently
fail to capture the relevant concepts. Ideally, automated vector-space models could
be applied to the meta-data that result from qualitative open-coding procedures
(Strauss and Corbin, 1998). In the next section we propose such a semi-automated
approach to semantic classification.