A Semi-Automated Approach to the Content Analysis of Experience Narratives - Modeling Users' Experiences with Interactive Systems

6.2.1.4 Computing Document Similarity

The similarity between documents, or between terms, can then be computed on the reduced-dimensionality approximation $A_k$ of $A$. The matrices in Eqs. 6.4 and 6.5 are the $m \times m$ and $n \times n$ covariance matrices for the documents and the terms, respectively. The proximity matrices for documents and terms are then derived by transforming 6.4 and 6.5 into correlation matrices.

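$$A_k^{T} A_k = V_{m \times k}\, S_{k \times k}^{2}\, V_{m \times k}^{T} \qquad (6.4)$$

$$A_k A_k^{T} = U_{n \times k}\, S_{k \times k}^{2}\, U_{n \times k}^{T} \qquad (6.5)$$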
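The computation in Eqs. 6.4 and 6.5 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the original analysis code: $A$ is taken to be an $n \times m$ term-by-document matrix (so that $U_{n \times k}$ holds terms and $V_{m \times k}$ holds documents), the rank-$k$ factors come from `numpy.linalg.svd`, and the helper names `lsa_proximity` and `cov_to_corr` are hypothetical. The covariance-to-correlation step uses the usual rescaling $R = D^{-1/2} C D^{-1/2}$ with $D = \mathrm{diag}(C)$; applied to the uncentered matrices of Eqs. 6.4 and 6.5, as done here, this yields the cosine between reduced-space vectors, so whether $A$ should first be mean-centered is an assumption left open.

```python
import numpy as np


def cov_to_corr(C: np.ndarray) -> np.ndarray:
    """Rescale a symmetric Gram/covariance matrix C to R = D^-1/2 C D^-1/2."""
    # Clip the diagonal so an all-zero vector does not cause division by zero.
    d = np.sqrt(np.clip(np.diag(C), 1e-12, None))
    return C / np.outer(d, d)


def lsa_proximity(A: np.ndarray, k: int):
    """Return (document, term) proximity matrices from a rank-k SVD of A.

    Hypothetical helper; assumes A is an n x m term-by-document matrix
    (n terms as rows, m documents as columns).
    """
    # Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]              # n x k term factors
    V_k = Vt[:k, :].T           # m x k document factors
    S2 = np.diag(s[:k] ** 2)    # S_{k x k} squared

    doc_gram = V_k @ S2 @ V_k.T    # Eq. 6.4: A_k^T A_k, m x m (documents)
    term_gram = U_k @ S2 @ U_k.T   # Eq. 6.5: A_k A_k^T, n x n (terms)
    return cov_to_corr(doc_gram), cov_to_corr(term_gram)


# Hypothetical toy counts: 5 terms (rows) x 4 documents (columns).
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 1, 1, 1],
], dtype=float)

R_doc, R_term = lsa_proximity(A, k=2)
print(np.round(R_doc, 2))
```

Forming the Gram matrices as $V_k S^2 V_k^T$ and $U_k S^2 U_k^T$ gives the same result as building $A_k$ explicitly and multiplying it by its transpose, because the columns of $U_k$ and $V_k$ are orthonormal.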

Each element $s_{i,j}$ of a proximity matrix represents the similarity between documents (or terms) $i$ and $j$. The proximity matrix is normalized to the range (0, 1) and transformed into a distance matrix with elements $d_{i,j} = 1 - |s_{i,j}|$.
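The distance transform can be sketched in a few lines of Python with NumPy. The text does not spell out the normalization step, so this sketch assumes the proximities are correlation-type values in $[-1, 1]$ with ones on the diagonal, for which $1 - |s_{i,j}|$ already lies in $[0, 1]$. The function name `proximity_to_distance` and the toy matrix are hypothetical.

```python
import numpy as np


def proximity_to_distance(S) -> np.ndarray:
    """Map a proximity matrix S to distances d_ij = 1 - |s_ij| (hypothetical helper).

    Assumes S is a symmetric correlation-type matrix with entries in [-1, 1].
    """
    S = np.asarray(S, dtype=float)
    # Clip tiny floating-point overshoots (e.g. 1.0000000002) back into [-1, 1].
    S = np.clip(S, -1.0, 1.0)
    D = 1.0 - np.abs(S)
    # Each item is at distance 0 from itself, even if s_ii is not exactly 1.
    np.fill_diagonal(D, 0.0)
    return D


# Hypothetical 3 x 3 proximity matrix.
S = np.array([
    [1.0, 0.8, -0.9],
    [0.8, 1.0, 0.1],
    [-0.9, 0.1, 1.0],
])
D = proximity_to_distance(S)
print(np.round(D, 2))
```

Because of the absolute value, a strong negative proximity also yields a small distance: in this toy matrix, items 0 and 2 ($s = -0.9$) end up closer ($d = 0.1$) than items 0 and 1 ($s = 0.8$, $d = 0.2$).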

6.2.2 Limitations of Latent-Semantic Analysis in the Context of Qualitative Content Analysis

Latent-Semantic Analysis has been shown to adequately approximate human judgments of semantic similarity in a number of contexts (Landauer et al., 2003; Katsanos et al., 2008; Larsen et al., 2008a). However, compared with the traditional content analysis procedures applied by researchers, it can be expected to have a number of drawbacks.

First, LSA assumes homogeneity in writing style across documents. Under this assumption, the extent to which different words occur in one document rather than in another indicates a difference in content between the two documents. This assumption has been shown to hold for formal writing such as web pages (Katsanos et al., 2008) and abstracts of academic papers (Larsen et al., 2008a), but it is not expected to hold for qualitative research data such as interview transcripts or essays written by participants in diary studies, as the vocabulary and verbosity of documents may vary substantially across participants.

Second, LSA computes the similarity between documents from the co-occurrence of all terms that appear in the pool of documents. In the analysis of qualitative data, however, researchers are interested in only a small set of words that refer to the phenomenon under study. As a result, words of little interest to the researchers may overshadow the semantic relations that the researchers are trying to identify.

Third, LSA lacks an essential part of qualitative research: interpretation. Because different participants may use different terms, or even different phrases, to refer to the same latent concept, an objectivist approach that relies purely on semantics will clearly fail to capture the relevant concepts. Ideally, automated vector-space models would be applied to the metadata that result from open-coding procedures in qualitative research (Strauss and Corbin, 1998). In the next section we propose such a semi-automated approach to semantic classification.
