automated, as concepts identified in a small set of narratives may be used to assess
the similarity among the full set of narratives. This results in an iterative process of
coding and visualizing the insights obtained.
Next, we describe the application of a fully automated procedure, Latent Semantic
Analysis (LSA), which relies on a vector-space model (Salton et al., 1975), and
motivate the proposed adaptations of this approach towards a semi-automated
one.
6.2 Automated Approaches to Semantic Classification
A number of automated approaches exist for the assessment of semantic similarity
between documents (for an extensive review see Kaur and Hornof, 2005; Cohen and
Widdows, 2009). These approaches rely on the principle that the semantic similarity
between two documents relates to the degree of term co-occurrence in these docu-
ments (Deerwester et al., 1990). In this sense, every document may be characterized
as an n-dimensional vector where each element of the vector depicts the number of
times that a given term appears in the document. The similarity between documents
may then be computed in a high-dimensional geometrical space defined by these
vectors.
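This vector-space representation can be illustrated with a minimal sketch, assuming two toy "documents" and a simple whitespace tokenizer (both introduced here purely for illustration): each document becomes a vector of term counts over a shared vocabulary, and similarity is computed as the cosine of the angle between the vectors.

```python
import numpy as np

# Toy example: two short "documents" (illustrative, not from the chapter's corpus).
docs = ["the user clicks the link", "the user reads the page"]

# Shared vocabulary over both documents (bag of words, order discarded).
vocab = sorted({w for d in docs for w in d.split()})

# Characterize each document as an n-dimensional vector of term counts.
def term_vector(doc):
    words = doc.split()
    return np.array([words.count(t) for t in vocab], dtype=float)

v1, v2 = term_vector(docs[0]), term_vector(docs[1])

# Cosine similarity in the geometric space defined by these vectors.
similarity = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(round(similarity, 3))  # → 0.714
```

The two documents share the high-count terms "the" and "user", which is what drives the relatively high similarity despite their differing content words.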
Latent Semantic Analysis (LSA) (Deerwester et al., 1990), also known as Latent
Semantic Indexing within the field of Information Retrieval, is one of the most pop-
ular vector-space approaches to semantic similarity measurement. It has been shown
to reflect human semantic similarity judgments quite accurately (Landauer and Du-
mais, 1997) and has been successfully applied in a number of contexts such as that
of identifying navigation problems in web sites (Katsanos et al., 2008) and structur-
ing and identifying trends in academic communities (Larsen et al., 2008a).
LSA starts by indexing all n terms that appear in a pool of m documents, and
constructs an n × m matrix A, where each element a_ij gives the number of times
term i appears in document j. As matrix A is high-dimensional and sparse, LSA
employs Singular Value Decomposition (SVD) to reduce the dimensionality of
the matrix and thus identify the principal latent dimensions in the data. Semantic
similarity can then be computed in this reduced-dimensionality space, which constitutes
a latent semantic space. Below, we describe in detail the procedure as applied in this
chapter.
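The matrix construction and SVD-based reduction described above can be sketched as follows; this is a minimal illustration assuming a toy 5 × 4 term-document count matrix (invented here for demonstration), not the chapter's actual data.

```python
import numpy as np

# Toy term-document count matrix A: 5 terms (rows) x 4 documents (columns).
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 1, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

# Singular Value Decomposition: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values: a rank-k latent semantic space.
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document, k latent dims

# Semantic similarity between documents, computed in the latent space.
def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(doc_vectors[0], doc_vectors[2]))
```

Truncating to k dimensions is what makes the space "latent": terms that never co-occur directly can still end up close together if they co-occur with the same other terms.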
6.2.1 The Latent Semantic Analysis Procedure
6.2.1.1 Term Indexing
Term-indexing techniques vary from simple "bag-of-words" approaches, which
discard the syntactic structure of the document and merely index the full list of words
that appear in it, to natural-language algorithms that identify the part of
speech (e.g., the probability that a term is a noun or a verb) to infer the sense
of a word (Berry et al., 1999). LSA typically discards syntactic information and