treats each document as a pool of terms. However, it applies two pre-processing
procedures in order to enhance the quality of the indexing procedure.
Firstly, a number of words, called stop-words, such as prepositions, pronouns and
conjunctions, are commonly found in documents and carry no semantic information
for the comprehension of the document theme (Fox, 1989). Such words are excluded
from further analysis as they do not provide meaningful information and are likely
to distort the similarity measure. We used a list of stop-words provided by Fox (1989).
Secondly, the remaining terms are reduced to their root words through stemming
algorithms. For instance, the terms “usability” and “usable” are reduced to the term
“usabl”, thus allowing the indexing of multiple forms of a word under one dimension
in the vector-space model. We employed Porter's (1980) algorithm for stemming.
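The two pre-processing steps can be sketched as follows. The stop-word list and the suffix-stripping rules below are illustrative stand-ins for the full Fox (1989) list and Porter's (1980) algorithm, chosen only so that "usability" and "usable" both reduce to "usabl" as in the example above:

```python
import re
from collections import Counter

# Illustrative stop-word list; the chapter uses the full list from Fox (1989).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "it", "is"}

def naive_stem(term):
    """Crude suffix stripping as a stand-in for Porter's (1980) algorithm;
    e.g. 'usability' and 'usable' both map to 'usabl'."""
    for suffix in ("ability", "able", "ing", "ed", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 1:
            return term[: -len(suffix)] + ("abl" if suffix in ("ability", "able") else "")
    return term

def index_document(text):
    """Tokenize, drop stop-words, and stem, returning term frequencies."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(stems)
```

Applying `index_document` to each document in the collection yields the term pools from which the term-by-document matrix of the next step is built.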
6.2.1.2 Normalizing Impact of Terms
The first step in the procedure has resulted in an n×m matrix A, where each element
a_{i,j} denotes the number of times that the stemmed term i appears in document j. The
frequencies of different terms across different documents vary substantially.
This gives undesirably high weight to terms that are frequent across a large
set of documents, compared with terms that appear in only a small set of
documents. However, terms that appear in many documents have limited
discriminatory power and are thus not very informative. One term-weighting
criterion that counterbalances this inherent bias is the term-frequency
inverse-document-frequency (TFIDF) (Salton and Buckley, 1988):
$$a_{i,j}^{\mathrm{weighted}} = a_{i,j}\,\log\!\left(\frac{nDocs}{nDocs_i}\right) \tag{6.1}$$
which weights the frequency a_{i,j} by the logarithm of the ratio of the total number of
documents nDocs to the number of documents nDocs_i in which term i appears.
Thus, frequent terms that appear in a large number of documents, and therefore have
little discriminatory power, receive lower weight in the final matrix.
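A minimal sketch of the weighting in Eq. (6.1), assuming a natural logarithm (the chapter does not specify the log base, which only rescales the weights):

```python
import math

def tfidf_weight(A):
    """Apply the TFIDF weighting of Eq. (6.1) to a term-by-document count
    matrix A, given as a list of rows (rows = terms, columns = documents).
    Assumes every term occurs in at least one document (nDocs_i >= 1)."""
    n_docs = len(A[0])
    weighted = []
    for row in A:
        n_docs_i = sum(1 for count in row if count > 0)  # documents containing term i
        # A term present in every document gets weight log(nDocs/nDocs) = 0.
        weighted.append([count * math.log(n_docs / n_docs_i) for count in row])
    return weighted
```

For example, a term occurring in both of two documents is weighted to zero, while a term confined to one document keeps a positive weight scaled by log(2).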
6.2.1.3 Dimensionality Reduction
Matrix A is sparse and high-dimensional. Moreover, certain groups of terms may
display similar distributions across the different documents, thus reflecting a single
underlying latent variable. LSA attempts to approximate A by a matrix of lower rank.
Singular Value Decomposition is used to decompose matrix A into three matrices
U, S and V such that

$$A = U S V^{T} \tag{6.2}$$
Matrices U and V are orthonormal and S is a diagonal matrix containing the
singular values of A. The singular values are ordered by decreasing size in S;
thus, by keeping only the first k×k submatrix of S (and the corresponding
columns of U and V), we obtain the best rank-k approximation of A:

$$A_k = U_{n \times k}\, S_{k \times k}\, V_{m \times k}^{T} \tag{6.3}$$
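The truncation in Eq. (6.3) can be sketched with NumPy's SVD routine; the toy matrix below is illustrative, standing in for a weighted term-by-document matrix:

```python
import numpy as np

# Toy 4-term x 3-document weighted matrix (values are illustrative).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Best rank-k approximation: keep only the k largest singular values
# and the corresponding columns of U and rows of Vt.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Documents are then compared in the k-dimensional latent space rather than in the original sparse term space.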