6.4.4 Traditional Latent-Semantic Analysis
The third approach was a traditional Latent-Semantic Analysis, as described in this
chapter. It involved two pre-processing steps: (a) filtering out a list of stop-words,
and (b) stemming terms to their root form. This resulted in a total of 1873 unique
terms that were used to characterize the 329 narratives. The resulting 1873x329
term-by-narrative matrix was submitted to a Singular-Value Decomposition, and the
26 dominant latent dimensions were extracted.
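The decomposition step can be illustrated with a minimal sketch. It assumes NumPy, a toy 6x4 term-by-document matrix in place of the 1873x329 matrix, and cosine dissimilarity between the reduced document vectors. The text does not state which dissimilarity measure was used, so that choice and the function names are illustrative assumptions.

```python
import numpy as np

def lsa_document_vectors(term_doc, k):
    """Project documents onto the k dominant latent dimensions via SVD."""
    # term_doc has one row per term and one column per document.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each document becomes a row of V_k scaled by the singular values.
    return Vt[:k].T * s[:k]

def cosine_dissimilarity(doc_vecs):
    """Pairwise 1 - cosine similarity (assumed measure, not from the text)."""
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero document vectors
    unit = doc_vecs / norms
    return 1.0 - unit @ unit.T

# Toy data: documents 0-1 share one vocabulary, documents 2-3 another.
toy = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

doc_vecs = lsa_document_vectors(toy, k=2)   # the study used k = 26
dissim = cosine_dissimilarity(doc_vecs)     # 4x4 here, 329x329 in the study
```

In this toy case, documents that share vocabulary end up close together in the reduced space, while documents with disjoint vocabulary end up far apart.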
6.4.5 Cluster Analysis on Dissimilarity Matrices
All three procedures resulted in a 329x329 matrix depicting the dissimilarity
between the narratives. The three dissimilarity matrices were then submitted to
hierarchical cluster analysis using a minimum variance criterion, and the first nine
clusters were extracted.
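As an illustration of the clustering step, the following sketch implements agglomerative clustering with Ward's minimum variance criterion directly on a dissimilarity matrix, using the Lance-Williams update on squared dissimilarities. It assumes the dissimilarities can be treated like Euclidean distances. The function name `ward_clusters` and the toy data are hypothetical, and the software actually used in the study is not stated in the text.

```python
def ward_clusters(dissim, n_clusters):
    """Ward's minimum variance clustering on a symmetric dissimilarity matrix.

    dissim: n x n list of lists. Returns one cluster label per item.
    """
    n = len(dissim)
    # Ward's Lance-Williams update operates on squared dissimilarities.
    d2 = {(i, j): dissim[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}
    next_id = n
    while len(members) > n_clusters:
        # The closest pair gives the smallest increase in within-cluster variance.
        a, b = min(d2, key=d2.get)
        na, nb = len(members[a]), len(members[b])
        d_ab = d2.pop((a, b))
        merged = members.pop(a) + members.pop(b)
        for k, items in members.items():
            nk = len(items)
            d_ak = d2.pop((min(a, k), max(a, k)))
            d_bk = d2.pop((min(b, k), max(b, k)))
            # Lance-Williams recurrence for Ward's method.
            d2[(k, next_id)] = (
                (na + nk) * d_ak + (nb + nk) * d_bk - nk * d_ab
            ) / (na + nb + nk)
        members[next_id] = merged
        next_id += 1
    labels = [0] * n
    for label, items in enumerate(members.values()):
        for i in items:
            labels[i] = label
    return labels

# Toy data: six points on a line forming three obvious pairs.
points = [0, 1, 10, 11, 20, 21]
toy_dissim = [[abs(p - q) for q in points] for p in points]
toy_labels = ward_clusters(toy_dissim, n_clusters=3)  # the study used 9 clusters
```

The pure-Python loop is cubic in the number of items. That is acceptable for a few hundred narratives, but a library routine would be preferable for larger collections.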
The performance of the three approaches is compared by contrasting the output
of each method with the output of the hand-coded classification in the original study
(chapter 4). The original hand-coded classification resulted in the identification of
five overall categories: stimulation, learnability, long-term usability, usefulness and
social experiences. Traditional content analysis, as applied in the original study, is
treated as the optimal classification and used as the reference for the three automated
procedures.
To compare the output of the three approaches with the output of the content
analysis of the initial study, a mapping needs to be created between the 9 clusters
generated by each of the three approaches and the five categories of the traditional
content analysis. Once all 9 clusters are assigned to one of the five overall
categories, interrater agreement indices such as the Kappa statistic
(Fleiss et al., 2003), or the overall percentage of correctly classified narratives, may
be computed to assess the agreement between the three automated approaches and
the traditional content analysis.
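The two agreement indices can be sketched as follows. The sketch assumes the two-rater (Cohen) form of the Kappa statistic, with the automated method and the manual coding as the two raters. The text does not specify the exact variant, and the labels below are invented toy data, not results from the study.

```python
from collections import Counter

def percent_agreement(auto, manual):
    """Share of narratives assigned to the same category by both codings."""
    return sum(a == m for a, m in zip(auto, manual)) / len(manual)

def cohens_kappa(auto, manual):
    """Chance-corrected agreement between two codings (Cohen's kappa)."""
    n = len(manual)
    p_observed = percent_agreement(auto, manual)
    auto_counts, manual_counts = Counter(auto), Counter(manual)
    # Agreement expected by chance, from the marginal category frequencies.
    p_expected = sum(
        auto_counts[c] * manual_counts[c]
        for c in set(auto_counts) | set(manual_counts)
    ) / (n * n)
    if p_expected == 1:
        return 1.0  # both codings use a single identical category
    return (p_observed - p_expected) / (1 - p_expected)

# Toy data: six narratives, one disagreement.
manual = ["stimulation", "stimulation", "usefulness",
          "usefulness", "learnability", "learnability"]
auto = ["stimulation", "stimulation", "usefulness",
        "learnability", "learnability", "learnability"]

agreement = percent_agreement(auto, manual)
kappa = cohens_kappa(auto, manual)
```

Kappa is lower than raw percent agreement whenever some agreement would be expected by chance alone, which is why both indices are worth reporting.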
We employ two approaches for assigning each of the nine clusters to one of the
five identified categories. First, the assignment may be based on the distribution of
the narratives within a cluster over the five categories. The distribution for all nine
clusters may be visualized in a 9x5 matrix where each cell $m_{i,j}$ depicts the
number of narratives that are classified to cluster i (out of the 9 overall clusters
that resulted from the automated analysis procedure) and to category j (out of
the 5 categories that resulted from the manual coding procedure in the initial study).
According to this criterion, each cluster is assigned to the category that contains
the highest number of that cluster's narratives. This approach minimizes the error
induced by the mapping process, and results in the best possible value for the
agreement between the automated methods and traditional content analysis.
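The first mapping approach can be sketched in a few lines. The function names, the toy labels and the tie-breaking rule (the first category in the list wins) are assumptions made for illustration. The text does not say how ties were resolved.

```python
def contingency_matrix(cluster_labels, category_labels, n_clusters, categories):
    """Build m[i][j]: narratives in cluster i that were hand-coded as category j."""
    column = {c: j for j, c in enumerate(categories)}
    m = [[0] * len(categories) for _ in range(n_clusters)]
    for cluster, category in zip(cluster_labels, category_labels):
        m[cluster][column[category]] += 1
    return m

def majority_mapping(m, categories):
    """Assign each cluster to the category holding most of its narratives."""
    # max() returns the first maximal index, so ties go to the earlier category.
    return [categories[max(range(len(row)), key=row.__getitem__)] for row in m]

# Toy data: 8 narratives, 3 clusters, 3 of the five categories.
categories = ["stimulation", "learnability", "usefulness"]
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
hand_coded = ["stimulation", "stimulation", "learnability",
              "learnability", "learnability",
              "usefulness", "usefulness", "stimulation"]

m = contingency_matrix(clusters, hand_coded, n_clusters=3, categories=categories)
mapping = majority_mapping(m, categories)
# Category that each narrative receives from its cluster's assignment.
predicted = [mapping[c] for c in clusters]
```

Because each cluster is mapped to its own majority category, no other cluster-to-category assignment can yield a higher share of correctly classified narratives for a given clustering.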
However, this best possible value may not be obtained in real settings where
human interpretation is required to further classify the narratives. Thus, a second
approach involves human raters. Each cluster, as proposed earlier in this chapter,
can be characterized by the three most dominant terms in the experience narratives