Fig. 2 (Top) Terms in student responses about how they use maps that are linked by the word "and." (Bottom) Terms in those same responses that are linked by "the"
Topic Modeling
Topic modeling is a popular computational approach for analyzing text data.
Specific techniques include basic probabilistic methods that predict the
likelihood that one word follows another (Wallach 2006), and somewhat more
sophisticated methods such as latent Dirichlet allocation (LDA), which models
topics independently of word order (Blei et al. 2003). Many alternative
approaches now build on these basic examples, and new combinations and
modifications appear regularly. LDA, however, remains a popular method for
topic modeling, and many tools that implement it are available to researchers.
The MAchine Learning for LanguagE Toolkit (MALLET) is one such tool; it uses
LDA to mine topics from text (McCallum 2002). MALLET is written in the Java
programming language and provides command-line controls for processing large
text collections to extract key topics. Since its first release in 2002,
MALLET has been improved in several stages, and it now also includes methods
for tagging sequences and classifying documents, among others.
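To make the idea of modeling topics independently of word order more concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. It is an illustration only, not MALLET's implementation and not the workflow used in this study. The function names (lda_gibbs, top_words), the hyperparameter values (alpha, beta), and the tiny example documents are all assumptions introduced here. Each document is treated as a bag of words: only the counts of words per document and per topic enter the sampler, so the position of a word within its document never affects the update.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents (lists of words) by collapsed Gibbs sampling.

    alpha and beta are symmetric Dirichlet hyperparameters; the defaults are
    illustrative assumptions, not recommended settings.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]          # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                        # tokens assigned to topic k overall
    assignments = []                                      # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [[word for word, _ in counts.most_common(n)] for counts in topic_word]


# Hypothetical toy documents, invented for illustration only.
example_docs = [
    "map route directions route travel".split(),
    "map city travel directions".split(),
    "data analysis map data visualization".split(),
    "analysis data visualization research".split(),
]

example_doc_topic, example_topic_word = lda_gibbs(example_docs, num_topics=2)
for topic_id, words in enumerate(top_words(example_topic_word)):
    print(topic_id, words)
```

The sampler is stochastic and the toy corpus is very small, so the words grouped under each topic depend on the seed and should not be read as meaningful results. Tools such as MALLET wrap this kind of inference in a far more efficient implementation suited to large text collections.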
To make MALLET easily usable by non-experts, David Newman at the University
of California-Irvine created the Topic Modeling Tool (TMT) to provide a
graphical user interface to MALLET
(http://code.google.com/p/topic-modeling-tool/). We used the TMT in our work
to reveal key topics found in discussions in