Figure 3.6 shows the graphical model for Labeled LDA, where represents the set of all
possible topics. Since we assume the existence of labels for each document, is observed for each
document (shaded gray in the figure).
Figure 3.6: Graphical model for Labeled LDA. Additions to the standard LDA model are highlighted
in black.
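The role of the observed labels can be illustrated with a minimal sketch of a Labeled-LDA-style generative story. It assumes the usual formulation, in which a document's topic mixture is restricted to the labels observed for that document. The function name `sample_labeled_document`, the symmetric Dirichlet parameter `alpha`, and the toy vocabulary are hypothetical choices made for illustration and do not come from the text above.

```python
import random


def sample_labeled_document(labels, topic_word, n_words, alpha=1.0, rng=None):
    """Sample one document whose topics are restricted to its observed labels.

    labels:     iterable of topic ids observed for this document
    topic_word: dict mapping topic id -> dict mapping word -> probability
    n_words:    number of word tokens to generate
    alpha:      symmetric Dirichlet parameter (an assumed hyperparameter)
    """
    rng = rng or random.Random()
    labels = sorted(labels)

    # Draw the topic mixture from a Dirichlet over the observed labels only,
    # using normalized Gamma draws. Topics outside the label set get no mass.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in labels]
    total = sum(gammas)
    theta = {topic: g / total for topic, g in zip(labels, gammas)}

    document = []
    for _ in range(n_words):
        # Choose a topic among the document's labels, then a word from that topic.
        z = rng.choices(labels, weights=[theta[t] for t in labels])[0]
        words = list(topic_word[z])
        w = rng.choices(words, weights=[topic_word[z][x] for x in words])[0]
        document.append((z, w))
    return theta, document


# Toy topic-word distributions (made-up numbers, for illustration only).
toy_topic_word = {
    "substance": {"election": 0.5, "policy": 0.5},
    "status": {"tired": 0.6, "commuting": 0.4},
    "style": {"lol": 1.0},
}

theta, document = sample_labeled_document(
    {"substance", "status"}, toy_topic_word, n_words=10, rng=random.Random(0)
)
print(theta)
print(document)
```

Because the document is labeled only with "substance" and "status", no token can be assigned to "style", which is the sense in which the labels constrain the model.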
To apply Labeled LDA to Twitter, Ramage et al. first conducted a set of structured interviews
to identify the basic dimensions people consider when deciding which posts to read
or which users to follow on Twitter. They found four such dimensions (called 4S): substance topics
(about an entity or idea), social topics, where language is used towards a social end (e.g., making plans
with friends), status topics, which convey personal updates, and style topics (e.g., humor or wit). Then,
through a rather complex semi-automated process, they labeled a large Twitter dataset with those four
labels and ran Labeled LDA on it.
The output of this process is a topic model for Twitter conversations that is based on only four
topics, namely the 4S. This model can be applied to any set of tweets. Figure 3.7, from Ramage et al.
[2010], shows how the tweets of two sample users can be visualized in the context of a 4S topic
model (see footnote 4).
Ramage et al. also ran a user study, which indicates that the learned topic models would be
effective in helping Twitter users identify the most valuable posts in their current feed, as well as
which new users to follow.
Recently, there has also been work on applying topic modeling techniques to email
conversations, with the limited goal of generating summary keywords for each email message. In an
empirical comparison of different unsupervised approaches, LDA has been shown to be the best
performer [Dredze et al., 2008]. In essence, once a set of email messages has been modeled with
LDA, the best keywords to describe an email are the ones with the highest probability given that
email. The probability of each candidate keyword c, given an email e, can be formally computed as:
$$P(c \mid e) = \sum_{i} P(c \mid z_i)\, P(z_i \mid e)$$
where the sum ranges over the topics z_i of the LDA model.
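As a rough illustration of the formula above, the following sketch scores candidate keywords by summing, over topics, the probability of the keyword under a topic times the probability of that topic given the email. The function names `keyword_scores` and `top_keywords`, the dictionary-based inputs, and all the numbers are assumptions made for this example; in practice the two distributions would come from a fitted LDA model.

```python
def keyword_scores(p_word_given_topic, p_topic_given_email):
    """Return P(c | e) for every candidate keyword c.

    p_word_given_topic:  dict topic -> dict word -> P(word | topic)
    p_topic_given_email: dict topic -> P(topic | email)
    """
    scores = {}
    for topic, p_topic in p_topic_given_email.items():
        for word, p_word in p_word_given_topic[topic].items():
            # Accumulate P(c | z_i) * P(z_i | e) over all topics z_i.
            scores[word] = scores.get(word, 0.0) + p_word * p_topic
    return scores


def top_keywords(scores, k):
    """Return the k highest-scoring keywords, breaking ties alphabetically."""
    return sorted(scores, key=lambda word: (-scores[word], word))[:k]


# Toy distributions for a two-topic model (made-up numbers).
toy_p_word_given_topic = {
    0: {"meeting": 0.6, "budget": 0.3, "lunch": 0.1},
    1: {"meeting": 0.1, "budget": 0.1, "lunch": 0.8},
}
toy_p_topic_given_email = {0: 0.75, 1: 0.25}

toy_scores = keyword_scores(toy_p_word_given_topic, toy_p_topic_given_email)
print(top_keywords(toy_scores, 2))
```

In this toy email, topic 0 dominates, so words that are probable under topic 0 receive most of the weight, while a word that is very probable under the minor topic can still rank above a word that is only moderately probable under the major one.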
4 Additional interactive examples can be explored at http://twahpic.cloudapp.net/