Figure 3.7: Distributions of 4S topics for two users, when compared with the overall average distributions
in Twitter. The overall distribution is shown in the middle bar, with Status less frequent than the other
three topics, which are similarly distributed. The right bar shows the topic distribution for user @oprah,
which is similar to the overall distribution. The bar on the left shows the distribution for user @w3c, which
is instead substantially different from the other two, as the topic Substances dominates the distribution.
Besides the users' topic distributions, common words for each topic are shown as word clouds. Size and
shade of the words convey frequency and recency of usage, respectively.
An intuitive way to interpret this formula is that the best keywords for an email e (i.e., the
ones with the highest P(c | e)) are the ones that are highly probable (high P(c | z_i)) in the most likely
topics for that email message (high P(z_i | e)).
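
As a minimal sketch of this interpretation, the snippet below scores candidate keywords for one email. It assumes the formula under discussion has the usual mixture form P(c | e) = sum over i of P(c | z_i) * P(z_i | e); that form is inferred from the interpretation above, not quoted from the text. The function names, topic labels, and probability values are hypothetical and chosen only for illustration.

```python
def keyword_scores(topic_word, email_topic):
    """Return P(c | e) for every word c.

    Assumes P(c | e) = sum_i P(c | z_i) * P(z_i | e), where
    topic_word[z][c] = P(c | z) and email_topic[z] = P(z | e).
    """
    scores = {}
    for topic, p_topic in email_topic.items():
        for word, p_word in topic_word[topic].items():
            scores[word] = scores.get(word, 0.0) + p_word * p_topic
    return scores


def top_keywords(topic_word, email_topic, k=3):
    """Return the k words with the highest P(c | e), ties broken alphabetically."""
    scores = keyword_scores(topic_word, email_topic)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy, made-up distributions (not taken from any real corpus).
topic_word = {
    "meetings": {"schedule": 0.5, "agenda": 0.3, "contract": 0.2},
    "legal": {"contract": 0.6, "schedule": 0.1, "lawsuit": 0.3},
}
email_topic = {"meetings": 0.8, "legal": 0.2}

summary = top_keywords(topic_word, email_topic, k=2)
print(summary)
```

In this toy example, "contract" outranks "agenda" even though "agenda" is more probable in the dominant topic, because "contract" also receives mass from the second topic. In practice the two distributions would come from a fitted LDA model rather than hand-written dictionaries.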
Notice that this is a straightforward application of LDA. However, it provides an interesting
example of how topic modeling can be used to perform a very basic form of summarization, as for
each email, we can compute the summary keywords that best describe the email in the context of
a topic model. These short summary descriptions can be used as substitutes for the original emails,
either to facilitate the user interaction with an email repository, or to improve other email processing
tasks. Experiments on the Enron corpus (see Chapter 2 ) show that these keyword summaries can
support more effective automatic email foldering as well as the prediction of an email's intended
recipients.
Current and Future Trends in Topic Modeling for Text Conversations
What are the current open issues in topic modeling for conversations? We expect much more work on
asynchronous conversations, including emails and blogs, with particular interest in two questions.