Mining Text Conversations - Methods for Mining and Summarizing Text Conversations

Figure 3.7: Distributions of 4S topics for two users, compared with the overall average distribution
in Twitter. The overall distribution is shown in the middle bar, with Status less frequent than the other
three topics, which are similarly distributed. The right bar shows the topic distribution for user @oprah,
which is similar to the overall distribution. The bar on the left shows the distribution for user @w3c, which
is instead substantially different from the other two, as the topic Substances dominates the distribution.
Beside the users' topic distributions, common words for each topic are shown as word clouds. Size and
shade of the words convey frequency and recency of usage, respectively.

An intuitive way to interpret this formula is that the best keywords for an email e (i.e., the
ones with higher P(c | e)) are the ones that are highly probable (high P(c | z_i)) in the most likely
topics for that email message (high P(z_i | e)).
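
The formula itself is not reproduced in this excerpt. The sketch below assumes it is the usual topic-mixture form, P(c | e) = sum over i of P(c | z_i) * P(z_i | e), which is consistent with the interpretation above. The function names, the topic labels, and the toy probabilities are all hypothetical and chosen only for illustration. In practice both distributions would come from a fitted LDA model.

```python
def keyword_scores(word_given_topic, topic_given_email):
    """Score each candidate word c for one email e.

    Assumed form: P(c | e) = sum_i P(c | z_i) * P(z_i | e).
    word_given_topic: {topic: {word: P(word | topic)}}
    topic_given_email: {topic: P(topic | email)}
    """
    scores = {}
    for topic, p_topic in topic_given_email.items():
        for word, p_word in word_given_topic[topic].items():
            # Accumulate this topic's contribution to the word's score.
            scores[word] = scores.get(word, 0.0) + p_word * p_topic
    return scores


def top_keywords(word_given_topic, topic_given_email, k=3):
    """Return the k words with the highest assumed P(c | e)."""
    scores = keyword_scores(word_given_topic, topic_given_email)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy, made-up distributions (not taken from any real corpus).
word_given_topic = {
    "meetings": {"schedule": 0.5, "agenda": 0.3, "contract": 0.2},
    "legal": {"contract": 0.6, "liability": 0.3, "schedule": 0.1},
}
topic_given_email = {"meetings": 0.8, "legal": 0.2}

print(top_keywords(word_given_topic, topic_given_email, k=2))
# prints ['schedule', 'contract']
```

In this toy case "contract" outranks "agenda" even though "agenda" is more probable in the dominant topic, because "contract" also receives weight from the second topic. This is the behavior the interpretation describes: a keyword scores well when it is probable in the topics that are likely for the email.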

Notice that this is a straightforward application of LDA. However, it provides an interesting
example of how topic modeling can be used to perform a very basic form of summarization: for
each email, we can compute the summary keywords that best describe the email in the context of
a topic model. These short summary descriptions can be used as a substitute for the original emails,
either to facilitate user interaction with an email repository or to improve other email processing
tasks. Experiments on the Enron corpus (see Chapter 2) show that these keyword summaries can
support more effective automatic email foldering as well as the prediction of an email's intended
recipients.

Current and Future Trends in Topic Modeling for Text Conversations

What are the current open issues in topic modeling for conversations? We expect much more work on
asynchronous conversations, including emails and blogs, with particular interest in two questions.
