• Searching for particular conversations and/or particular messages: With the phenomenal
growth in the amount of text conversations stored in computer memory comes the need for
supporting effective search. Generally speaking, the ability to mine and summarize text can
benefit any Web search. The more information that can be extracted from text, the more search
can be based on the extracted information, rather than on simple matching with the words in
the query. For instance, the reader may be very interested in browsing through all the sentences
expressing negative opinions, sentences that represent action items, or sentences that describe
decisions made. While the underlying conversational data are unstructured, the summary
sentences and their links to the original sentences essentially provide structured metadata for
accessing the underlying data. Moreover, if any document can be effectively summarized, the quality of
the presentation of search query results can be improved by presenting a summary as the snippet
for each returned document. Arguably, these advantages would also apply to a search engine
for text conversations that relies on the techniques presented in this topic. For instance, if it
were possible to extract topics and opinions from conversations, a conversational search engine
could support queries like: “what messages in the company blogs express opinions on the new
budget?” The output of such a search query could be a list of relevant messages summarized
in the context of both the query and the conversation (see query-focused summarization,
Section 1.4).
• Forensic/investigation: Given the permanent nature of Web-based text conversations, it is
not surprising that they have caught the attention of law enforcement organizations as sources
of evidence in their investigations. Most countries already accept emails as evidence that can be
used in court [Gupta et al., 2004]. For instance, in both the high-profile antitrust trial against
Microsoft and the famous Enron scandal investigation, emails were used as evidence in court.
In these and similar cases, the amount of data that need to be analyzed is often huge; the
Enron email dataset contains about half a million messages belonging to 150 users and stored
in 3500 folders. So the ability to mine and summarize the relevant conversations can be highly
beneficial.
• Analyzing large-scale trends: While many conversations are confined to a small group of
friends or colleagues, others are so large and broad that they effectively feature hundreds
of participants making potentially thousands of comments. The growing popularity of Twitter,
in particular, has fed this tendency towards large-scale conversation. During a major event such
as the Super Bowl or a political uprising in Egypt, relevant Twitter messages (or tweets) are sent
by the thousands or millions. Some tweets may respond directly to others, while in general the
conversation remains vast and amorphous. It is simply not feasible to read all tweets relevant to
such a topic, and so mining and summarization technologies can help provide an overview of
what people are saying and what positive or negative opinions are being expressed. Sharifi et al.
[2010] demonstrate one method of summarizing such large conversations.
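
As a rough illustration of the search scenario in the first item above, the sketch below filters messages by topic and opinion labels rather than by matching query words. It assumes the labels were already produced by some earlier mining step. The `Message` class, its field names, the `search` function, and the sample data are all hypothetical and are not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Message:
    """One message plus metadata assumed to come from an earlier mining step."""
    source: str                                      # e.g. "blog" or "email" (hypothetical labels)
    text: str
    topics: Set[str] = field(default_factory=set)    # extracted topic labels
    opinion: Optional[str] = None                    # "positive", "negative", or None


def search(messages, source, topic, require_opinion=True):
    """Return messages from `source` about `topic`.

    When `require_opinion` is True, messages with no extracted opinion are skipped.
    """
    hits = []
    for m in messages:
        if m.source != source or topic not in m.topics:
            continue
        if require_opinion and m.opinion is None:
            continue
        hits.append(m)
    return hits


# Hypothetical, hand-labeled sample data.
messages = [
    Message("blog", "The new budget cuts too deeply into research.", {"budget"}, "negative"),
    Message("blog", "The budget meeting is on Friday.", {"budget"}, None),
    Message("email", "I like the new budget.", {"budget"}, "positive"),
    Message("blog", "Great offsite last week!", {"offsite"}, "positive"),
]

# Roughly: "what messages in the company blogs express opinions on the new budget?"
results = search(messages, source="blog", topic="budget")
for m in results:
    print(f"[{m.opinion}] {m.text}")
```

The filter operates only on the extracted metadata, which is the point made above: once topics and opinions are available, a query can target them directly. A fuller system would also summarize each hit in the context of the query, which this sketch does not attempt.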