CHAPTER 1
Introduction
Before the invention of the Internet and the creation of the Web, the vast majority of human
conversations were in spoken form, with the only notable, but extremely limited, exception being
epistolary exchanges. Some important spoken conversations, such as criminal trials and political
debates (e.g., Hansard, the transcripts of parliamentary debates), have been transcribed for centuries,
but the rest of what humans have said to each other throughout history to solve problems, make
decisions and, more generally, interact socially has been lost.
This situation has dramatically changed in the last two decades. At an accelerating pace,
people are holding conversations in writing through a growing number of media, including email,
blogs, chats and text messaging on mobile phones. At the same time, recent rapid progress in speech
recognition technology is enabling the development of computer systems that can automatically
transcribe spoken conversations.
The net result of this ongoing revolution is that an ever-increasing portion of human conversations
can be stored as text in computer memory and processed by applying Natural Language Processing
(NLP) techniques, which were originally developed for written monologues such as newspaper articles and books.
This ability opens up a large space of extremely useful applications, in which critical information can
be mined from conversations, and summaries of those conversations can be effectively generated.
This is true for both organizations and individuals. For instance, managers can find the information
exchanged in conversations within a company to be extremely valuable for decision auditing. If a
decision turns out to be ill-advised, mining and summarizing the relevant conversations may help in
determining responsibility and accountability. Similarly, conversations that led to favorable decisions
could be mined and summarized to identify effective communication patterns and sources within
the company. On a more personal level, an informative summary of a conversation could play at
least two critical roles. On the one hand, the summary could greatly help a new participant get up
to speed and join an existing, possibly long, conversation (e.g., blog comments). On the other hand,
a summary could help someone quickly prepare for a follow-up discussion of a conversation she was
already part of, but which occurred too long ago for her to remember the
details. Furthermore, the ability to summarize conversations will also be crucial in our increasingly
mobile world, as a long incoming message or an extensive ongoing conversation could be much
more easily inspected on a small screen in a concise, summarized form.
This book presents a set of powerful computational methods to mine and summarize text
conversations, where a text conversation is either one that was generated in writing, or one that was
originally spoken and then automatically transcribed. Different kinds of useful information can be
mined. We will describe how to detect what topics are covered in a given text conversation, along with