knowledge about the domain and which types of events are often significant in that domain. For
example, the summarizer could detect that there are many sentences in a meeting or email thread
that concern a particular action item, and the awareness that action items often form a critical part
of a summary might be part of the knowledge base available to the summarizer.
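The following Python sketch makes the action-item example concrete. It combines a count of sentences per (event type, topic) with a small knowledge base of prior importance weights. It assumes an upstream classifier has already labeled each sentence with an event type and with the item it concerns. The labels, weights, and threshold are hypothetical values chosen for illustration, not taken from any particular summarizer.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical knowledge base: prior importance of each event type for a
# summary. The types and weights are illustrative assumptions.
KNOWLEDGE_BASE = {"action_item": 1.0, "decision": 0.9, "small_talk": 0.1}


@dataclass
class Sentence:
    text: str
    event_type: str  # assumed output of an upstream event classifier
    topic: str       # assumed label for the item the sentence concerns


def salient_events(sentences, threshold=2.0):
    """Return (event_type, topic, score) triples worth summarizing.

    score = (number of sentences about the item) * (knowledge-base weight),
    so many sentences about a low-value event type still score low.
    """
    counts = Counter((s.event_type, s.topic) for s in sentences)
    scored = [
        (etype, topic, n * KNOWLEDGE_BASE.get(etype, 0.0))
        for (etype, topic), n in counts.items()
    ]
    kept = [item for item in scored if item[2] >= threshold]
    return sorted(kept, key=lambda item: item[2], reverse=True)


thread = [
    Sentence("Can you send the budget by Friday?", "action_item", "budget"),
    Sentence("Sure, I'll send it tomorrow.", "action_item", "budget"),
    Sentence("Remember it needs sign-off first.", "action_item", "budget"),
    Sentence("Nice weather today.", "small_talk", "weather"),
    Sentence("Yes, very sunny.", "small_talk", "weather"),
    Sentence("Great weekend ahead.", "small_talk", "weather"),
]
print(salient_events(thread))  # [('action_item', 'budget', 3.0)]
```

Both items are discussed in three sentences, but only the action item clears the threshold. The knowledge base rates action items highly and small talk low.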
More specifically, domain knowledge can be represented in an ontology. One popular ontology
language is OWL/RDF, widely used in semantic web contexts and based on description logics, a
family of decidable fragments of first-order logic. An ontology typically contains a class-subclass
hierarchy, properties or relations, and instance data. For example, we may have a class Person and a
subclass Manager, and a particular instance of a manager, Heather. We may have a property or
relation worksWith that connects instances of Person. Adding instance data to the ontology is called
populating the ontology.
To use the language of ontological engineering, our classes and properties are defined in the T-Box
of the ontology, while the A-Box contains our instance data. We do not go into any more detail on
ontologies here, but any primer on the semantic web should suffice to give a general overview (e.g.,
[Allemang and Hendler, 2008], [Segaran et al., 2009]).
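The T-Box/A-Box distinction can be illustrated without any semantic-web tooling. The Python sketch below stores the running example (Person, Manager, Heather, worksWith) as subject-predicate-object triples, loosely mirroring RDF. The predicate names `subClassOf`, `type`, `domain`, and `range` only imitate rdfs:subClassOf, rdf:type, rdfs:domain, and rdfs:range. The second instance, `Sam`, is hypothetical. A real system would use an OWL/RDF toolkit and a reasoner instead.

```python
# Triples are (subject, predicate, object), loosely modelled on RDF.
# Predicate names imitate rdfs:subClassOf, rdf:type, rdfs:domain, rdfs:range.

# T-Box: classes and properties.
TBOX = {
    ("Manager", "subClassOf", "Person"),
    ("worksWith", "domain", "Person"),
    ("worksWith", "range", "Person"),
}

# A-Box: instance data, filled in by populating the ontology.
ABOX = set()


def populate(subject, predicate, obj):
    """Add one fact about an instance to the A-Box."""
    ABOX.add((subject, predicate, obj))


def superclasses(cls):
    """cls plus every class it is a (transitive) subclass of."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in TBOX:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(instance):
    """Asserted types from the A-Box, expanded through the T-Box hierarchy."""
    result = set()
    for s, p, o in ABOX:
        if s == instance and p == "type":
            result |= superclasses(o)
    return result


def domain_range_violations(prop):
    """A-Box uses of prop whose subject/object lack the declared classes."""
    domain = {o for s, p, o in TBOX if s == prop and p == "domain"}
    rng = {o for s, p, o in TBOX if s == prop and p == "range"}
    return [
        (s, p, o) for s, p, o in ABOX
        if p == prop and not (domain <= types_of(s) and rng <= types_of(o))
    ]


populate("Heather", "type", "Manager")
populate("Sam", "type", "Person")  # hypothetical second instance
populate("Heather", "worksWith", "Sam")

print(sorted(types_of("Heather")))          # ['Manager', 'Person']
print(domain_range_violations("worksWith"))  # []
```

Heather is typed as a Person only through inference from the T-Box. Treating domain and range as a validity check is a simplification. Under RDFS semantics they license new type inferences rather than reject data.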
An abstractive system also requires natural language generation (NLG) to create the summary
output. An NLG system typically consists of a planner to create the document structure, a
microplanner to refine the document plan by performing aggregation and choosing referring
expressions for coreferent entities, among other tasks, and a realizer to generate the actual surface
text. Reiter and Dale [2000] provide the classic text on NLG systems and components.
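The three stages can be illustrated with a deliberately tiny Python pipeline. This is a toy sketch, not the architecture described by Reiter and Dale or used by any particular system. The input messages and their importance scores are assumed to come from earlier analysis. Each stage is kept minimal:

- The planner picks the most important messages and orders them chronologically.
- The microplanner aggregates adjacent messages with the same agent and uses a longer description on an entity's first mention.
- The realizer produces the surface sentences.

The description "Heather, a manager," and the participant Sam are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    agent: str
    predicate: str     # past-tense verb phrase, e.g. "proposed a new budget"
    importance: float  # assumed to come from an earlier content-selection step
    time: int          # position in the conversation


def plan(messages, k=4):
    """Planner: keep the k most important messages, ordered chronologically."""
    chosen = sorted(messages, key=lambda m: m.importance, reverse=True)[:k]
    return sorted(chosen, key=lambda m: m.time)


def microplan(doc_plan, descriptions):
    """Microplanner: aggregate adjacent messages sharing an agent, then pick
    referring expressions (a longer description on first mention only)."""
    specs = []
    for m in doc_plan:
        if specs and specs[-1]["agent"] == m.agent:
            specs[-1]["predicates"].append(m.predicate)  # aggregation
        else:
            specs.append({"agent": m.agent, "predicates": [m.predicate]})
    mentioned = set()
    for spec in specs:
        agent = spec["agent"]
        spec["subject"] = agent if agent in mentioned else descriptions.get(agent, agent)
        mentioned.add(agent)
    return specs


def realize(specs):
    """Realizer: join each sentence spec into surface text."""
    sentences = []
    for spec in specs:
        preds = spec["predicates"]
        if len(preds) == 1:
            vp = preds[0]
        elif len(preds) == 2:
            vp = f"{preds[0]} and {preds[1]}"
        else:
            vp = ", ".join(preds[:-1]) + f", and {preds[-1]}"
        sentences.append(f"{spec['subject']} {vp}.")
    return " ".join(sentences)


messages = [
    Message("Heather", "proposed a new budget", 0.9, 1),
    Message("Heather", "scheduled a review meeting", 0.8, 2),
    Message("Sam", "agreed to the budget", 0.7, 3),
    Message("Sam", "mentioned the weather", 0.1, 4),
    Message("Heather", "asked Sam to circulate the notes", 0.6, 5),
]
summary = realize(microplan(plan(messages), {"Heather": "Heather, a manager,"}))
print(summary)
```

Run as is, it prints "Heather, a manager, proposed a new budget and scheduled a review meeting. Sam agreed to the budget. Heather asked Sam to circulate the notes." The planner drops the low-importance remark about the weather. The microplanner merges Heather's first two actions into one sentence.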
4.2.3 OUTPUTS AND INTERFACES
Although research on automatic summarization usually concerns the generation of textual summaries,
summaries need not be text-based. A meeting summary could consist of concatenated audio clips
from the discussion, while any conversation could be summarized with a graphics-based visualization
highlighting information such as participant activity and topic dispersion.
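The Python sketch below is a rough illustration of the participant-activity part of such a visualization. It computes each speaker's share of the words in a transcript and draws those shares as text bars. A real interface would pass these numbers to a plotting library instead. The (speaker, text) transcript format is an assumption, and topic dispersion is not covered.

```python
from collections import Counter


def participant_activity(turns):
    """Fraction of all words contributed by each speaker.

    turns: list of (speaker, text) pairs (an assumed transcript format).
    """
    words = Counter()
    for speaker, text in turns:
        words[speaker] += len(text.split())
    total = sum(words.values())
    return {speaker: n / total for speaker, n in words.items()}


def text_bars(shares, width=20):
    """Render shares as a crude horizontal bar chart, largest first."""
    rows = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(
        f"{speaker:<8}{'#' * round(share * width)} {share:.0%}"
        for speaker, share in rows
    )


meeting = [
    ("Heather", "Can we finalize the budget today"),
    ("Sam", "I think so"),
    ("Heather", "Great, please send it by Friday"),
]
print(text_bars(participant_activity(meeting)))
```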
Even with textual summaries, many types of output are possible. One could generate well-formed
paragraphs of coherent text describing the conversation at a high level: a so-called abstractive
summary necessitating a text generation component. A simpler approach, but one that leads to less
coherent summaries, is the extractive approach of simply classifying some sentences as important
and pasting them together. One could also generate a word cloud or a list of dates, named entities, or
keywords. A word cloud might not seem like a summary in the traditional sense, but it does fit the
definition of condensing a document to a simple representation of its most important components.
For example, Figure 4.1 shows a word cloud representing the email conversation shown earlier.
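The Python sketch below contrasts the extractive and word-cloud outputs by deriving both from the same few sentences. It does not use a trained classifier. Instead, it scores each sentence by the average frequency of its non-stopword terms across the conversation, which is only a simple heuristic stand-in. The stopword list and the example thread are assumptions. The word-cloud function returns the term frequencies that a rendering tool would use to size each word.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "be", "by", "can", "for", "i", "in", "is",
             "it", "of", "on", "or", "so", "that", "the", "this", "to",
             "we", "you"}


def terms(sentence):
    """Lower-cased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def word_cloud_weights(sentences, n=5):
    """Most frequent terms with their counts, i.e. word-cloud font sizes."""
    return Counter(w for s in sentences for w in terms(s)).most_common(n)


def extractive_summary(sentences, k=2):
    """Keep the k highest-scoring sentences, pasted together in original order.

    A sentence's score is the mean frequency of its terms across the whole
    conversation, a heuristic stand-in for an importance classifier.
    """
    freq = Counter(w for s in sentences for w in terms(s))

    def score(i):
        toks = terms(sentences[i])
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    best = sorted(range(len(sentences)), key=score, reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(best))


thread = [
    "Can we finalize the budget today?",
    "The budget draft is attached.",
    "Nice weather this weekend.",
    "Please review the budget draft by Friday.",
]
print(extractive_summary(thread))
# Can we finalize the budget today? The budget draft is attached.
print(word_cloud_weights(thread, n=2))  # [('budget', 3), ('draft', 2)]
```

The two extracted sentences happen to read well together here. Nothing in the extractive method guarantees this, which is the coherence cost noted above.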
Summary outputs and interfaces also vary according to how a summary is intended to be used.
If a summary is meant to serve as an index for browsing the original document, then it might be
situated in a browsing interface along with a variety of other search and browsing functions. Indeed, a
browsing interface could feature multi-modal summary types such as an abstractive textual summary
alongside a word cloud and some visualizations of participant activity. All of these summary types
would then be linked to the original conversation record and possibly to each other.
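One way to support this kind of linking is to store, with every summary item, the identifiers of the utterances it was derived from. The Python sketch below does this for two hypothetical items of different types: an abstractive sentence and a word-cloud keyword. It builds a reverse index from utterances to items and treats two items as linked when they share a source utterance. The data model and names are illustrative assumptions, not a description of any existing browsing interface.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    uid: int
    speaker: str
    text: str


@dataclass
class SummaryItem:
    kind: str        # e.g. "abstractive" or "keyword"
    text: str
    source_ids: list = field(default_factory=list)  # linked utterance ids


def build_index(items):
    """Map each utterance id to the positions of summary items citing it."""
    index = {}
    for pos, item in enumerate(items):
        for uid in item.source_ids:
            index.setdefault(uid, []).append(pos)
    return index


def sources(item, transcript):
    """Follow an item's links back to the original utterances."""
    by_id = {u.uid: u for u in transcript}
    return [by_id[uid] for uid in item.source_ids]


def related_items(items, index, pos):
    """Other summary items sharing at least one source utterance."""
    linked = {p for uid in items[pos].source_ids for p in index[uid]}
    linked.discard(pos)
    return sorted(linked)


transcript = [
    Utterance(0, "Heather", "Can we finalize the budget today?"),
    Utterance(1, "Sam", "The budget draft is attached."),
    Utterance(2, "Heather", "Please review it by Friday."),
]
items = [
    SummaryItem("abstractive",
                "Heather asked for the budget to be finalized and reviewed by Friday.",
                [0, 2]),
    SummaryItem("keyword", "budget", [0, 1]),
]
index = build_index(items)
print([u.speaker for u in sources(items[0], transcript)])  # ['Heather', 'Heather']
print(related_items(items, index, 0))                      # [1]
```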