knowledge about the domain and which types of events are often significant in that domain. For
example, the summarizer could detect that there are many sentences in a meeting or email thread
that concern a particular action item, and the awareness that action items often form a critical part
of a summary might be part of the knowledge base available to the summarizer.
More specifically, domain knowledge can be represented in an ontology. One popular ontology
language is OWL/RDF, widely used in semantic web contexts and based on description logics, a
subset of first-order logic. An ontology typically contains a class-subclass hierarchy, properties or
relations, and instance data. For example, we may have a class Person and a subclass Manager, and
a particular instance of a manager, Heather. We may have a property or relation worksWith that
connects instances of Person. Adding instance data to the ontology is called populating the ontology.
To use the language of ontological engineering, our classes and properties are defined in the T-Box
of the ontology, while the A-Box contains our instance data. We do not go into any more detail on
ontologies here, but any primer on the semantic web should suffice to give a general overview (e.g.,
[Allemang and Hendler, 2008], [Segaran et al., 2009]).
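To make the T-Box/A-Box distinction concrete, the following is a minimal sketch that stores the Person/Manager/Heather example as subject-predicate-object triples in plain Python. It is not OWL/RDF and uses no semantic web library; the predicate names (subClassOf, type, domain, range), the helper functions, and the second individual Sam are assumptions introduced only for illustration.

```python
# Toy triple store in plain Python (no RDF library). All names are illustrative.

# T-Box: schema-level statements (classes, subclass links, property signatures).
TBOX = {
    ("Manager", "subClassOf", "Person"),
    ("worksWith", "domain", "Person"),
    ("worksWith", "range", "Person"),
}

# A-Box: instance-level statements. Adding to this set "populates" the ontology.
ABOX = {
    ("Heather", "type", "Manager"),
    ("Sam", "type", "Person"),          # hypothetical second individual
    ("Heather", "worksWith", "Sam"),
}


def superclasses(cls):
    """Return cls together with every class reachable via subClassOf links."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in TBOX:
            if subj == current and pred == "subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def is_instance_of(individual, cls):
    """True if the individual is asserted to belong to cls or to a subclass of cls."""
    return any(
        subj == individual and pred == "type" and cls in superclasses(obj)
        for subj, pred, obj in ABOX
    )


if __name__ == "__main__":
    # Heather is asserted to be a Manager, so she is also inferred to be a Person.
    print(is_instance_of("Heather", "Person"))
    print(is_instance_of("Sam", "Manager"))
```

The subclass lookup is the only inference performed here; a real description logic reasoner supports far richer entailments.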
An abstractive system also requires natural language generation (NLG) to create the summary
output. An NLG system is typically composed of a planner to create the document structure, a
microplanner to refine the document plan by doing aggregation and coreference resolution among other
tasks, and a realizer to generate the actual surface text. Reiter and Dale [2000] provide the classic
text on NLG systems and components.
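The three-stage pipeline can be sketched as three small functions. This is a toy illustration under strong assumptions, not the architecture of any particular NLG system: the input facts, the message types (decision, action_item), and the fixed verb templates are all hypothetical, and the microplanner below performs only aggregation.

```python
def plan_document(facts):
    """Document planner: select and order content (decisions before action items)."""
    order = {"decision": 0, "action_item": 1}
    # sorted() is stable, so facts of the same type keep their original order.
    return sorted(facts, key=lambda fact: order[fact["type"]])


def microplan(messages):
    """Microplanner: aggregate consecutive messages sharing a type and an agent."""
    merged = []
    for msg in messages:
        last = merged[-1] if merged else None
        if last and last["type"] == msg["type"] and last["agent"] == msg["agent"]:
            last["objects"].append(msg["object"])
        else:
            merged.append(
                {"type": msg["type"], "agent": msg["agent"], "objects": [msg["object"]]}
            )
    return merged


def realize(specs):
    """Realizer: turn each sentence specification into surface text via a template."""
    verbs = {"decision": "decided on", "action_item": "will"}
    sentences = []
    for spec in specs:
        objects = " and ".join(spec["objects"])
        sentences.append(f"{spec['agent']} {verbs[spec['type']]} {objects}.")
    return " ".join(sentences)


# Hypothetical facts extracted from a meeting.
facts = [
    {"type": "action_item", "agent": "Heather", "object": "draft the budget"},
    {"type": "decision", "agent": "The group", "object": "a March deadline"},
    {"type": "action_item", "agent": "Heather", "object": "email the client"},
]

summary = realize(microplan(plan_document(facts)))
print(summary)
# The group decided on a March deadline. Heather will draft the budget and email the client.
```

Aggregation is what merges Heather's two action items into a single sentence; without the microplanning step the realizer would emit one sentence per fact.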
4.2.3 OUTPUTS AND INTERFACES
Although research on automatic summarization usually concerns the generation of textual
summaries, summaries need not be text-based. A meeting summary could consist of concatenated audio
clips from the discussion, while any conversation could be summarized with a graphics-based
visualization highlighting information such as participant activity and topic dispersion.
Even with textual summaries, many types of output are possible. One could generate
well-formed paragraphs of coherent text describing the conversation at a high level, a so-called abstractive
summary necessitating a text generation component. A simpler approach, but one that leads to less
coherent summaries, is the extractive approach of simply classifying some sentences as important
and pasting them together. One could also generate a word cloud or a list of dates, named entities or
keywords. A word cloud might not seem like a summary in the traditional sense, but it does fit the
definition of condensing a document to a simple representation of its most important components.
For example, Figure 4.1 shows a word cloud representing the email conversation shown earlier.
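Both the word cloud and the extractive summary can be approximated from simple word counts. The sketch below is one possible baseline, not the method used to produce Figure 4.1: it substitutes frequency-based scoring for a trained sentence classifier, and the stopword list, function names, and default of two extracted sentences are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "are",
             "we", "it", "that", "this", "for", "be", "will", "by", "at"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def word_weights(sentences):
    """Word cloud data: each content word mapped to its frequency in the conversation."""
    return Counter(w for s in sentences for w in tokenize(s))


def extractive_summary(sentences, k=2):
    """Keep the k sentences with the highest average word weight, in original order."""
    weights = word_weights(sentences)

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(weights[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Calling word_weights(sentences).most_common(20) would give the terms and relative sizes for a word cloud, while extractive_summary(sentences) returns the selected sentences pasted together in their original order.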
Summary outputs and interfaces also vary according to how a summary is intended to be used.
If a summary is meant to serve as an index to browsing the original document, then it might be
situated in a browsing interface along with a variety of other search and browsing functions. Indeed, a
browsing interface could feature multi-modal summary types such as an abstractive textual summary
alongside a word cloud and some visualizations of participant activity. All of these summary types
would then be linked to the original conversation record and possibly to each other.