How Computers Write Non-Fiction
A different application for text generation is the writing of non-fiction.
The Natural Language Processing Group at Columbia University's
Department of Computer Science is one of the leading research
establishments in this field, having developed systems that take information from
several source documents and then generate a summary containing the
most important information from the source material.
One of the tasks in which the Columbia group has been successful
is producing biographical summaries of people described in the news,
using a system called Bio-Gen that was developed in collaboration with
the MITRE Corporation. Each of the source documents is analysed to
find certain readily identifiable linguistic constructions, for example,
Presidential candidate John Kerry
Kerry, the presidential candidate
and
Senator Kerry who is running for president this Fall
all of which capture and reinforce Kerry's presidential aspirations. In this
way several different types of descriptive information can be captured
about the people named in the source documents, such as their ages, their
professions and perhaps the roles that they played in past events. The
software also identifies whole sentences in which someone is the subject
of the sentence, providing additional well-structured information about
that person. The result of this process is that, for each named person in
the source documents, the software has compiled a set of well-described
facts.
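The pattern matching described above can be illustrated with a minimal sketch. It is not the Bio-Gen implementation, whose internals are not given here: the three regular expressions, the `ROLE_NOUNS` list and the function name `extract_descriptions` are all assumptions made for illustration, and they cover only the three example constructions quoted above.

```python
import re
from collections import defaultdict

# Hypothetical role nouns for illustration; a real system would need a far richer grammar.
ROLE_NOUNS = ("candidate", "chairman")
ROLES = "|".join(ROLE_NOUNS)

# Assumed patterns, one per construction quoted in the text.
PATTERNS = [
    # "Presidential candidate John Kerry": description placed before a full name.
    ("premodifier",
     re.compile(rf"(?P<desc>\w+ (?:{ROLES})) (?P<name>[A-Z]\w+ [A-Z]\w+)")),
    # "Kerry, the presidential candidate": description in apposition after the name.
    ("apposition",
     re.compile(r"(?P<name>[A-Z]\w+), the (?P<desc>[a-z]+(?: [a-z]+)*)")),
    # "Senator Kerry who is running ...": relative clause attached to the name.
    ("relative_clause",
     re.compile(r"(?P<name>[A-Z]\w+) who (?P<desc>[^.,;]+)")),
]


def extract_descriptions(sentences):
    """Map each surname to the (pattern label, description) pairs found for it."""
    facts = defaultdict(list)
    for sentence in sentences:
        for label, pattern in PATTERNS:
            for match in pattern.finditer(sentence):
                # Key on the last word of the name so "John Kerry" and "Kerry" agree.
                surname = match.group("name").split()[-1]
                facts[surname].append((label, match.group("desc").strip()))
    return dict(facts)


examples = [
    "Presidential candidate John Kerry",
    "Kerry, the presidential candidate",
    "Senator Kerry who is running for president this Fall",
]
for surname, found in extract_descriptions(examples).items():
    print(surname, found)
```

Keying on the surname is a deliberate simplification: it lets all three descriptions collect under "Kerry", but it would confuse two people who share a surname, so a fuller system would need proper name resolution.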
After pruning out erroneous and duplicated descriptions of a person,
the system merges similar but not identical descriptions; for example,
“Chairman of the Budget Committee” and “Budget Committee Chairman”
are recognised as meaning the same thing. The system then needs
to decide which of the descriptive facts about a person are the most
important to include in its biographical summary, a process based on the
frequency with which a fact or description is found in the source
material. When fed with several documents about President George W. Bush,
for example, the description “President” is likely to be the one that appears
most often, and so the system will assume that his presidency is
the most important fact known about him. The system is also able to
perform a similar task on relative clauses in the source material. It counts