A Framework for Interpreting Bridging Anaphora - Agents and Artificial Intelligence


between the nouns is deleted and a shortcut is formed by juxtaposing the two nouns to form a compound noun. However, to interpret the compound noun, the consumer is expected to reconstruct the semantic relation [17]. This process of compound noun generation has been described in the literature as predicate deletion. The framework proposed in this paper is based on the premise that associative anaphora usage is a natural language phenomenon similar to compound noun generation. Both involve two nouns connected by a relation that the producer does not express explicitly; rather, the consumer is expected to deduce it. The difference is that, in the case of anaphora, the two nouns are used separately as anaphor and antecedent, while in the case of compound noun generation, the two nouns are juxtaposed as a compound noun. Research on the generation of compound nouns is at an advanced stage, with various theories on how compound nouns are formed. According to these theories, the formation of NPs is not totally unconstrained; in other words, a compound noun cannot be formed from any two random nouns. For example, war man cannot be formed on the basis of the relation “man who hates war”, and similarly house tree cannot be formed from “tree between two houses” [29]. In both examples a relation does exist between the nouns; however, it is not of the type that can be used to form a compound NP. Linguistic studies on compound nouns (e.g. [6,29,17,28]) have assumed that the set of generic relations is finite and characterizable, although the set is not necessarily common to all the studies. Studies such as [17] and [6] have attempted to identify these relations, and even though the exact sets of relations proposed by the different studies differ slightly, a core set is very similar. An additional aspect highlighted in [6] is that compound nouns can also be formed from “temporary or fortuitous” relations; hence it presents a case for the existence of an unbounded number of relations, although the vast majority of compound nouns fit into a relatively small set of categories [26].
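The parallel between compound nouns and associative anaphora can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the class name `NounPair`, its fields, and the example nouns and predicate are hypothetical and are not part of the framework itself. It shows a compound noun and an anaphor-antecedent pair sharing one representation, namely two nouns plus a relation that the producer leaves unexpressed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NounPair:
    """Two nouns linked by a relation the producer does not express.

    Hypothetical structure for illustration only.
    """
    head: str                       # head noun of the compound, or the anaphor
    modifier: str                   # modifier noun, or the antecedent
    juxtaposed: bool                # True: compound noun; False: anaphora
    relation: Optional[str] = None  # left for the consumer to reconstruct

    def surface_form(self) -> str:
        # A compound noun juxtaposes the two nouns; an anaphor appears
        # on its own, separated from its antecedent in the text.
        if self.juxtaposed:
            return f"{self.modifier} {self.head}"
        return f"the {self.head}"

    def interpret(self, relation: str) -> "NounPair":
        # Models the consumer restoring the deleted predicate.
        return NounPair(self.head, self.modifier, self.juxtaposed, relation)


# Illustrative examples (assumptions, not taken from the source text).
compound = NounPair(head="oil", modifier="olive", juxtaposed=True)
bridging = NounPair(head="door", modifier="house", juxtaposed=False)

print(compound.surface_form())
print(bridging.surface_form())
print(compound.interpret("FROM").relation)
```

In both cases the `relation` field starts out empty, which mirrors the point that the relation has to be deduced rather than read off the surface text.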

The relational frameworks used in computational linguistics vary along lines similar to those proposed by linguists. Some works in computational linguistics (e.g. [4,20]) assume the existence of an unbounded number of relations, while others (e.g. [16,13]) use categories similar to Levi's finite set. Yet others (e.g. [22,14]) are somewhat similar to [28]. Most of the research to date has been domain independent, carried out on generic corpora such as the Penn Treebank, the British National Corpus or the web.
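The contrast between a finite and an unbounded relation inventory can be sketched in a few lines. This is a minimal illustration, not an implementation from any of the cited works: the names `LeviRelation` and `resolve_relation` are hypothetical, and the nine labels are the predicates commonly attributed to Levi's account of compound nouns (CAUSE, HAVE, MAKE, USE, BE, IN, FOR, FROM, ABOUT).

```python
from enum import Enum
from typing import Union


class LeviRelation(Enum):
    """Finite inventory: the nine predicates usually attributed to Levi."""
    CAUSE = "cause"
    HAVE = "have"
    MAKE = "make"
    USE = "use"
    BE = "be"
    IN = "in"
    FOR = "for"
    FROM = "from"
    ABOUT = "about"


# Under an open inventory any predicate string counts as a relation.
Relation = Union[LeviRelation, str]


def resolve_relation(predicate: str, closed_inventory: bool) -> Relation:
    """Map a predicate to a relation label (hypothetical helper)."""
    if not closed_inventory:
        # Unbounded view: accept the predicate as it is.
        return predicate.lower()
    try:
        # Finite view: the predicate must be one of the fixed labels.
        return LeviRelation(predicate.lower())
    except ValueError:
        raise ValueError(
            f"{predicate!r} is not in the finite inventory"
        ) from None


print(resolve_relation("FROM", closed_inventory=True))
print(resolve_relation("hates", closed_inventory=False))
```

Under the closed inventory a predicate such as "hates" is rejected, whereas the open inventory accepts it unchanged; which behaviour is preferable depends on the view of compound formation a given study adopts.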

The later works on noun compounds have followed on from either [18] or [28], some of them proposing a slight variation while others have defined a finer-grained set of relations dictated by the data sets used for the study. For example, [26] reports a set of 43 relations grouped into 10 upper-level categories. Most of the relations from different studies can be mapped to an equivalent relation in other studies. For this study we chose the set of relations proposed in [18] for two reasons. Firstly, our corpus analysis of anaphor-antecedent relations seemed to map better to Levi's set of nine relations for compound nouns; secondly, more of these relations can be computationally determined from existing resources such as WordNet and the Web. Several works already extract Levi's set of relations from WordNet and the Web with varying levels of success. In terms of natural language processing, a linguistic theory is only useful if it can be reasonably implemented in a computational system. The theory on anaphora proposed in this paper can be easily implemented by
