between the nouns is deleted and a shortcut is formed by juxtaposing the two nouns
to form a compound noun. To interpret the compound noun, however, the consumer is
expected to reconstruct the semantic relation [17]. This process of compound noun
generation has been described in the literature as predicate deletion. The
framework proposed in this paper is based on the premise that associative anaphora
usage is a natural language phenomenon similar to compound noun generation. Both
involve two nouns connected by a relation that the producer does not express explicitly;
instead, the consumer is expected to deduce it. The difference is that,
in the case of anaphora, the two nouns are used separately as anaphor and antecedent,
while in the case of compound noun generation, the two nouns are juxtaposed
as a compound noun. Research on the generation of compound nouns is at an
advanced stage, with various theories on how compound nouns are formed.
According to these theories, the formation of compound NPs is not totally unconstrained; in other words,
a compound noun cannot be formed from any two random nouns. For example, war
man cannot be formed on the basis of the relation “man who hates war”; similarly,
house tree cannot be formed from “tree between two houses” [29]. In both examples
a relation does exist between the nouns; however, it is not of the type that can be
used to form a compound NP. Linguistic studies on compound nouns (e.g., [6,29,17,28])
have assumed that the set of generic relations is finite and characterizable, although
the set is not necessarily the same across all the studies. Studies such as [17] and [6]
have attempted to identify these relations, and even though the exact set of relations
differs slightly from study to study, a core set is very similar. An
additional aspect highlighted in [6] is that compound nouns can also be formed from
“temporary or fortuitous” relations, which makes a case for the existence of an unbounded
number of relations, although the vast majority of compound nouns fit into a relatively
small set of categories [26].
The relational frameworks used in computational linguistics vary along lines similar
to those proposed by linguists. Some works in computational linguistics (e.g., [4,20])
assume the existence of an unbounded number of relations, while others (e.g., [16,13])
use categories similar to Levi's finite set. Yet others (e.g., [22,14]) are somewhat similar
to [28]. Most of the research to date has been domain independent, carried out on generic
corpora such as the Penn Treebank, the British National Corpus, or the Web.
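To make the shared structure concrete, the following Python sketch represents a noun pair whose relation is left implicit and is drawn from a finite inventory, and renders that pair either as a compound noun or as an antecedent/anaphor pair. The nine labels are Levi's recoverably deletable predicates as they are usually listed; the text above does not enumerate them. The class name ImplicitRelation, the paraphrase templates, and the village/church example are hypothetical and serve only as an illustration, not as the implementation used in this work.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Relation(Enum):
    """Levi's nine recoverably deletable predicates, as commonly listed."""
    CAUSE = "cause"
    HAVE = "have"
    MAKE = "make"
    USE = "use"
    BE = "be"
    IN = "in"
    FOR = "for"
    FROM = "from"
    ABOUT = "about"


# Hypothetical paraphrase templates, one per relation (an assumption of this sketch).
PARAPHRASE = {
    Relation.CAUSE: "that causes",
    Relation.HAVE: "that has",
    Relation.MAKE: "that makes",
    Relation.USE: "that uses",
    Relation.BE: "that is",
    Relation.IN: "in",
    Relation.FOR: "for",
    Relation.FROM: "from",
    Relation.ABOUT: "about",
}


@dataclass(frozen=True)
class ImplicitRelation:
    """Two nouns linked by a relation that the producer leaves unexpressed."""
    head: str          # second noun of a compound, or the anaphor
    modifier: str      # first noun of a compound, or the antecedent
    relation: Relation

    def as_compound(self) -> str:
        # Juxtapose the two nouns; the relation is deleted.
        return f"{self.modifier} {self.head}"

    def as_anaphora(self) -> Tuple[str, str]:
        # Use the two nouns separately as (antecedent, anaphor).
        return (f"a {self.modifier}", f"the {self.head}")

    def paraphrase(self) -> str:
        # Reconstruct the relation the consumer is expected to recover.
        return f"{self.head} {PARAPHRASE[self.relation]} {self.modifier}"


pair = ImplicitRelation(head="church", modifier="village", relation=Relation.IN)
print(pair.as_compound())
print(pair.as_anaphora())
print(pair.paraphrase())
```

In this sketch the same record yields both surface forms, which mirrors the premise that the two phenomena differ only in whether the nouns are juxtaposed or used separately. Labeling the village/church pair as IN is itself an assumption made for the example.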
Later work on noun compounds has followed on from either [18] or [28], with
some studies proposing slight variations while others have defined a
finer-grained set of relations dictated by the data sets used in the study. For example,
[26] reports a set of 43 relations grouped into 10 upper-level categories. Most of the
relations from different studies can be mapped to an equivalent relation in other studies.
For this study we chose the set of relations proposed in [18] for two reasons. Firstly,
our analysis of the corpus for anaphor-antecedent relations seemed to map better to Levi's
set of nine relations for compound nouns, and secondly, more of these relations can
be computationally determined from existing resources such as WordNet and the Web.
Several works already extract Levi's set of relations from WordNet and
the Web with varying levels of success. In terms of natural language processing, a
linguistic theory is only useful if it can be reasonably implemented in a computational
system. The theory on anaphora proposed in this paper can be easily implemented by
 