metaphor M. Moreover, [23] also shows that copula metaphors of the form T is an M in the Google n-grams—the origins of srcTypical(T)—are broadly consistent with the properties and affective profile of each stereotype T. In 87% of cases, one can correctly assign the label positive or negative to a topic T using only the contents of srcTypical(T), provided it is not empty.
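Concretely, this classification amounts to aggregating per-property affect over the contents of srcTypical(T). The following is a minimal sketch, assuming a hypothetical positivity lexicon and invented property lists; in [23] both are harvested from the Google n-grams:

```python
# Sketch: assign a positive/negative label to a topic T using only the
# contents of srcTypical(T). Lexicon and properties are invented here
# for illustration, not taken from the actual Stereotrope resources.

POSITIVITY = {  # hypothetical per-property positivity scores in [-1, 1]
    "loyal": 0.8, "brave": 0.7, "sly": -0.4, "dirty": -0.7, "greedy": -0.8,
}

def classify_topic(typical_properties):
    """Label a topic from the affect of its stereotypical properties."""
    scores = [POSITIVITY[p] for p in typical_properties if p in POSITIVITY]
    if not scores:                       # srcTypical(T) is empty: no label
        return None
    mean = sum(scores) / len(scores)
    return "positive" if mean > 0 else "negative"

print(classify_topic(["loyal", "brave"]))   # positive
print(classify_topic(["sly", "greedy"]))    # negative
```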
Stereotrope derives its appreciation of feelings from its understanding of how
one property presupposes another. The intuition that two properties X and Y linked
via the pattern “ as X and Y as ” evoke similar feelings is supported by the strong
correlation (0.7) observed between the positivity of X and of Y over the many X/Y pairs that are harvested from the Web using this acquisition pattern.
The “ fact ” that bats lay eggs can be found over 40,000 times on the web via
Google. On closer examination, dubious matches often form part of a larger question
such as “ do bats lay eggs? ”, while the question “ why do bats lay eggs? ” has zero
matches. So “ why do ” questions provide an effective superstructure for acquiring
normative facts from the Web: they identify facts that are commonly presupposed,
and thus stereotypical, and clearly mark the start and end of each presupposition. Such
questions also yield useful facts: the authors of [ 22 ] show that when these facts are
treated as features of the stereotypes for which they are presupposed, they provide
an excellent basis for classifying different stereotypes into the same ontological
categories, as would be predicted by an ontology such as WordNet [9]. Moreover,
these features can be reliably distributed to close semantic neighbors to overcome
the problem of knowledge sparsity. The authors of [ 22 ] also demonstrate that the
likelihood that a feature of stereotype A can also be assumed of stereotype B is a
clear function of the WordNet similarity of A and B. While this is an intuitive finding,
it would not hold at all if not for the fact that these features are truly meaningful for
A and for B .
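The "why do" superstructure is easy to operationalize: the phrase "why do" opens each presupposition and the question mark closes it, so a single regular expression recovers stereotype/feature pairs. A sketch with invented example questions (the real ones are mined from Web text):

```python
# Sketch: mine presupposed (normative) facts from "why do ...?" questions.
# The question strings are invented examples of the kind of Web text
# described in [22].
import re

questions = [
    "why do dogs bury bones?",
    "why do cats hate water?",
    "why do bats sleep upside down?",
]

# "why do" marks the start of a presupposition; "?" marks its end.
PATTERN = re.compile(r"why do (\w+) (.+)\?")

facts = []
for q in questions:
    m = PATTERN.match(q)
    if m:
        stereotype, feature = m.group(1), m.group(2)
        facts.append((stereotype, feature))

print(facts)
# [('dogs', 'bury bones'), ('cats', 'hate water'), ('bats', 'sleep upside down')]
```

Each extracted pair can then serve as a feature of its stereotype for the classification experiments described above.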
The problem posed by “ bats lay eggs ” is one faced by any system that does not
perceive the whole context of an utterance. As such, it is a problem that plagues
the use of n-gram models of Web content, such as Google's n-grams. Stereotrope
uses n-grams to suggest insightful connections between two properties or ideas, but if
many of these n-grams are mere noise, not even the Keats heuristic can disguise them
as meaningful signals. Our focus is on relational n-grams, of a kind that suggests deep albeit tacit relationships between two concepts. These n-grams obey the pattern X <relation> Y, where X and Y are adjectives or nouns and <relation> is a linking phrase, such as a verb, a preposition, a coordinator, etc. To determine the quality of these n-grams, and to assess the likelihood of extracting genuine relational insights from them, we use this large subset of the Google n-grams as a corpus for estimating the relational similarity of the 353 word pairs in the WordSim-353 data set [10]. We estimate the relatedness of two words X and Y as the PMI (pointwise mutual information) score of X and Y, using the relational n-grams as a corpus for occurrence and co-occurrence frequencies of X and Y. A correlation of 0.61 is observed between these PMI scores and the human ratings reported in [10]. Though this is not the highest score achieved for this task, it is considerably higher than any that has been reported for approaches that use WordNet alone. The point here is that