1.2 Approaches that Use NLP Techniques
The papers in our first group deal with approaches that use in-depth NLP
techniques to varying degrees. All of them use a parser of some sort; one of
them uses morphological analysis (or rather generation), and two of them use
other lexical resources, such as WordNet, FrameNet, or VerbNet. The first
three use off-the-shelf parsers, while the last uses its own parser.
Popescu and Etzioni combine a wide array of techniques. Among these
are NLP techniques such as parsing with an off-the-shelf parser, MINIPAR,
morphological rules to generate nouns from adjectives, and WordNet (for its
synonymy and antonymy information, its IS-A hierarchy of word meanings,
and for its adjective-to-noun pertain relation). In addition, they use hand-
coded rules to extract desired relations from the structures resulting from
the parse. They also make key use of a statistical technique, pointwise mutual
information (PMI), to verify that associations found both in the target data
and in supplementary data downloaded from the Web are genuine. Another
distinctive aspect of their approach is extensive use of the Web as a source of
both word forms and word associations. Finally, they introduce relaxation
labeling, a technique from the field of image processing, to the field of text
mining to perform context-sensitive classification of words.
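PMI measures how much more often two events co-occur than they would if they were independent. As a minimal sketch of the underlying computation (the function name and all counts below are illustrative, not taken from Popescu and Etzioni's system, which derives its counts from Web search hits):

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information between events x and y,
    estimated from co-occurrence counts out of `total` observations."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    # log2 of how far the joint probability exceeds independence
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical hit counts: a candidate term and a discriminator
# phrase that co-occur far more often than chance would predict.
score = pmi(count_xy=120, count_x=1_000, count_y=5_000, total=1_000_000)
```

A strongly positive score suggests a genuine association; a score near zero suggests the co-occurrence is coincidental.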
Bunescu and Mooney adapt Support Vector Machines (SVMs) to a new
role in text mining, namely relation extraction, and in the process compare
the use of NLP parsing with non-NLP approaches. SVMs have been used
extensively in text mining, but always to do text classification, treating a
document or piece of text as an unstructured bag of words (i.e., recording only
which words appear in the text and how often, not their positions relative to
one another or any other structural relationships among them). The process of
extracting relations between entities, as noted above, has typically been pre-
sumed to require parsing into natural language phrases. This chapter explores
two new kernels for SVMs, a subsequence kernel and a dependency path ker-
nel, to classify the relations between two entities (they assume the entities
have already been extracted by whatever means). Both of these involve using
a wholly novel set of features with an SVM classifier. The dependency path
kernel uses information from a dependency parse of the text while the subse-
quence kernel treats the text as just a string of tokens. They test these two
different approaches on two different domains and find that the value of the
dependency path kernel (and therefore of NLP parsing) depends on how well
one can expect the parser to perform on text from the target domain, which
in turn depends on how many unknown words and expressions there are in
that domain.
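Bunescu and Mooney's actual subsequence kernel adds word-class generalization and constraints tailored to relation extraction; as a sketch of the basic idea only, here is the classic gapped string-subsequence kernel (in the style of Lodhi et al.) over token sequences, written naively for clarity rather than efficiency:

```python
def kprime(s, t, i, lam):
    """Auxiliary quantity K'_i from the subsequence-kernel recursion."""
    if i == 0:
        return 1.0
    if len(s) < i or len(t) < i:
        return 0.0
    x = s[-1]
    # Either the last token of s goes unused (decay by lam) ...
    total = lam * kprime(s[:-1], t, i, lam)
    # ... or it matches some occurrence of x in t, paying a decay
    # penalty for every position it spans to the end of t.
    for j in range(len(t)):
        if t[j] == x:
            total += kprime(s[:-1], t[:j], i - 1, lam) * lam ** (len(t) - j + 1)
    return total

def ssk(s, t, n, lam):
    """Gapped subsequence kernel: a weighted count of the common
    length-n subsequences of s and t, with gaps penalized by lam."""
    if min(len(s), len(t)) < n:
        return 0.0
    x = s[-1]
    total = ssk(s[:-1], t, n, lam)
    for j in range(len(t)):
        if t[j] == x:
            total += kprime(s[:-1], t[:j], n - 1, lam) * lam ** 2
    return total

# With lam = 1 and n = 1 this simply counts matching token pairs:
pairs = ssk(list("ab"), list("ab"), 1, 1.0)   # 2.0
# With decay lam = 0.5, a contiguous length-2 match scores lam**4:
contig = ssk(list("ab"), list("ab"), 2, 0.5)  # 0.0625
```

Because the kernel works directly on token sequences, it needs no parse at all, which is exactly why it can outperform the dependency path kernel when the parser is unreliable on the target domain.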
Mustafaraj et al. also combine parsing with statistical approaches to clas-
sification. They use an ensemble, or committee, of three different classifiers
that are typically applied to non-NLP features, but the
features they use are based on parse trees. In addition, their application re-