1.2 Approaches that Use NLP Techniques
The papers in our first group deal with approaches that use in-depth NLP
techniques to varying degrees. All of them use a parser of some sort; one of
them uses morphological analysis (or rather generation), and two of them use
other lexical resources, such as WordNet, FrameNet, or VerbNet. The first
three use off-the-shelf parsers, while the last uses its own parser.
Popescu and Etzioni combine a wide array of techniques. Among these
are NLP techniques such as parsing with an off-the-shelf parser, MINIPAR,
morphological rules to generate nouns from adjectives, and WordNet (for its
synonymy and antonymy information, its IS-A hierarchy of word meanings,
and for its adjective-to-noun pertain relation). In addition, they use hand-
coded rules to extract desired relations from the structures resulting from
the parse. They also make key use of a statistical technique, pointwise mutual
information (PMI), to verify that associations found both in the target data
and in supplementary data downloaded from the Web are genuine. Another
distinctive aspect of their approach is extensive use of the Web as a source of
both word forms and word associations. Finally, they introduce relaxation
labeling, a technique from the field of image processing, to the field of text
mining to perform context-sensitive classification of words.
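PMI measures how much more often two events co-occur than they would if they were independent. As a minimal sketch of the underlying computation (the function name and all counts below are illustrative, not taken from Popescu and Etzioni's system, which derives its counts from Web search hits):

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information between events x and y,
    estimated from co-occurrence counts out of `total` observations."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    # log2 of how far the joint probability exceeds independence
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical hit counts: a candidate term and a discriminator
# phrase that co-occur far more often than chance would predict.
score = pmi(count_xy=120, count_x=1_000, count_y=5_000, total=1_000_000)
```

A strongly positive score suggests a genuine association; a score near zero suggests the co-occurrence is coincidental.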
Bunescu and Mooney adapt Support Vector Machines (SVMs) to a new
role in text mining, namely relation extraction, and in the process compare
the use of NLP parsing with non-NLP approaches. SVMs have been used
extensively in text mining, but always to do text classification, treating a
document or piece of text as an unstructured bag of words (i.e., recording only
which words appear in the text and how often, not their positions relative to
one another or any other structural relationships among them). The process of
extracting relations between entities, as noted above, has typically been pre-
sumed to require parsing into natural language phrases. This chapter explores
two new kernels for SVMs, a subsequence kernel and a dependency path ker-
nel, to classify the relations between two entities (they assume the entities
have already been extracted by whatever means). Both of these involve using
a wholly novel set of features with an SVM classifier. The dependency path
kernel uses information from a dependency parse of the text while the subse-
quence kernel treats the text as just a string of tokens. They test these two
different approaches on two different domains and find that the value of the
dependency path kernel (and therefore of NLP parsing) depends on how well
one can expect the parser to perform on text from the target domain, which
in turn depends on how many unknown words and expressions there are in
that domain.
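Bunescu and Mooney's actual subsequence kernel adds word-class generalization and constraints tailored to relation extraction; as a sketch of the basic idea only, here is the classic gapped string-subsequence kernel (in the style of Lodhi et al.) over token sequences, written naively for clarity rather than efficiency:

```python
def kprime(s, t, i, lam):
    """Auxiliary quantity K'_i from the subsequence-kernel recursion."""
    if i == 0:
        return 1.0
    if len(s) < i or len(t) < i:
        return 0.0
    x = s[-1]
    # Either the last token of s goes unused (decay by lam) ...
    total = lam * kprime(s[:-1], t, i, lam)
    # ... or it matches some occurrence of x in t, paying a decay
    # penalty for every position it spans to the end of t.
    for j in range(len(t)):
        if t[j] == x:
            total += kprime(s[:-1], t[:j], i - 1, lam) * lam ** (len(t) - j + 1)
    return total

def ssk(s, t, n, lam):
    """Gapped subsequence kernel: a weighted count of the common
    length-n subsequences of s and t, with gaps penalized by lam."""
    if min(len(s), len(t)) < n:
        return 0.0
    x = s[-1]
    total = ssk(s[:-1], t, n, lam)
    for j in range(len(t)):
        if t[j] == x:
            total += kprime(s[:-1], t[:j], n - 1, lam) * lam ** 2
    return total

# With lam = 1 and n = 1 this simply counts matching token pairs:
pairs = ssk(list("ab"), list("ab"), 1, 1.0)   # 2.0
# With decay lam = 0.5, a contiguous length-2 match scores lam**4:
contig = ssk(list("ab"), list("ab"), 2, 0.5)  # 0.0625
```

Because the kernel works directly on token sequences, it needs no parse at all, which is exactly why it can outperform the dependency path kernel when the parser is unreliable on the target domain.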
Mustafaraj et al. also combine parsing with statistical approaches to clas-
sification. They use an ensemble, or committee, of three different classifiers
that are typically applied to non-NLP features, but the
features they use are based on parse trees. In addition, their application re-