produces an intermediate representation called "templates" in which relevant information has been recognised, for example names, events, entities, etc., or high-level linguistic entities such as noun phrases.
Using IE techniques and electronic linguistic resources, Hearst [19] proposes a domain-independent method for the automatic discovery of WordNet-style lexico-semantic relations by searching for the corresponding lexico-syntactic patterns in unrestricted text collections. This technique is meant to be useful as an automated or
semi-automated aid for lexicographers and builders of domain-dependent knowledge
bases. Also, it does not require an additional knowledge base or specific interpreta-
tion procedures in order to propose new instances of WordNet relations [9]. Once
the basic relations (i.e., hyponyms, hypernyms, etc.) are obtained, they are used to
find common links with other “similar” concepts in WordNet [9] and so to discover
new semantic links [18]. However, some tasks still need to be performed by hand, such as deciding on a lexical relation of interest (e.g., hyponymy) and compiling a list of word pairs from WordNet between which this relation is known to hold.
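As a rough illustration of this pattern-based approach, the sketch below matches one Hearst-style lexico-syntactic pattern ("NP such as NP, NP, and NP") with a regular expression and reads off candidate hyponym pairs. Single lowercase words stand in for noun phrases, and the pattern and helper names are illustrative simplifications, not Hearst's implementation (a real system would use a part-of-speech tagger and noun-phrase chunker).

```python
import re

# One Hearst-style pattern: "NP0 such as NP1, NP2, ... (and|or) NPn"
# suggests that each NPi is a hyponym of NP0.
SUCH_AS = re.compile(r"(\w+)\s+such as\s+([\w,\s]+)")

def hearst_hyponyms(text):
    """Return (hyponym, hypernym) pairs found via the 'such as' pattern."""
    pairs = []
    for m in SUCH_AS.finditer(text):
        hypernym = m.group(1)
        # Split the enumeration on commas/whitespace, dropping conjunctions.
        members = [w for w in re.split(r",\s*|\s+", m.group(2))
                   if w and w not in ("and", "or")]
        pairs.extend((member, hypernym) for member in members)
    return pairs

print(hearst_hyponyms("works by authors such as Herrick, Goldsmith, and Shakespeare"))
# [('Herrick', 'authors'), ('Goldsmith', 'authors'), ('Shakespeare', 'authors')]
```

Because the pattern carries the relation itself, no external knowledge base is needed to propose the pairs, which is precisely the property noted above.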
One of the main advantages of this method is its low cost for augmenting the structure of WordNet and the simplicity of its relations. However, it also has several drawbacks: its dependence on the structure of a general-purpose ontology, which prevents it from reasoning about domain-specific terminology and concepts; the restricted set of defined semantic relations (only relations contained in WordNet are dealt with); its dependence on WordNet's terms (only terms present in WordNet can be related, so any novel domain-specific term will be missed); and the limited kind of inference enabled (only direct links can be produced; what if we wish to relate terms which are not in WordNet?).
A natural and important further step is to use a knowledge base such as WordNet to support text inference, that is, to extract relevant but unstated information from the text.
Harabagiu and Moldovan [15] address this issue by using WordNet as a commonsense
knowledge base and designing relation-driven inference mechanisms which look for
common semantic paths in order to draw conclusions. One outstanding feature of
their method is that from these generated inferences, it is easy to ask for unknown
relations between concepts. This has proven to be extremely useful in the context of
Question-Answering Systems. However, although the method exhibits understand-
ing capabilities, the commonsense facts discovered have not been demonstrated to
be novel and interesting from a KDD viewpoint.
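The idea of drawing conclusions by following semantic paths can be sketched as a search over a relation-labelled graph. The mini-graph and relation labels below are hypothetical stand-ins for WordNet, not Harabagiu and Moldovan's actual mechanism; the point is only that a chain of labelled edges linking two concepts can be read off as an inferred relation between them.

```python
from collections import deque

# Hypothetical WordNet-like graph: each node maps to (relation, neighbour) edges.
GRAPH = {
    "dog":    [("is-a", "canine")],
    "canine": [("is-a", "mammal")],
    "cat":    [("is-a", "feline")],
    "feline": [("is-a", "mammal")],
}

def semantic_path(source, target):
    """Breadth-first search returning the relation chain from source to target."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, neighbour in GRAPH.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(relation, neighbour)]))
    return None  # no path: no relation can be inferred

print(semantic_path("dog", "mammal"))
# [('is-a', 'canine'), ('is-a', 'mammal')]
```

A concept reachable from two terms (here, "mammal" for both "dog" and "cat") supplies the common semantic path from which a relation between the two terms can be proposed, which is what makes asking for unknown relations between concepts straightforward.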
Mooney and colleagues [25] have also attempted to bring together general on-
tologies, IE technology and traditional machine learning methods to mine interesting
patterns. Unlike previous approaches, Mooney deals with a different kind of knowl-
edge, e.g., prediction rules. In addition, an explicit measure of novelty of the mined
rules is proposed by establishing semantic distances between rules' antecedents and
consequents using the underlying organisation of WordNet. Novelty is then defined
as the average (semantic) distance between the words in a rule's antecedent and consequent. A key problem is that this method depends heavily on WordNet's organisation and idiosyncratic features. As a consequence, since much of the information extracted from the documents is not included in WordNet, the method will make misleading decisions about the novelty of the mined rules.
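This novelty measure can be sketched as follows. The hand-built hypernym chains below are an illustrative stand-in for WordNet's organisation, and the distance used (path length through the lowest common ancestor) is one simple choice of semantic distance; none of the names come from Mooney's system.

```python
# Hypothetical mini-taxonomy: each word maps to its immediate hypernym.
HYPERNYM = {
    "dog": "canine", "canine": "mammal", "cat": "feline", "feline": "mammal",
    "mammal": "animal", "salmon": "fish", "fish": "animal",
}

def ancestors(word):
    """The chain from a word up to the taxonomy root, including the word itself."""
    chain, w = [word], word
    while w in HYPERNYM:
        w = HYPERNYM[w]
        chain.append(w)
    return chain

def distance(a, b):
    """Path length between a and b through their lowest common ancestor."""
    ca, cb = ancestors(a), ancestors(b)
    for i, node in enumerate(ca):
        if node in cb:
            return i + cb.index(node)
    return len(ca) + len(cb)  # no common ancestor: fall back to a large value

def novelty(antecedent, consequent):
    """Average semantic distance between antecedent and consequent words."""
    pairs = [(a, c) for a in antecedent for c in consequent]
    return sum(distance(a, c) for a, c in pairs) / len(pairs)

print(novelty(["dog"], ["cat"]))  # 4.0: dog-canine-mammal-feline-cat
```

The dependence criticised above is visible directly in the sketch: a word missing from the `HYPERNYM` table (i.e., from WordNet) gets an arbitrary fallback distance, so rules mentioning such words receive distorted novelty scores.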
The approaches to TM/KDT discussed above use a variety of "learning" techniques. Except for cases using Machine Learning techniques such as Neural Networks (e.g., SOM), decision trees, and so on, which have also been used in traditional DM, the real role of "learning" in these systems is not clear. There is no learning which