produces an intermediate representation called "templates" in which relevant information has been recognised, for example names, events, entities, etc., or high-level linguistic entities such as noun phrases.
Using IE techniques and electronic linguistic resources, Hearst [19] proposes a domain-independent method for the automatic discovery of WordNet-style lexico-semantic relations by searching for the corresponding lexico-syntactic patterns in unrestricted text collections. This technique is meant to be useful as an automated or
semi-automated aid for lexicographers and builders of domain-dependent knowledge
bases. Also, it does not require an additional knowledge base or specific interpreta-
tion procedures in order to propose new instances of WordNet relations [9]. Once
the basic relations (i.e., hyponyms, hypernyms, etc.) are obtained, they are used to
find common links with other “similar” concepts in WordNet [9] and so to discover
new semantic links [18]. However, some tasks still need to be performed by hand, such as deciding on a lexical relation of interest (e.g., hyponymy) and compiling a list of word pairs from WordNet between which this relation is known to hold.
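As a rough illustration of this pattern-based approach, the sketch below matches one Hearst-style lexico-syntactic pattern ("NP such as NP, NP, and NP") with a regular expression and reads off candidate hyponym pairs. Single lowercase words stand in for noun phrases, and the pattern and helper names are illustrative simplifications, not Hearst's implementation (a real system would use a part-of-speech tagger and noun-phrase chunker).

```python
import re

# One Hearst-style pattern: "NP0 such as NP1, NP2, ... (and|or) NPn"
# suggests that each NPi is a hyponym of NP0.
SUCH_AS = re.compile(r"(\w+)\s+such as\s+([\w,\s]+)")

def hearst_hyponyms(text):
    """Return (hyponym, hypernym) pairs found via the 'such as' pattern."""
    pairs = []
    for m in SUCH_AS.finditer(text):
        hypernym = m.group(1)
        # Split the enumeration on commas/whitespace, dropping conjunctions.
        members = [w for w in re.split(r",\s*|\s+", m.group(2))
                   if w and w not in ("and", "or")]
        pairs.extend((member, hypernym) for member in members)
    return pairs

print(hearst_hyponyms("works by authors such as Herrick, Goldsmith, and Shakespeare"))
# [('Herrick', 'authors'), ('Goldsmith', 'authors'), ('Shakespeare', 'authors')]
```

Because the pattern carries the relation itself, no external knowledge base is needed to propose the pairs, which is precisely the property noted above.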
One of the main advantages of this method is its low cost for augmenting the structure of WordNet and the simplicity of its relations. However, it also has several drawbacks: its dependence on the structure of a general-purpose ontology, which prevents it from reasoning about domain-specific terminology and concepts; the restricted set of defined semantic relations (only relations contained in WordNet are dealt with); its dependence on WordNet's terms (only terms present in WordNet can be related, so any novel domain-specific term will be missed); and the limited kind of inference enabled (only direct links can be produced; what if we wish to relate terms which are not in WordNet?).
A natural and important further step is to use a knowledge base such as WordNet to support text inference, that is, to extract relevant but unstated information from the text.
Harabagiu and Moldovan [15] address this issue by using WordNet as a commonsense
knowledge base and designing relation-driven inference mechanisms which look for
common semantic paths in order to draw conclusions. One outstanding feature of
their method is that from these generated inferences, it is easy to ask for unknown
relations between concepts. This has proven to be extremely useful in the context of
Question-Answering Systems. However, although the method exhibits understand-
ing capabilities, the commonsense facts discovered have not been demonstrated to
be novel and interesting from a KDD viewpoint.
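The idea of drawing conclusions by following semantic paths can be sketched as a search over a relation-labelled graph. The mini-graph and relation labels below are hypothetical stand-ins for WordNet, not Harabagiu and Moldovan's actual mechanism; the point is only that a chain of labelled edges linking two concepts can be read off as an inferred relation between them.

```python
from collections import deque

# Hypothetical WordNet-like graph: each node maps to (relation, neighbour) edges.
GRAPH = {
    "dog":    [("is-a", "canine")],
    "canine": [("is-a", "mammal")],
    "cat":    [("is-a", "feline")],
    "feline": [("is-a", "mammal")],
}

def semantic_path(source, target):
    """Breadth-first search returning the relation chain from source to target."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, neighbour in GRAPH.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(relation, neighbour)]))
    return None  # no path: no relation can be inferred

print(semantic_path("dog", "mammal"))
# [('is-a', 'canine'), ('is-a', 'mammal')]
```

A concept reachable from two terms (here, "mammal" for both "dog" and "cat") supplies the common semantic path from which a relation between the two terms can be proposed, which is what makes asking for unknown relations between concepts straightforward.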
Mooney and colleagues [25] have also attempted to bring together general on-
tologies, IE technology and traditional machine learning methods to mine interesting
patterns. Unlike previous approaches, Mooney deals with a different kind of knowl-
edge, e.g., prediction rules. In addition, an explicit measure of novelty of the mined
rules is proposed by establishing semantic distances between rules' antecedents and
consequents using the underlying organisation of WordNet. Novelty is then defined
as the average (semantic) distance between the words in a rule's antecedent and consequent. A key problem is that this method depends heavily on WordNet's organisation and idiosyncratic features. As a consequence, since much of the information extracted from the documents is not included in WordNet, the method will make misleading decisions about the novelty of the mined rules.
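This novelty measure can be sketched as follows. The hand-built hypernym chains below are an illustrative stand-in for WordNet's organisation, and the distance used (path length through the lowest common ancestor) is one simple choice of semantic distance; none of the names come from Mooney's system.

```python
# Hypothetical mini-taxonomy: each word maps to its immediate hypernym.
HYPERNYM = {
    "dog": "canine", "canine": "mammal", "cat": "feline", "feline": "mammal",
    "mammal": "animal", "salmon": "fish", "fish": "animal",
}

def ancestors(word):
    """The chain from a word up to the taxonomy root, including the word itself."""
    chain, w = [word], word
    while w in HYPERNYM:
        w = HYPERNYM[w]
        chain.append(w)
    return chain

def distance(a, b):
    """Path length between a and b through their lowest common ancestor."""
    ca, cb = ancestors(a), ancestors(b)
    for i, node in enumerate(ca):
        if node in cb:
            return i + cb.index(node)
    return len(ca) + len(cb)  # no common ancestor: fall back to a large value

def novelty(antecedent, consequent):
    """Average semantic distance between antecedent and consequent words."""
    pairs = [(a, c) for a in antecedent for c in consequent]
    return sum(distance(a, c) for a, c in pairs) / len(pairs)

print(novelty(["dog"], ["cat"]))  # 4.0: dog-canine-mammal-feline-cat
```

The dependence criticised above is visible directly in the sketch: a word missing from the `HYPERNYM` table (i.e., from WordNet) gets an arbitrary fallback distance, so rules mentioning such words receive distorted novelty scores.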
The approaches to TM/KDT discussed above use a variety of "learning" techniques. Except for cases using Machine Learning techniques such as Neural Networks (e.g., SOM), decision trees, and so on, which have also been used in traditional DM, the real role of "learning" in these systems is not clear. There is no learning which