Morphosyntactic Linguistic Wavelets for Knowledge Management - Intelligent Systems

- Extract morphosyntactic descriptors (numeric values extracted automatically, such as the number of vowels) for each word processed. Words were previously represented using Porter's stemming algorithm, but stemming alone does not have enough classification power to serve as the sole instrument. Morphosyntactic descriptors are required to process text with sufficient confidence levels (López De Luise, 2007d).

- Collapse syntagmas into a condensed internal representation (usually, selected morphemes 6). The resulting representation is called EBH (Estructura Básica Homogénea, uniform basic structure). EBHs are linked with specific connectors.

- Calculate and set the morphosyntactic weighting p_o for E_ci.

Further details of each of these steps are outside the scope of this chapter; see López De Luise (2008c) and López De Luise (2008).
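The first step above (numeric descriptors extracted automatically for each word) can be illustrated with a minimal sketch. Only the number of vowels is named in the text; the other descriptors, the vowel set, and the function names `describe` and `describe_text` are assumptions chosen for illustration and are not the descriptor set used in the cited work.

```python
# Hypothetical vowel set; accented vowels are included so Spanish words are handled too.
VOWELS = set("aeiouáéíóúü")


def describe(word: str) -> dict:
    """Return a few illustrative numeric descriptors for a single word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    vowels = sum(1 for ch in letters if ch in VOWELS)
    return {
        "length": len(letters),              # assumed descriptor
        "vowels": vowels,                    # descriptor named in the text
        "consonants": len(letters) - vowels, # assumed descriptor
    }


def describe_text(text: str) -> list:
    """Pair every whitespace-separated word with its descriptors."""
    return [(word, describe(word)) for word in text.split()]
```

Calling `describe_text("uniform basic structure")` yields one descriptor dictionary per word, which is the kind of numeric record the later weighting and filtering steps would consume.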

3.2.3 Apply filtering using the most suitable approach

Since knowledge management depends on previous language experiences, filtering is a dynamic process that adapts itself to current cognitive capabilities. Furthermore, as shown in the Case Study section, filtering is a very sensitive step in the MLW transformation.

Filtering is a process composed of several filters. This chapter uses the following three clustering algorithms: Simple K-means, Farthest First and Expectation Maximization (Witten, 2005). They are applied sequentially for each new E_ce. When an E_ce is “mature”, the filter no longer changes.
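As an illustration of one of the three filters, the following is a minimal sketch of farthest-first center selection followed by nearest-center assignment. It is not the implementation used in the chapter: plain Euclidean distance stands in for the chapter's morphosyntactic distance, the first center is fixed rather than chosen at random, and the function names `farthest_first` and `assign` are hypothetical.

```python
import math


def farthest_first(points, k, first=0):
    """Pick up to k centers: start at points[first], then repeatedly take
    the point that is farthest from its nearest already-chosen center."""
    centers = [points[first]]
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for p, d in zip(points, nearest)]
    return centers


def assign(points, centers):
    """Return, for each point, the index of its closest center (its partition)."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]
```

With descriptor vectors as `points`, the list returned by `assign` plays the role of the partitions produced by a filter.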

The distance used to evaluate clustering is based on the similarity between the descriptor values and the internal morphosyntactic metric, p_o, that weights EBH (representing morphemes). It has been shown that clusters generated with p_o represent consistent word agglomerations (López De Luise, 2008, 2008b). Although this chapter does not use fuzzy clustering algorithms, it is important to note that such filters require a specific adaptation for distance using the categorical metrics defined in (López De Luise, 2007e).

3.2.4 If “Abstraction” granularity and details are inadequate for the current problem

Granularity is determined by the ability to discriminate the topic and by the degree of detail required to represent the E_ci. In the MLW context it is the logical distance between the current E_ci and the E_ce partitions 7 (see Figure 5). This distance depends on the desired learning approach. In the example included herein (Section 4), it is the number of elements in the E_ci that fall within each E_ce partition. The distribution of EBHs determines whether a new E_ce is necessary. When the EBHs are too irregular, a new E_ce is built as described in Section 3.2.4.1. Otherwise, the new E_ci is added to the partition that is the best match.
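The decision described above can be sketched as follows. The text does not state how "too irregular" is measured, so the criterion here (the best-matching partition must hold at least a `min_share` fraction of the EBHs) is purely an assumption, as are the names `partition_counts` and `decide` and the default threshold.

```python
from collections import Counter


def partition_counts(ebh_partitions, n_partitions):
    """Count how many EBHs of one E_ci fall within each E_ce partition."""
    counts = Counter(ebh_partitions)
    return [counts.get(i, 0) for i in range(n_partitions)]


def decide(ebh_partitions, n_partitions, min_share=0.5):
    """Return ("add", best_partition) or ("new_filter", None).

    Assumed irregularity test: the distribution is considered too irregular
    when the best partition holds less than min_share of the EBHs.
    """
    counts = partition_counts(ebh_partitions, n_partitions)
    total = sum(counts)
    if total == 0:
        return ("new_filter", None)
    best = max(range(n_partitions), key=counts.__getitem__)
    if counts[best] / total < min_share:
        return ("new_filter", None)
    return ("add", best)
```

Here `ebh_partitions` is the list of partition indices assigned to the EBHs of the incoming E_ci, for example by the current filter.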

3.2.4.1 Insert a new filter, E_ce, in the knowledge organization

The current E_ce is cleaned so that it keeps every E_ci that best matches its partitions; a new E_ce is then created and linked to hold every E_ci that is not well represented.
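A minimal sketch of this split is given below, under loud assumptions: the `Filter` class is a hypothetical stand-in for an E_ce, members are plain identifiers for E_ci structures, the link between filters is modeled as a simple `next` reference, and the `fits` predicate represents whatever test decides that an E_ci is well represented.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Filter:
    """Hypothetical stand-in for an E_ce holding identifiers of its E_ci."""
    members: List[str] = field(default_factory=list)
    next: Optional["Filter"] = None  # assumed form of the link between filters


def split_filter(current: Filter, fits: Callable[[str], bool]) -> Filter:
    """Keep well-represented members in `current`; move the rest to a new,
    linked Filter and return that new Filter."""
    kept = [m for m in current.members if fits(m)]
    moved = [m for m in current.members if not fits(m)]
    new_filter = Filter(members=moved, next=current.next)
    current.members = kept
    current.next = new_filter
    return new_filter
```

The new filter is spliced in directly after the current one, so any filter that was already linked stays reachable through the new one.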

6 A meaningful linguistic unit that cannot be divided into smaller meaningful parts.

7 Partition in this context is a cluster obtained after the filtering process.
