                           | LCM                                | MSW
Basis                      | Atom                               | E ci
Characteristics of the     | Cannot be sliced                   | Can be sliced
unit of processing         | Represents the meaning of a word   | Represents morphosyntactic characteristics of a sentence
                           | Fixed                              | Can evolve
                           | Hand made                          | Automatically extracted
Rules                      | Atom generation rule               | Morphosyntactic rules
                           | Semantic rules                     | Clustering filters
Semantic                   | Directly manipulated by            | Indirectly manipulated by clustering filtering
                           | term definitions                   | and morphosyntactic context
Table 3. Comparison Between the LCM and the MSW
In many approaches, they are manipulated as an undifferentiated bundle divided only into
nominal and verbal atoms. The following section describes elements of the morphosyntactic
approach.
2.4.1 Detecting language tendencies
Language tendencies denote cultural characteristics, which are represented as dialects and
regional practices. Noyer (1992) described a hierarchical tree organization defined by
applying manually predefined morphological feature filters to manage morphological
contrasts, and used this organization as an indicator of linguistic tendencies in language
usage. Extensions of this approach attempt to derive the geometry of morphological
features [2] (Harley, 1994, 1998), with the goal of classifying features into subgroups based on
a universal geometry while accounting for universals in feature distribution and
realization. In MLW, the structure of the information is organized in a general oriented
graph (the E ci structure) for only the smallest unit of processing (a sentence), and a hierarchy is
defined by a chained sequence of clustering filters (Hisgen, 2010). Language tendencies are
therefore visible in the configuration of the current graph.
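
As a rough illustration of how a chained sequence of filters can define a hierarchy over sentence-level units, the following minimal Python sketch applies each filter to the groups left by the previous one. It is not the implementation described by Hisgen (2010): the Unit class, its fields, and the two filters (by_length, by_verb) are hypothetical stand-ins chosen only to make the idea concrete.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical stand-in for one sentence-level unit (not the real E ci)."""
    text: str
    n_words: int
    has_verb: bool


def by_length(unit):
    # Assumed filter: split units by an arbitrary length threshold.
    return "short" if unit.n_words <= 4 else "long"


def by_verb(unit):
    # Assumed filter: split units by the presence of a verb.
    return "verbal" if unit.has_verb else "nominal"


def build_hierarchy(units, filters):
    """Apply the filters in order; each one subdivides the previous groups."""
    if not filters:
        return [u.text for u in units]
    first, rest = filters[0], filters[1:]
    groups = defaultdict(list)
    for u in units:
        groups[first(u)].append(u)
    return {label: build_hierarchy(members, rest)
            for label, members in groups.items()}


units = [
    Unit("La casa es grande", 4, True),
    Unit("Muy de prisa", 3, False),
    Unit("El perro corre por el parque todos los días", 9, True),
]
tree = build_hierarchy(units, [by_length, by_verb])
```

The order of the filters determines the shape of the resulting tree, which is the sense in which a chain of filters, rather than a fixed feature geometry, defines the hierarchy in this sketch.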
2.4.2 Sentence generation
Morphosyntax has also been used to implement a sentence generator. In an earlier
study (Martínez López, 2007), Spanish adverbial phrases were analyzed to extract the
reusable structures and discard the remainder, with the goal of using the reusable subset to
generate new phrases. Interestingly, the shortest, simplest structures presented the most
productive patterns and represented 45% of the corpus.
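
The idea of keeping reusable structures and measuring how much of a corpus they cover can be sketched as follows. This is only an illustrative assumption about the procedure, not the method of the cited study: phrases are represented as hypothetical part-of-speech tag sequences, a structure counts as reusable when it occurs at least min_count times, and the toy data are invented.

```python
from collections import Counter


def reusable_patterns(tagged_phrases, min_count=2):
    """Return patterns seen at least min_count times and their corpus share."""
    counts = Counter(tuple(tags) for tags in tagged_phrases)
    total = sum(counts.values())
    kept = {p: c for p, c in counts.items() if c >= min_count}
    coverage = sum(kept.values()) / total if total else 0.0
    return kept, coverage


# Invented toy corpus: each phrase is reduced to its tag sequence.
phrases = [
    ["ADV"], ["ADV"], ["ADV"],
    ["PREP", "NOUN"], ["PREP", "NOUN"],
    ["PREP", "DET", "NOUN", "ADJ"],
]
kept, coverage = reusable_patterns(phrases)
```

In this toy corpus the two shortest patterns are the only ones that repeat, so they are the only ones kept; the longer one-off structure is discarded.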
Another study (López De Luise, 2007) suggested translating Spanish text, represented by
sets of E ci , into a graphic representing the main structure of the content. This structure was
[2] This is a well-known method used to model phonological features (Clements, 1985; Sagey, 1986).