2.4.5 Improvement of translation quality/performance
Automatic translation has evolved considerably. Translation quality depends on proper
pairing or alignment of sources and on appropriate targeting of languages. This sensitive
processing can be improved using morphosyntactic tools.
Hwang used morphosyntactic information intensively for three language pairs (Hwang, 2005).
The pairs were selected on the basis of their morphosyntactic similarities or differences.
The study investigated the effects of morphosyntactic information such as base form,
part-of-speech, and the relative positional information of a word in a statistical machine
translation framework, and built word-based and class-based language models by
manipulating morphological and relative positional information.
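A class-based language model of the kind mentioned above can be sketched as follows. This is a minimal illustration, not Hwang's implementation: it assumes that part-of-speech tags serve as the word classes, uses the standard factorization P(word | previous class) = P(class | previous class) * P(word | class), and applies no smoothing. The toy corpus, the tag names, and the function names are all invented for the example.

```python
from collections import Counter

# Toy tagged corpus: each token is (surface form, base form, part-of-speech).
# The sentences and tags are invented for illustration only.
corpus = [
    [("dogs", "dog", "NOUN"), ("bark", "bark", "VERB")],
    [("cats", "cat", "NOUN"), ("sleep", "sleep", "VERB")],
    [("dogs", "dog", "NOUN"), ("sleep", "sleep", "VERB")],
]

def train_class_bigram(sentences):
    """Collect counts for P(class | previous class) and P(base form | class)."""
    class_bigrams = Counter()   # (previous class, class) -> count
    class_context = Counter()   # previous class -> count as a bigram context
    emissions = Counter()       # (class, base form) -> count
    class_totals = Counter()    # class -> count
    for sentence in sentences:
        prev = "<s>"            # sentence-start pseudo-class
        for _surface, base, pos in sentence:
            class_bigrams[(prev, pos)] += 1
            class_context[prev] += 1
            emissions[(pos, base)] += 1
            class_totals[pos] += 1
            prev = pos
    return class_bigrams, class_context, emissions, class_totals

def class_bigram_prob(model, prev_pos, pos, base):
    """Unsmoothed estimate P(pos | prev_pos) * P(base | pos)."""
    class_bigrams, class_context, emissions, class_totals = model
    if class_context[prev_pos] == 0 or class_totals[pos] == 0:
        return 0.0
    transition = class_bigrams[(prev_pos, pos)] / class_context[prev_pos]
    emission = emissions[(pos, base)] / class_totals[pos]
    return transition * emission

model = train_class_bigram(corpus)
p_bark = class_bigram_prob(model, "NOUN", "VERB", "bark")
```

Because the probability is conditioned on the class of the preceding word rather than on the word itself, a sequence such as "cat" followed by "bark", which never occurs in the toy corpus, still receives a non-zero estimate. This is the generalization that makes class-based models attractive when word forms are sparse.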
They used the language pairs Japanese-Korean (two languages with the same word order and
high inflection/agglutination 4), English-Korean (an inflecting language with rigid word
order and a highly inflecting and agglutinating language with partially free word order),
and Chinese-Korean (a non-inflectional language with rigid word order and a highly
inflecting and agglutinating language with partially free word order).
Which combination of morphosyntactic information improves translation quality most
depends on the language pair and the direction of translation. In all cases,
however, using morphosyntactic information in the target language optimized translation
efficacy. Language models based on morphosyntactic information effectively improved
performance. E ci is an important part of the MLW, and it has inbuilt morphophonemic
descriptors that contribute significantly to this task.
2.4.6 Speech recognition
Speech recognition requires real-time speech detection. This is problematic when
modeling highly inflectional languages, whose many word forms inflate the vocabulary,
but it can be achieved by decomposing words into stems and endings and storing these
word subunits (morphemes) separately in the vocabulary. An enhanced morpheme-based
language model has been designed for the inflectional Dravidian language Tamil
(Saraswathi, 2007). This enhanced morpheme-based language model was trained on the
decomposed corpus. The results were compared with word-based bigram and trigram
language models, a distance-based language model, a dependency-based language model,
and a class-based language model.
The proposed model improved the performance of the Tamil speech recognition system
relative to the word-based language models. The MLW approach is based on a similar
decomposition into stems and endings, but it includes additional morphosyntactic
features that are given the same importance as full words (for more information,
see the last sections). Thus, we expect that this approach will be suitable for processing
highly inflectional languages.
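The decomposition into stems and endings described above can be illustrated with a small sketch. It is not the model of Saraswathi (2007) or the MLW method: it assumes a hypothetical three-item ending inventory for English-like toy data, splits words by longest-suffix matching, and estimates unsmoothed bigram probabilities over the resulting morpheme sequence. All names in the block are invented for the example.

```python
from collections import Counter

# Hypothetical ending inventory for toy data; a real system would use a
# morphological analyzer for the target language instead.
ENDINGS = ("ing", "ed", "s")

def decompose(word):
    """Split a word into [stem, +ending] using longest-suffix matching."""
    for ending in sorted(ENDINGS, key=len, reverse=True):
        # Require a stem of at least two characters before splitting.
        if word.endswith(ending) and len(word) > len(ending) + 1:
            return [word[: -len(ending)], "+" + ending]
    return [word]

def to_morphemes(sentence):
    """Turn a whitespace-tokenized sentence into a flat morpheme sequence."""
    units = []
    for word in sentence.split():
        units.extend(decompose(word))
    return units

def train_bigrams(sentences):
    """Count morpheme bigrams and their left contexts."""
    bigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        units = ["<s>"] + to_morphemes(sentence) + ["</s>"]
        for left, right in zip(units, units[1:]):
            bigrams[(left, right)] += 1
            contexts[left] += 1
    return bigrams, contexts

def bigram_prob(model, left, right):
    """Unsmoothed maximum-likelihood estimate of P(right | left)."""
    bigrams, contexts = model
    return bigrams[(left, right)] / contexts[left] if contexts[left] else 0.0

corpus = ["she walks", "he walked", "they talked", "she talks"]
model = train_bigrams(corpus)
p_past_after_walk = bigram_prob(model, "walk", "+ed")
```

In this sketch the forms "walks" and "walked" share the single vocabulary entry "walk", and the endings "+s" and "+ed" are stored once and reused across stems, which is the property that keeps the vocabulary manageable for highly inflectional languages.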
4 This term was introduced by Wilhelm von Humboldt in 1836 to classify languages from a
morphological point of view. An agglutinative language is a language that uses agglutination
extensively: most words are formed by joining morphemes together. A morpheme is the smallest
component of a word or other linguistic unit that has semantic meaning.