used statistics and numerical learning. What we have seen more recently is a
rise in hybrid approaches, which accommodate both symbolic and numerical
learning, for all the modules in an MMD system. The NLP modules largely
follow this approach [TEL 09], as do modules falling within the field of
pragmatics. Thus, the resolution of object reference now calls on statistics
[FUN 12], as well as on semantic and discourse representation models
[STO 10], on common ground and the grounding process [ROS 10], on
speaker intention recognition [RUI 12], the attribution of mental states,
dialogue management and automatic answer generation [RIE 11]. We can see
that machine learning can no longer be ignored in MMD.
2.2. Linguistic aspects
An MMD system manipulates language; it is thus mainly concerned with
linguistics in general, and notably with automatic linguistics, also known as
NLP, but not exclusively: recent advances in corpus linguistics and,
along with it, computational linguistics have changed our way of
analyzing human dialogues. We have defined the word corpus as a collection
of attested linguistic material. More precisely, a corpus is built following strict
rules, so as to obtain a sample of the language chosen according to explicit
criteria. Depending on the type of dialogue, it can be audio or written
samples. For oral dialogue, studying the recordings directly has never been
very practical, and the corpus is often enriched with a transcription, i.e. a
textual approximation of the pronounced utterances. The resulting text is
sometimes hard to understand without a transcription of the prosody, for
example pauses, and it is necessary to add codes or annotations, i.e. data
attached to textual units, for example words. We then have an annotated
corpus, which contains both sentences and observations on how they were
uttered. When you carry out morphological and
syntactic analyses, it is now usual to annotate the corpus with a set of
morphosyntactic labels and syntactic trees, or more simply, relations between
words and groups of words. In the end, a corpus can take the shape of a
computational database with many fields and multiple exploration and
querying possibilities. For that is the whole point of corpora: to create a
set of analyses so as to carry out frequency calculations, more complex
descriptive statistics such as correspondence analysis, searches for
correlations between text and annotations, or even inferential statistics such
as analysis of variance, in which case the goal is to try to generalize the
results obtained from a specific corpus. This digression on corpus linguistics allows
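To make the notion of an annotated corpus more concrete, the description above can be sketched in a few lines of Python. This is only a minimal illustration, not a method taken from the text: the `Token` record, its fields, the `<pause>` code, the label set and the sample utterances are all hypothetical. It shows a tiny transcribed dialogue stored as records with several fields, and the simplest kind of analysis mentioned above, frequency calculations over words and over annotations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One textual unit of the transcription with its annotations.

    All field names and label values are assumptions made for this sketch.
    """
    form: str     # transcribed word, or a code such as "<pause>"
    pos: str      # morphosyntactic label (hypothetical tagset)
    speaker: str  # who uttered the unit


# A tiny invented transcription: two turns of a dialogue.
corpus = [
    Token("I", "PRON", "A"),
    Token("want", "VERB", "A"),
    Token("<pause>", "PAUSE", "A"),
    Token("a", "DET", "A"),
    Token("ticket", "NOUN", "A"),
    Token("which", "DET", "B"),
    Token("ticket", "NOUN", "B"),
]


def frequencies(tokens, field):
    """Count how often each value of the given field occurs."""
    return Counter(getattr(token, field) for token in tokens)


# Frequency calculations over the text itself and over the annotations.
form_freq = frequencies(corpus, "form")
pos_freq = frequencies(corpus, "pos")

# A simple query combining two fields: number of pauses per speaker.
pauses_by_speaker = Counter(t.speaker for t in corpus if t.pos == "PAUSE")

print(form_freq.most_common(3))
print(pos_freq)
print(pauses_by_speaker)
```

Keeping each unit as a record with named fields mirrors the idea of a corpus as a database: any field, or combination of fields, can be queried and counted, and the resulting counts are the raw material for the descriptive or inferential statistics mentioned above.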