Digital Signal Processing Reference
Figure 3-12 shows the average number of speaker and operator
utterances per task. This number indicates the complexity of the task: the
more utterances a task requires, the more conditions the task has to
consider. For example, the hotel information task has
the following conditions: 1) the location of the hotel; 2) the style of the hotel
(Western or Japanese); 3) the number of persons; 4) the room charge; 5)
facilities; 6) tourist attractions near the hotel; etc. The restaurant task has the
following conditions: 1) the location of the restaurant; 2) the type of food; 3)
price range; 4) acceptability of credit card payments; 5) parking availability;
etc. Mixed-initiative dialog scenarios must be introduced to handle these
tasks responsively, since speakers do not want to answer questions
one at a time. Narrowing down the conditions to feasible values by
considering the context and the driver's preferences is also necessary. For
example, the operator might narrow the alternatives to two or three by
considering the driving route plan and the locations of the alternatives. The
dialog patterns and tactics selected by the operator in the spoken dialog
corpus are being examined in order to design responsive HMI dialog
scenarios.
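The narrowing step described above can be sketched in code. The following is a minimal illustration only, not the method used by the operators or by the CAMMIA system: it assumes that the route plan is a polyline of at least two planar (x, y) points in kilometers and that each alternative has a known position. The names Candidate, detour_km and narrow_alternatives are hypothetical, and the straight-line distance to the route is an assumed stand-in for whatever criteria the operator actually applies.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical alternative (hotel, restaurant, ...) with a planar position."""
    name: str
    x_km: float
    y_km: float


def distance_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to the segment AB in planar coordinates."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: A and B coincide.
        return math.hypot(px - ax, py - ay)
    # Project P onto the line AB and clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def detour_km(candidate, route):
    """Distance from a candidate to the nearest point of the route polyline.

    Assumption: route is a list of at least two (x, y) tuples.
    """
    return min(
        distance_to_segment(candidate.x_km, candidate.y_km, ax, ay, bx, by)
        for (ax, ay), (bx, by) in zip(route, route[1:])
    )


def narrow_alternatives(candidates, route, max_results=3):
    """Keep the max_results candidates that lie closest to the planned route."""
    ranked = sorted(candidates, key=lambda c: detour_km(c, route))
    return ranked[:max_results]
```

With max_results set to 2 or 3, the function returns a short list of the kind an operator might read out to the driver. A fuller scenario would presumably also weigh the driver's preferences and the other task conditions.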
6. CONCLUSIONS
A spoken dialog corpus for car telematics services was collected from 137
males and 113 females. Analysis of the spoken dialog corpus revealed that the
vocabulary size for speaker utterances was 4,533 words, consisting of 762
proper nouns and 3,771 words other than proper nouns. The average number
of dialog tasks per speaker was 8.1. The three most requested types of
information in the corpus were traffic information, tourist attraction
information and restaurant information. These results are being used to
develop and evaluate ASR as well as the dialog scenarios used in the
CAMMIA system.
The spoken dialog corpus has several issues which should be addressed in
the development of ASR grammars and the dialog scenario for HMIs:
(i) The operator does not talk like a computer.
    The operator uses ambiguous expressions, such as “the route is
    congested a little bit heavily”.
    The operator does not always state things in a succinct way.
(ii) The speaker does not act like he is talking to a computer