Digital Signal Processing Reference
The speaker was instructed to talk with the operator and obtain appropriate
driving information for Tasks 1 through 3 listed above. Each task was printed
in a handout, and each speaker utilized this handout during dialogs with the
operator.
3.3 Collection of Spoken Dialog Corpus - Phase II
This section describes the improvements we made to the collection
of the spoken dialog corpus after Phase I.
3.3.1 The Task and Instructions
Phase I collection revealed several problems in the task and instructions.
The first problem stemmed from the use of handouts. The speakers tended to
recite the texts from the handout verbatim when they initiated a dialog. For
example, suppose the handout read “Task 2: You've just arrived in Hakone.
Find a restaurant for lunch.” If the operator started the dialog by saying
“Driving information center. May I help you?”, the speaker might respond
with “Uh, I've just arrived in Hakone. Please find a restaurant for lunch.”
This phenomenon prevented us from collecting spontaneous speech samples
in some cases.
The second problem was that the predefined tasks did not adequately
encourage the speakers to imagine that they were on a real trip, so they
failed to generate questions relevant to the tasks. As a result, the operator
sometimes had to halt the conversation and instruct the speakers regarding the
types of questions they could ask to make the dialog more realistic.
To address these initial problems, we divided the remaining 201 speakers
into 40 groups, each of which consisted of five or six speakers. Each group
was instructed to choose Hakone or Izu as their destination, and to discuss a
driving plan for an overnight trip according to their interests. After the
discussion, each speaker generated two sets of dialog tasks, A and B, relevant
to the driving plan. Set A listed the questions needed to obtain information
before starting the trip on the first day, and set B listed those needed
before leaving the hotel on the second day (see Figure 3-5). The recording of
the dialog was likewise divided into two sessions, A and B, which corresponded
to the first day and the second day, respectively.
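
The group sizes quoted above can be checked with a little arithmetic. The sketch below is only illustrative: the text does not say how speakers were assigned to groups, so the even split and the function name split_into_groups are assumptions. It shows that 201 speakers in 40 groups of five or six means one group of six and 39 groups of five.

```python
def split_into_groups(n_speakers: int, n_groups: int) -> list[int]:
    """Return group sizes for an as-even-as-possible split (hypothetical helper)."""
    base, extra = divmod(n_speakers, n_groups)
    # 'extra' groups receive one additional speaker; the rest get 'base'.
    return [base + 1] * extra + [base] * (n_groups - extra)


# Figures taken from the text: 201 remaining speakers, 40 groups.
sizes = split_into_groups(201, 40)
print(len(sizes), sum(sizes), sizes.count(5), sizes.count(6))
```

Any assignment restricted to groups of five or six gives the same counts, since 5a + 6b = 201 with a + b = 40 has the single solution a = 39, b = 1.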
In addition, we found that it was necessary to provide the speaker with
additional details for the task, such as the date of the trip and the travel
expense limit. Road congestion was varied based on a distinction between
weekday and weekend travel, and the operator altered the route guidance