Digital Signal Processing Reference
1. INTRODUCTION
A spoken dialogue interface that handles spontaneous speech is one of the
most critical modules for effective hands-free human-machine interaction
in vehicles when safety is a concern. In developing a framework for such
interfaces, large-scale speech corpora play an important role in both
acoustic modelling and speech modelling for robust and natural speech
interfaces.
The Center for Integrated Acoustic Information Research (CIAIR) at
Nagoya University has been developing a large-scale corpus for
in-car speech applications [1,5,6]. Unlike earlier studies on the
subject, this effort takes into account the dynamic behaviour of the driver
and the vehicle as well as the content of the in-car speech. The recorded
data include vehicle-specific data, driver-specific behavioural signals, the
traffic conditions, and the distance to the destination [2,8,9]. In this
chapter, details of this multimedia data collection effort are presented.
The main objectives of this data collection are as follows:
Training acoustic models for the in-car speech data,
Training language models of spoken dialogue for task domains related to information access while driving a car, and
Modelling the communication by analyzing the interaction among different types of multimedia data.
In our project, a system specially developed in a Data Collection Vehicle
(DCV) (Figure 1-1) has been used for synchronous recording of
multi-channel audio signals, multi-channel video data, and vehicle-related
information. In total, approximately 1.8 terabytes of data have been
collected by recording several sessions of spoken dialogue during a drive
of about 60 minutes by each of more than 800 drivers. The drivers are
evenly split between male and female.
All of the spoken dialogues for each trip are transcribed with detailed
information including a synchronized time stamp. We have introduced and
employed a Layered Intention Tag (LIT) for analyzing dialogue structure.
Hence, the data can be used for analyzing and modelling the interactions
between the navigators and drivers in an in-car environment under both
driving and idling conditions.
This chapter is organized as follows. In the next section, we describe the
multimedia data collection procedure performed using our Data Collection
Vehicle (DCV). In Section 3, we introduce the Layered Intention Tag (LIT)
for analysis of dialogue scenarios. Section 4 briefly describes other layers of
the corpus. Our preliminary findings are presented in Section 5.