ROBUST ASR INSIDE A VEHICLE USING BLIND PROBABILISTIC BASED UNDER-DETERMINED CONVOLUTIVE MIXTURE SEPARATION TECHNIQUE - DSP for In-Vehicle and Mobile Systems

technique are compared. In Section 5, we summarize and indicate future directions of our research in this area.

2. UNDER-DETERMINED BLIND CONVOLUTIVE MIXTURE SEPARATION

Blind source separation (BSS) attempts to estimate the sources, or inputs, of a mixing system by observing only the outputs of the system, without knowing how the sources were mixed together (no a priori knowledge of the system) or what the sources are. BSS is an important problem with many applications, e.g., interference-free wireless communication and robust automatic speech recognition in spoken dialogue systems on mobile platforms. In this chapter, BSS is applied within the framework of robust automatic speech recognition. There are two cases: (i) the instantaneous mixture (IM), where the mixing system has no memory, and (ii) the convolutive mixture, where the length of the filters used to represent the mixing system is greater than one.
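The distinction between the two cases can be illustrated with a minimal Python sketch. It assumes each source-to-sensor path is modeled as an FIR filter; this is a common way to write a convolutive mixture, not a formulation taken from this chapter. The function name `convolutive_mix` and the toy signals are hypothetical. When every filter has a single tap, the same routine reduces to the instantaneous case.

```python
def convolutive_mix(filters, sources):
    """Mix M sources into N sensor signals with FIR filters.

    filters[i][j] is the list of taps from source j to sensor i;
    sources[j] is the list of K samples of source j.
    """
    n_sensors = len(filters)
    n_samples = len(sources[0])
    mixed = [[0.0] * n_samples for _ in range(n_sensors)]
    for i, row in enumerate(filters):
        for j, taps in enumerate(row):
            for lag, tap in enumerate(taps):
                # Each tap adds a delayed, scaled copy of source j.
                for n in range(lag, n_samples):
                    mixed[i][n] += tap * sources[j][n - lag]
    return mixed


# Toy data (hypothetical): M = 2 sources, N = 1 sensor, K = 4 samples.
sources = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0, 0.0]]

# Instantaneous case: every filter has exactly one tap (no memory).
instantaneous = convolutive_mix([[[0.5], [2.0]]], sources)

# Convolutive case: source 0 arrives directly and again one sample
# later at half amplitude; source 1 is muted to keep the example small.
convolutive = convolutive_mix([[[1.0, 0.5], [0.0]]], sources)

print(instantaneous)  # [[2.5, 1.0, -0.5, 2.0]]
print(convolutive)    # [[1.0, 2.5, 4.0, 5.5]]
```

In the second call, each output sample depends on the current and the previous input sample, which is what it means for the mixing system to have memory.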

Let N be the number of sensors used to observe the source signals and M be the number of sources. Then the IM case can be written in matrix form as:

$$x(n) = a\,s(n) + v(n)$$

where x(n) is the mixed signal output matrix of size N x K, s(n) is the matrix of source signals of size M x K, v(n) is the additive noise matrix of size N x K, n = 1, 2, ..., K are the time samples, and a is the mixing matrix (mixing system) of size N x M, which is represented in terms of the angles or directions of arrival of the source signals at the sensors, i.e., a is a function of the directions of arrival.
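As a rough illustration of the dimensions involved, the sketch below builds an instantaneous mixture from an N x M mixing matrix, an M x K source matrix, and additive noise. It is only a toy under stated assumptions: the numeric values are arbitrary, the noise is assumed Gaussian, and the helper name `instantaneous_mix` is hypothetical. The mixing matrix is filled in directly, not derived from directions of arrival.

```python
import random


def instantaneous_mix(a, s, noise_std=0.0, seed=0):
    """Return the N x K mixture of an N x M matrix a and an M x K matrix s,
    plus zero-mean Gaussian noise of standard deviation noise_std."""
    n_sensors, n_sources, n_samples = len(a), len(s), len(s[0])
    if any(len(row) != n_sources for row in a):
        raise ValueError("a must have one column per source (N x M)")
    rng = random.Random(seed)
    x = []
    for i in range(n_sensors):
        row = []
        for n in range(n_samples):
            clean = sum(a[i][j] * s[j][n] for j in range(n_sources))
            noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
            row.append(clean + noise)
        x.append(row)
    return x


# N = 2 sensors, M = 3 sources, K = 4 samples (so N < M).
a = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]
s = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]

x_clean = instantaneous_mix(a, s)
x_noisy = instantaneous_mix(a, s, noise_std=0.1)

print(x_clean)  # [[2.0, 2.0, 4.0, 4.0], [1.0, 1.0, 1.0, 1.0]]
print(len(x_noisy), len(x_noisy[0]))  # 2 4
```

The output has one row per sensor and one column per time sample, regardless of how many sources contributed to it.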

BSS is an easier problem to solve when N = M, since it reduces to finding a matrix, and several techniques have been developed for this case. It is a more difficult problem when N < M. In practice, the number of sources cannot be known a priori: in wireless communication the sources correspond to signals reflected from various scatterers such as buildings, plus noise, and in spoken dialogue systems they correspond to other speakers and noise. The number of sources also varies dynamically as the environment changes. Hence we cannot know how many sensors (e.g., antenna elements in wireless communication, or microphones in a spoken dialogue system) to use so that their number equals the number of sources. Therefore, BSS with N < M is the more practical problem to solve and has more practical applications.
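One way to see why N < M is harder is the noise-free sketch below, with toy matrices chosen only for illustration (all names are hypothetical). With N = M = 2 and a known, invertible mixing matrix, the sources are recovered by matrix inversion. With N = 2 and M = 3, two different source vectors produce exactly the same sensor readings, so no matrix applied to the observations can tell them apart. The sketch does not implement any separation technique; it only shows the ambiguity that such a technique must resolve with additional assumptions about the sources.

```python
def mix(a, s):
    """Noise-free instantaneous mixing of a single time sample."""
    return [sum(a_ij * s_j for a_ij, s_j in zip(row, s)) for row in a]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix via the closed-form adjugate formula."""
    (p, q), (r, t) = m
    det = p * t - q * r
    return [[t / det, -q / det], [-r / det, p / det]]


def cross(u, v):
    """Cross product of two 3-vectors; the result is orthogonal to both."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


# Determined case, N = M = 2: inversion recovers the sources.
a_square = [[2.0, 1.0], [1.0, 1.0]]
s_true = [3.0, 4.0]
recovered = mix(inverse_2x2(a_square), mix(a_square, s_true))
print(recovered)  # [3.0, 4.0]

# Under-determined case, N = 2 < M = 3.
a_wide = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
# A vector orthogonal to both rows is mapped to zero by a_wide.
null_vec = cross(a_wide[0], a_wide[1])
s_one = [1.0, 1.0, 1.0]
s_two = [s_j + z for s_j, z in zip(s_one, null_vec)]

print(s_two)                                 # [3.0, 0.0, 2.0]
print(mix(a_wide, s_one), mix(a_wide, s_two))  # [3.0, 2.0] [3.0, 2.0]
```

Because the two candidate source vectors are indistinguishable at the sensors, the under-determined case needs information beyond the mixing matrix itself.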
