Digital Signal Processing Reference
such audio. However, noise frames are assumed to be independent over time. As
a consequence, non-stationary noises are not modelled adequately. Even with the
restrictions made in the GPB algorithm, feature enhancement by SLDM is computationally
more demanding than the techniques discussed above. Further, as in the
AFE (cf. Sect. 9.1), accurate audio activity detection is required to provide correct
estimation of the noise LDM.
9.3 Model Architectures
The most frequently used data-driven model representation of audio is the HMM
[14]. Beyond the optimisation options described so far along the chain of Intelligent
Audio Analysis, extending HMM topologies to more general DBN layouts can also
help to increase noise robustness [15, 17, 44]. Generative models such as HMMs
assume conditional independence of the audio feature observations, thus ignoring
the long-range dependencies present in most audio of interest [45]. To overcome this,
Conditional Random Fields (CRF) [46-48] model a sequence by an exponential
distribution conditioned on the observation sequence. The HCRF [15, 49] further includes
hidden state sequences for the estimation of the conditional probability of a class
over an entire sequence. Another interesting option is to model the raw audio signal
in the time domain [16]. For example, the SAR-HMM [16] provides good results in clean
audio conditions. To cope with noise, it can be extended to a Switching Linear
Dynamical System (SLDS) [17] that models the dynamics of both the raw audio signal and
the noise. These alternatives are briefly presented in the following.
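To make the contrast between these model families concrete, the following sketch writes out the standard textbook forms of the three distributions. The notation is an assumption introduced here for illustration and is not taken from the cited works: x denotes the observation sequence, s the state sequence, f_k feature functions with weights \lambda_k, h a hidden state sequence, c a class label, and \Psi a potential function with parameters \theta. For the HMM, p(s_1 | s_0) is read as the initial state distribution.

```latex
% HMM (generative, joint): each observation depends only on its own state
p(\mathbf{x}, \mathbf{s}) = \prod_{t=1}^{T} p(s_t \mid s_{t-1})\, p(x_t \mid s_t)

% Linear-chain CRF (conditional): globally normalised exponential form;
% the feature functions may inspect the whole observation sequence
p(\mathbf{s} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(s_{t-1}, s_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{s}'}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(s'_{t-1}, s'_t, \mathbf{x}, t) \Big)

% HCRF: one class label per sequence, hidden state sequences summed out
p(c \mid \mathbf{x}) =
  \frac{\sum_{\mathbf{h}} \exp \Psi(c, \mathbf{h}, \mathbf{x}; \theta)}
       {\sum_{c'} \sum_{\mathbf{h}} \exp \Psi(c', \mathbf{h}, \mathbf{x}; \theta)}
```

In this reading, the HMM factorises over single frames, the CRF scores a whole state sequence against the whole observation sequence, and the HCRF yields one posterior per class for the entire sequence.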
9.3.1 Conditional Random Fields
As mentioned above, CRF [46] use an exponential distribution to model a sequence
given its observations, which also captures non-local dependencies among states and
observations. Further, unnormalised transition probabilities are possible. Because
a Markov assumption can be enforced as in HMMs, dynamic programming is
applicable for inference. CRFs have also been shown to be beneficial as LM [50].
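The role of dynamic programming can be illustrated with a minimal sketch of inference in a linear-chain CRF. It is not the implementation of any cited work: the function names, the array layout, and the toy scores are assumptions made for this example. Frame scores and unnormalised transition scores are simply added along a path, Viterbi decoding finds the best path, and the forward recursion yields the global normaliser log Z(x).

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    s = np.sum(np.exp(a - m), axis=axis, keepdims=True)
    return np.squeeze(m + np.log(s), axis=axis)


def viterbi(unary, trans):
    """Best state sequence of a linear-chain CRF (hypothetical layout).

    unary: (T, S) array, score of each state in each frame.
    trans: (S, S) array, unnormalised score of the transition i -> j.
    """
    T, S = unary.shape
    delta = unary[0].copy()              # best score of a path ending in each state
    back = np.zeros((T, S), dtype=int)   # back-pointers for the traceback
    for t in range(1, T):
        cand = delta[:, None] + trans    # cand[i, j]: best path into i, then i -> j
        back[t] = np.argmax(cand, axis=0)
        delta = cand[back[t], np.arange(S)] + unary[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(delta))


def log_partition(unary, trans):
    """log Z(x) via the forward recursion (sum over all state paths)."""
    alpha = unary[0].copy()
    for t in range(1, unary.shape[0]):
        alpha = logsumexp(alpha[:, None] + trans, axis=0) + unary[t]
    return float(logsumexp(alpha, axis=0))


# Toy example: 3 frames, 2 states; all numbers are made up for illustration.
unary = np.array([[3.0, 0.0], [0.0, 1.0], [0.0, 1.5]])
trans = np.array([[1.0, -1.0], [-1.0, 1.0]])

path, score = viterbi(unary, trans)
print(path, score)                                 # [0, 1, 1] 5.5
print(np.exp(score - log_partition(unary, trans))) # posterior of the best path
```

Because the transition scores are only normalised once per sequence through log Z(x), they need not sum to one per state, which is what distinguishes this sketch from HMM decoding with transition probabilities.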
9.3.2 Hidden Conditional Random Fields
The extension to HCRF is needed to make the CRF paradigm suitable for general audio
recognition tasks. The reason is that CRF provide a class prediction for each observation,
i.e., each frame of a time sequence, rather than for an entire sequence. HCRF overcome
this by adding hidden state sequences [49]. Reports of superiority over HMM in
the Intelligent Audio Analysis domain include the recognition of phones [15] and
 