Digital Signal Processing Reference
Fig. 9.7 AR-SLDS as DBN structure (nodes s, h and v, each shown for time steps t-3 to t)
in the case of the SAR-HMM, $\eta_t$ is not modelling an independent additive noise source. For the determination of the observed sample at time step $t$, the vector $h_t$ is projected onto a scalar $v_t$:

$$v_t = B h_t + \eta_t, \quad \text{with} \quad \eta_t \sim \mathcal{N}(\eta_t; 0, \sigma_V^2), \qquad (9.29)$$
where $\eta_t$ models independent additive white Gaussian noise (AWGN) assumed to modify the hidden clean sample $B h_t$. The DBN structure of the SLDS that models the hidden clean signal and an independent additive noise is shown in Fig. 9.7. The parameters $A(s_t)$, $B$ and $\Sigma_H(s_t)$ of the SLDS can be chosen to mimic the SAR-HMM (cf. Sect. 9.3.3.1) for the case $\sigma_V = 0$ [17]. Likewise, if $\sigma_V \neq 0$, a noise model is included, but no training of a new model is needed. While determining the exact parameters of the AR-SLDS has a complexity of $O(S^T)$, the Expectation Correction (EC) approximation [54] provides an elegant reduction to $O(T)$.
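As a minimal sketch of the observation model in Eq. (9.29), the following Python snippet simulates an AR-SLDS for a given switch sequence. It assumes a hidden-state transition of the form h_t = A(s_t) h_{t-1} + innovation, a companion-form A(s_t) built from AR coefficients with the convention x_t = sum_r a_r x_{t-r} + innovation, and B = [1, 0, ..., 0]. These choices, the example coefficients, and all function and variable names are illustrative assumptions and are not taken from the text.

```python
import numpy as np


def companion_matrix(ar_coeffs):
    """Transition matrix A for one switch state (assumed companion form).

    Assumed convention: x_t = sum_r a_r * x_{t-r} + innovation.
    """
    order = len(ar_coeffs)
    A = np.zeros((order, order))
    A[0, :] = ar_coeffs               # first row applies the AR recursion
    A[1:, :-1] = np.eye(order - 1)    # remaining rows shift the history down
    return A


def simulate_ar_slds(ar_coeffs_per_state, sigma_h_per_state, states, sigma_v, seed=0):
    """Simulate hidden clean samples B h_t and observations v_t = B h_t + eta_t."""
    rng = np.random.default_rng(seed)
    order = len(ar_coeffs_per_state[0])
    B = np.zeros(order)
    B[0] = 1.0                        # projection of h_t onto the current sample
    h = np.zeros(order)
    clean = np.empty(len(states))
    observed = np.empty(len(states))
    for t, s in enumerate(states):
        A = companion_matrix(ar_coeffs_per_state[s])
        innovation = np.zeros(order)
        innovation[0] = sigma_h_per_state[s] * rng.standard_normal()
        h = A @ h + innovation        # assumed hidden-state transition
        clean[t] = B @ h              # hidden clean sample B h_t
        eta = sigma_v * rng.standard_normal()  # AWGN term of Eq. (9.29)
        observed[t] = clean[t] + eta
    return clean, observed


# Hypothetical two-state example: same model parameters, with and without AWGN.
ar = [[0.5, -0.2], [0.9, -0.5]]
sigma_h = [1.0, 0.5]
switches = [0] * 50 + [1] * 50
clean_a, obs_a = simulate_ar_slds(ar, sigma_h, switches, sigma_v=0.0)
clean_b, obs_b = simulate_ar_slds(ar, sigma_h, switches, sigma_v=0.3)
```

With `sigma_v = 0.0` the observed samples coincide with the hidden clean samples, which corresponds to the SAR-HMM-like case; with a nonzero `sigma_v` the same parameters are reused and only the AWGN term is added.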
In practice, the AR-SLDS is particularly suited to cope with white noise disturbance, as the variable $\eta_t$ incorporates an AWGN model. It is, however, usually inferior to frame-level feature-based HMM approaches in clean conditions. This may be explained by the difference between this approach and human perception, which does not operate in the time domain. In coloured noise environments, the AR-SLDS usually also leads to lower performance than frame-level feature modelling, as by SLDMs. A limitation for practical use is the high computational requirement, even with the EC algorithm: as an example, for audio at 16 kHz, $T$ is 160 times higher than for a feature vector sequence at 100 frames per second (FPS).
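The factor of 160 follows directly from the two rates, and a short calculation also illustrates why exact inference becomes infeasible. The sketch below is only a worked example: the function names are hypothetical, and S is assumed here to denote the number of switch states, so that S^T counts the possible switch sequences of length T.

```python
def sequence_length(duration_s, rate_hz):
    """Number of time steps T for a signal of the given duration and rate."""
    return int(round(duration_s * rate_hz))


def exact_switch_paths(num_states, T):
    """Number of distinct switch sequences of length T (assumed meaning of S^T)."""
    return num_states ** T


T_samples = sequence_length(1.0, 16_000)  # sample-level modelling at 16 kHz
T_frames = sequence_length(1.0, 100)      # frame-level features at 100 FPS

print(T_samples // T_frames)      # 160
print(exact_switch_paths(2, 10))  # 1024, already for only 10 time steps
```

Even a cost that is linear in T is therefore 160 times larger at the sample level than at the frame level for the same audio duration.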
Obviously, further model architectures exist that were not shown here but are well suited to cope with noise, in particular non-stationary noise. One example is the LSTM network, as shown in Sect. 7.2.3.4 [55, 56].
 