Digital Signal Processing Reference
Fig. 9.7 AR-SLDS as DBN structure (nodes $s_\tau$, $h_\tau$ and $v_\tau$ for the time steps $\tau = t-3, \ldots, t$)
In the case of the SAR-HMM, $\eta_t$ is not modelling an independent additive noise source. For the determination of the observed sample at time step $t$, the vector $h_t$ is projected onto a scalar $v_t$:

$$v_t = B h_t + \eta_t, \quad \text{with} \quad \eta_t \sim \mathcal{N}(\eta_t; 0, \sigma_V^2), \tag{9.29}$$

where $\eta_t$ models independent additive white Gaussian noise (AWGN) assumed to modify the hidden clean sample $B h_t$. The DBN structure of the SLDS that models the hidden clean signal and an independent additive noise is found in Fig. 9.7.
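The observation model of Eq. (9.29) can be illustrated with a minimal sketch. It rests on assumptions not stated in the text above: the hidden vector $h_t$ is taken to hold the most recent clean samples (newest first), the transition $A(s_t)$ is taken to be in companion form built from per-state AR coefficients, $B = [1, 0, \ldots, 0]$ selects the newest clean sample, and the names `simulate_ar_slds`, `ar_sets`, `switch_path`, `sigma_h` and `sigma_v` are hypothetical. The switch path is given rather than sampled.

```python
import random


def simulate_ar_slds(ar_sets, switch_path, sigma_h, sigma_v, seed=0):
    """Sample a clean AR signal B h_t and its noisy observation v_t (Eq. 9.29).

    ar_sets[s] holds the AR coefficients selected by switch state s
    (an illustrative parameterisation, not taken from the source).
    """
    rng = random.Random(seed)
    order = len(ar_sets[0])
    h = [0.0] * order  # h_t: the last `order` clean samples, newest first
    clean, observed = [], []
    for s in switch_path:
        coeffs = ar_sets[s]
        # Assumed state update h_t = A(s_t) h_{t-1} + innovation:
        # the first row of the companion matrix predicts the new sample,
        # the remaining rows shift the older samples down by one position.
        new_sample = sum(c * x for c, x in zip(coeffs, h)) + rng.gauss(0.0, sigma_h)
        h = [new_sample] + h[:-1]
        clean_sample = h[0]  # B h_t with B = [1, 0, ..., 0]
        eta = rng.gauss(0.0, sigma_v)  # eta_t ~ N(0, sigma_v^2), the AWGN term
        clean.append(clean_sample)
        observed.append(clean_sample + eta)  # v_t = B h_t + eta_t
    return clean, observed


# Two hypothetical AR(2) regimes and a switch path that changes regime halfway.
ar_sets = [[0.5, -0.2], [0.9, -0.5]]
switch_path = [0] * 50 + [1] * 50

clean, noiseless = simulate_ar_slds(ar_sets, switch_path, sigma_h=1.0, sigma_v=0.0)
_, noisy = simulate_ar_slds(ar_sets, switch_path, sigma_h=1.0, sigma_v=0.5)
```

With `sigma_v=0.0` the observation coincides with the clean sample, which corresponds to the noise-free special case; with `sigma_v > 0` the same clean signal is observed through additive white Gaussian noise.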
The parameters $A(s_t)$, $B$ and $\Sigma_H(s_t)$ of the SLDS can be chosen to mimic the SAR-HMM (cf. Sect. 9.3.3.1) for the case $\sigma_V = 0$ [17]. Likewise, if $\sigma_V \neq 0$, a noise model is included but no training of a new model is needed. With determination of the exact parameters of the AR-SLDS having a complexity of $O(S^T)$, the Expectation Correction (EC) approximation [54] provides an elegant reduction to $O(T)$.
In practice, the AR-SLDS is particularly suited to cope with white noise disturbance, as the variable $\eta_t$ incorporates an AWGN model. It is, however, usually
inferior to frame-level feature-based HMM approaches in clean conditions. This may be explained by the fact that the approach differs from human perception, which does not operate in the time domain. In coloured-noise environments, the AR-SLDS usually also leads to lower performance than frame-level feature modelling, as by SLDMs. A limitation for practical use is the high computational requirement, even with the EC algorithm: for example, for audio sampled at 16 kHz, $T$ is 160 times higher than for a feature vector sequence at 100 frames per second.
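The factor of 160 follows directly from the two rates. The short sketch below reproduces the arithmetic for one second of audio; the helper name `sequence_length` and the one-second duration are chosen purely for illustration.

```python
def sequence_length(duration_s, rate_hz):
    """Number of time steps T for a signal of the given duration."""
    return round(duration_s * rate_hz)


sample_rate_hz = 16_000  # sample-level modelling, as in the AR-SLDS
frame_rate_hz = 100      # frame-level feature vectors

t_samples = sequence_length(1.0, sample_rate_hz)
t_frames = sequence_length(1.0, frame_rate_hz)
ratio = t_samples // t_frames

print(t_samples, t_frames, ratio)  # 16000 100 160
```

Since the ratio depends only on the two rates, it is the same for any duration; even a method that is linear in $T$ therefore processes 160 times more time steps at the sample level.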
Further model architectures exist that were not shown here but are well suited to coping with noise, in particular non-stationary noise. One example is the LSTM networks shown in Sect. 7.2.3.4 [55, 56].