Digital Signal Processing Reference
non-linguistic vocalisations [51] or the segmentation of meeting speech [52]. A particular strength is that arbitrary functions of the observations can be used without complicating the parameter learning.
The HCRF models the conditional probability of a class $c$, given the sequence of observations $X = x_1, x_2, \ldots, x_T$:
$$p(c \mid X, \lambda) = \frac{1}{z(X, \lambda)} \sum_{\mathit{Seq} \in c} e^{\lambda f(c, \mathit{Seq}, X)}, \tag{9.24}$$
where $\lambda$ is the parameter vector, $f$ is the 'vector of sufficient statistics', and $\mathit{Seq} = s_1, s_2, \ldots, s_T$ is the hidden state sequence run through during the computation of this conditional probability. The probability is normalised by the 'partition function' $z(X, \lambda)$ to ensure a properly normalised probability [15]:
$$z(X, \lambda) = \sum_{c} \sum_{\mathit{Seq} \in c} e^{\lambda f(c, \mathit{Seq}, X)}. \tag{9.25}$$
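To make Eqs. (9.24) and (9.25) concrete, the following minimal sketch evaluates them by brute-force enumeration of all hidden state sequences. It is an illustration under stated assumptions, not the training or inference procedure of [15]: the names `hcrf_posterior` and `toy_features`, the two-class setup, the per-class parameter blocks in `lam`, and the feature definitions are all hypothetical, and every class is assumed to allow all state sequences. Enumeration costs $S^T$ terms per class, so it is only feasible for very short sequences.

```python
import itertools
import math


def hcrf_posterior(X, classes, num_states, lam, f):
    """Return {c: p(c | X, lam)} by enumerating all state sequences (Eqs. 9.24, 9.25)."""
    T = len(X)
    unnormalised = {}
    for c in classes:
        total = 0.0
        # Sum over every hidden state sequence Seq = s_1, ..., s_T of class c.
        for seq in itertools.product(range(num_states), repeat=T):
            feats = f(c, seq, X)
            score = sum(l * v for l, v in zip(lam, feats))  # lambda . f(c, Seq, X)
            total += math.exp(score)
        unnormalised[c] = total
    z = sum(unnormalised.values())  # partition function z(X, lam), Eq. (9.25)
    return {c: u / z for c, u in unnormalised.items()}


def toy_features(c, seq, X, classes=("A", "B")):
    """Hypothetical sufficient statistics, placed in the block of class c."""
    obs_stat = sum(x * s for x, s in zip(X, seq))             # observation/state term
    changes = sum(1 for a, b in zip(seq, seq[1:]) if a != b)  # number of state changes
    feats = [0.0] * (2 * len(classes))
    k = classes.index(c)
    feats[2 * k] = obs_stat
    feats[2 * k + 1] = float(changes)
    return feats


# Assumed parameters: class "A" rewards a large obs_stat, class "B" penalises it.
lam = [1.0, -0.5, -1.0, -0.5]
posterior = hcrf_posterior([0.5, 1.0, 0.8], ("A", "B"), 2, lam, toy_features)
```

Because the same unnormalised sums feed both the numerator and the partition function, the returned values sum to one by construction, without any constraint on the individual scores.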
The vector $f$ determines the probability to model. With a suitable $f$, a left-right HMM can be imitated [15]. Let us now restrict the HCRF to a Markov chain, but without requiring the transition probabilities to sum to one or the observation scores to be real probability densities. In analogy to an HMM, a parametrisation by transition scores $a_{i,j}$ and observation scores $b_j(x_t)$ can then be reached with the parameters $\lambda$
9.3.3 Audio Modelling in the Time Domain
Modelling of the raw signal in the time domain is a rarely pursued option, but it can offer easy explicit noise modelling [16]. To this end, we will first look at SAR-HMMs and then at the extension to SLDS.
9.3.3.1 Switching Autoregressive Hidden Markov Models
The SAR-HMM models the audio signal of interest as an autoregressive (AR) process. The non-stationarity is realised by switching between different AR parameter sets [17] by means of a discrete switch variable $s_t$, similar to the HMM states. At a time step $t$ (referring to the sample level in this case), exactly one out of $S$ states is occupied. The state at time step $t$ depends exclusively on its predecessor, with the transition probability $p(s_t \mid s_{t-1})$. The sample $v_t$ at this time step is assumed to be a linear combination of its $R$ preceding samples superposed by a Gaussian distributed 'innovation' $\eta(s_t)$.
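The generative description above can be sketched as a short sampling routine. This is a minimal illustration under assumptions that the text does not state: the innovation is taken as zero-mean with a state-dependent standard deviation, the AR weights enter with a plus sign (some formulations use the opposite sign convention), the samples before the first time step are set to zero, and the name `sample_sar_hmm` and all parameter values are hypothetical.

```python
import random


def sample_sar_hmm(T, trans, ar_weights, innov_std, init_state=0, seed=0):
    """Draw switch states s_1..s_T and samples v_1..v_T from a toy SAR-HMM."""
    rng = random.Random(seed)
    R = len(ar_weights[0])
    history = [0.0] * R  # assumed zero initial conditions, most recent sample first
    states, samples = [], []
    s = init_state
    for t in range(T):
        if t > 0:
            # s_t depends only on s_{t-1}: draw from row s of the transition matrix.
            s = rng.choices(range(len(trans)), weights=trans[s])[0]
        # Linear combination of the R preceding samples with the weights c_r(s_t).
        pred = sum(c * v for c, v in zip(ar_weights[s], history))
        # Gaussian innovation eta(s_t); the zero mean is an assumption of this sketch.
        v = pred + rng.gauss(0.0, innov_std[s])
        history = [v] + history[:-1]
        states.append(s)
        samples.append(v)
    return states, samples


# Two assumed states: a smooth, strongly correlated one and a noisier one.
trans = [[0.99, 0.01], [0.02, 0.98]]
ar_weights = [[1.6, -0.8], [0.3, -0.1]]  # R = 2 weights per state
innov_std = [0.05, 0.5]
states, samples = sample_sar_hmm(200, trans, ar_weights, innov_std)
```

Since the switch variable is updated once per sample, the transition matrix is chosen with strong self-transitions here so that each AR parameter set stays active over many consecutive samples.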
$\eta(s_t)$ and the AR weights $c_r(s_t)$ are the parameter set given by the state $s_t$: