Digital Signal Processing Reference
decrease of the cost function.³ We also notice that p(β) ≤ 1 for any β ∈ ℝ, with
equality iff 1 ≤ β ≤ 2. As a result, we may take a unit exponent step size in place of
p(β) for β between 0 and 1, corresponding to the heuristic updates, without
compromising the cost monotonicity. This is akin to over-relaxation and produces
larger steps, thus reducing their number and speeding up convergence.
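As an illustration of the bound p(β) ≤ 1, the following minimal sketch tabulates an exponent step size. The piecewise form used here is an assumption: it is the form commonly given for majorization-minimization updates with the beta-divergence and is not restated in this section, and the function name `mm_exponent` is hypothetical.

```python
def mm_exponent(beta: float) -> float:
    """Exponent step size p(beta) for the multiplicative updates.

    ASSUMPTION: piecewise form commonly used in the beta-divergence NMF
    literature; this section only states p(beta) <= 1, with equality
    iff 1 <= beta <= 2, which this form satisfies.
    """
    if beta < 1.0:
        return 1.0 / (2.0 - beta)
    if beta <= 2.0:
        return 1.0
    return 1.0 / (beta - 1.0)


# Tabulate the exponent on a few values of beta.
for beta in (-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"beta={beta:+.1f}  p(beta)={mm_exponent(beta):.3f}")
```

Under this assumed form, replacing p(β) by 1 where p(β) < 1 raises the update ratio to a larger power, which is the over-relaxation effect described above.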
Concerning its applications, NMF with the beta-divergence has proved relevant
in off-line audio systems for speech analysis [15], source separation [16],
music transcription [17], and non-stationarity modeling with a parametric model of
the spectral templates [18] or a source-filter model for time-varying activations [19].
The scaling property in (14.19) may give some insight into the relevance
of the beta-divergence in this context.
As remarked in [32], the Itakura-Saito divergence for β = 0 is the only
β-divergence to be scale-invariant. This means that the corresponding NMF problem
gives the same relative weight to all coefficients, and thus penalizes equally a bad fit
of factorization for small and large coefficients. For other values of β, however, the
scaling property implies that a different emphasis is put on the coefficients depending
on their magnitude. When β > 0, more emphasis is put on the higher magnitude
coefficients, and the emphasis augments with β. When β < 0, the effect is the
converse.
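The effect of the scaling property can be checked numerically. The sketch below assumes the usual scalar definition of the β-divergence and the usual form of the scaling property, d_β(λx | λy) = λ^β d_β(x | y); equations (14.15) and (14.19) are not reproduced in this section, so both are assumptions here, and the function name `beta_divergence` is hypothetical.

```python
import math


def beta_divergence(x: float, y: float, beta: float) -> float:
    """Scalar beta-divergence d_beta(x | y) for x, y > 0.

    ASSUMPTION: standard definition, with the Itakura-Saito (beta = 0)
    and Kullback-Leibler (beta = 1) divergences as limit cases.
    """
    if beta == 0:
        ratio = x / y
        return ratio - math.log(ratio) - 1.0
    if beta == 1:
        return x * math.log(x / y) - x + y
    return (x**beta + (beta - 1.0) * y**beta
            - beta * x * y**(beta - 1.0)) / (beta * (beta - 1.0))


x, y, lam = 2.0, 3.0, 10.0  # arbitrary positive values and scale factor
for beta in (-1.0, 0.0, 1.0, 2.0):
    base = beta_divergence(x, y, beta)
    scaled = beta_divergence(lam * x, lam * y, beta)
    # The ratio is expected to match lam**beta under the assumed property.
    print(f"beta={beta:+.1f}  ratio={scaled / base:.6g}  lam**beta={lam**beta:.6g}")
```

For β = 0 the ratio is 1, so a bad fit costs the same at any magnitude; for β > 0 the same relative misfit costs more on large coefficients, and for β < 0 it costs more on small ones.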
Considering audio signals, this amounts to giving different importance to high and
low-energy frequency components. In a context of polyphonic music decomposition,
we try to reconstruct an incoming signal by addition of note templates. In order to
avoid common octave and harmonic errors, a good reconstruction would have to
find a compromise between focusing on the fundamental frequency, the first partials
and higher partials. This compromise should also be achieved in an adaptable way,
independent of the fundamental frequency, similarly to a compression rather than
a global weighting of the different components. The parameter β can thus help to
control this trade-off. A similar interpretation holds in a general audio decomposition
problem where the decomposition should find a compromise between the high and
low-energy frequency components.
Last but not least, we notice that in the literature, there is in general no rigorous
consideration of the domain of the β-divergence, which is usually defined for any
β ∈ ℝ as in (14.15) but for any x, y ∈ ℝ₊ instead of ℝ₊₊. This is nonetheless only
possible for β > 1, so that the problem in (14.21) is not actually rigorously posed
for β ≤ 1. Moreover, even when β > 1, attention must be paid in the multiplicative
updates as soon as zero values are allowed. In the best case, a zero value in a factor
remains zero as it is updated, but null coefficients may also introduce divisions by
zero. As a result, most of the nice properties such as monotonic decrease of the cost
and convergence may break down and algorithms may become unstable as soon as
zero values are allowed. Such considerations are important for a real-time application
where a stable behavior with no unpredictable errors is mandatory. We thus try in the
³ Results in [21] may again prove a posteriori the cost monotonicity for certain heuristic
multiplicative updates employed in the literature.
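The remarks on zero values in the multiplicative updates can be illustrated on a toy problem. The sketch below is not the authors' method: it assumes the usual heuristic multiplicative update for activations, reduced to a single observation v approximated by a sum of w[k] * h[k], and the optional floor `eps` is a commonly used safeguard introduced here purely for illustration. The names `update_activations` and `eps` are hypothetical.

```python
def update_activations(v, w, h, beta, eps=0.0):
    """One heuristic multiplicative update of the activations h.

    ASSUMPTION: toy single-observation model v ~ sum_k w[k] * h[k], with
    the usual update h[k] <- h[k] * num / den (unit exponent).
    `eps` floors the reconstruction and the denominator; eps = 0.0
    disables the safeguard.
    """
    v_hat = max(sum(wk * hk for wk, hk in zip(w, h)), eps)
    new_h = []
    for wk, hk in zip(w, h):
        num = wk * v * v_hat ** (beta - 2)
        den = max(wk * v_hat ** (beta - 1), eps)
        new_h.append(hk * num / den)
    return new_h


v, w, beta = 3.0, [1.0, 2.0], 1.0

# Best case: one activation is zero, the reconstruction stays positive,
# and the zero activation remains zero after the update.
print(update_activations(v, w, [0.0, 1.0], beta))

# Worst case: all activations are zero, so the reconstruction is zero
# and the update attempts a division by zero.
try:
    update_activations(v, w, [0.0, 0.0], beta)
except ZeroDivisionError as err:
    print("unsafe update failed:", err)

# With a small floor the update no longer fails, but zeros stay at zero.
print(update_activations(v, w, [0.0, 0.0], beta, eps=1e-12))
```

The floor avoids the runtime error but does not let a zero activation recover, since the update only rescales the current value; it is one possible safeguard among others.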