Real-Time Detection of Overlapping Sound Events with Non-Negative Matrix Factorization - Matrix Information Geometry

In addition, other representation front-ends could be employed instead of a simple magnitude spectrum. For polyphonic music transcription, non-linear frequency scales (e.g., the constant-Q transform) may improve the system. In a more general setup, we would also like to investigate the use of a wavelet transform, possibly coupled with a modulation spectrum representation, to provide a multi-scale analysis of the spectro-temporal features of the sounds. Extending NMF to tensors may also enhance the system, for instance by allowing multi-channel information to be used in the representation. We have also extended the proposed sparse algorithm to deal with complex representations. This extension has not been discussed in the paper, but it can help to consider more informative representations that account for phase information.
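To make the idea of a non-linear frequency scale concrete, the following is a minimal sketch that pools the bins of a magnitude spectrum into geometrically spaced bands. It is only a rough stand-in for a true constant-Q transform, not the front-end used in this work; the function name `log_band_matrix` and the parameters `f_min` and `bins_per_octave` are assumptions introduced for illustration. Because the pooling matrix is non-negative, the pooled representation remains non-negative and can still be factorized with NMF.

```python
import numpy as np

def log_band_matrix(n_fft_bins, sample_rate, f_min=55.0, bins_per_octave=12):
    """Return a (n_bands, n_fft_bins) 0/1 pooling matrix and the band edges in Hz.

    Hypothetical helper: bands are spaced geometrically from f_min up to the
    Nyquist frequency, which only approximates a constant-Q analysis.
    """
    nyquist = sample_rate / 2.0
    # Centre frequencies of the linearly spaced FFT bins (0 Hz .. Nyquist).
    fft_freqs = np.linspace(0.0, nyquist, n_fft_bins)
    n_bands = int(np.floor(bins_per_octave * np.log2(nyquist / f_min)))
    # n_bands + 1 edges with a constant frequency ratio between neighbours.
    edges = f_min * 2.0 ** (np.arange(n_bands + 1) / bins_per_octave)

    pool = np.zeros((n_bands, n_fft_bins))
    for b in range(n_bands):
        inside = (fft_freqs >= edges[b]) & (fft_freqs < edges[b + 1])
        if not inside.any():
            # Low bands can be narrower than the FFT resolution:
            # fall back to the FFT bin closest to the geometric band centre.
            centre = np.sqrt(edges[b] * edges[b + 1])
            inside = np.zeros(n_fft_bins, dtype=bool)
            inside[np.argmin(np.abs(fft_freqs - centre))] = True
        pool[b, inside] = 1.0
    return pool, edges
```

Given a magnitude spectrogram `V` of shape `(n_fft_bins, n_frames)`, the log-frequency representation is obtained as `pool @ V`. The same matrix would have to be applied to the event templates so that templates and observations live in the same space.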

Finally, we would like to further improve the robustness and the generalization capacity of the system. Concerning robustness, a first direction may be to model information from the encoding coefficients during template learning in order to improve detection during decomposition. Alternatively, we could investigate the use of additional basis vectors that are not fixed but updated during decomposition, so as to absorb noise and other undesirable sound components. Concerning generalization, we may enhance our model to deal with adaptive event templates. For example, second-order cone programming may be employed to consider non-fixed templates constrained within geometric cones. A similar idea has already been proposed in [48] for supervised classification with NMF. Other possibilities come from the use of a hierarchical instrument basis as in [39] or, more generally, from convex NMF techniques with convergence guarantees as proposed in [21]. Future work should address the adaptation of these approaches to the proposed algorithms.
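The idea of letting a few basis vectors adapt so that they absorb noise can be sketched as follows. This is an illustrative assumption, not the algorithm proposed in this work: it uses the standard Euclidean multiplicative updates of Lee and Seung [3] rather than the sparse or divergence-based updates discussed in the paper, and the function name `decompose_with_noise_basis` and its parameters are hypothetical. The learned event templates `W_fixed` are kept constant, while `n_free` additional columns are updated together with the activations.

```python
import numpy as np

def decompose_with_noise_basis(V, W_fixed, n_free=2, n_iter=200, eps=1e-12, seed=0):
    """Factorize V ~ [W_fixed, W_free] @ H, updating only W_free and H.

    V        : (n_bins, n_frames) non-negative observations
    W_fixed  : (n_bins, n_fixed) non-negative event templates, left unchanged
    Returns the event activations, the noise activations, the learned
    free (noise) basis vectors and the squared error after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    n_fixed = W_fixed.shape[1]

    # Random non-negative initialization of the free columns and activations.
    W_free = rng.random((n_bins, n_free)) + eps
    H = rng.random((n_fixed + n_free, n_frames)) + eps
    W = np.hstack([W_fixed, W_free])  # copy, so W_fixed itself is not modified

    costs = []
    for _ in range(n_iter):
        # Multiplicative update of all activations (Euclidean cost).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Multiplicative update restricted to the free columns of W.
        H_free = H[n_fixed:]
        W[:, n_fixed:] *= (V @ H_free.T) / (W @ H @ H_free.T + eps)
        costs.append(float(np.sum((V - W @ H) ** 2)))

    return H[:n_fixed], H[n_fixed:], W[:, n_fixed:], costs
```

In such a scheme, detection would rely only on the activations of the fixed templates, while the free columns are meant to soak up components that no template explains. Nothing prevents a free column from also capturing part of a genuine event, so in practice some constraint on the free basis would probably be needed.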

Acknowledgments This work was partially funded by a doctoral fellowship from the UPMC (EDITE). The authors would like to thank Chunghsin Yeh and Roland Badeau for their valuable help, Emmanouil Benetos for his helpful comments on the paper, Valentin Emiya for kindly providing the MAPS database, as well as Patrick Hoyer and Emmanuel Vincent for sharing their source code.

References

1. Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111-126 (1994)
2. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788-791 (1999)
3. Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems, vol. 13, pp. 556-562. MIT Press, Cambridge (2001)
4. Sha, F., Saul, L.K.: Real-time pitch determination of one or more voices by nonnegative matrix factorization. In: Advances in Neural Information Processing Systems, vol. 17, pp. 1233-1240. MIT Press, Cambridge (2005)
5. Cheng, C.-C., Hu, D.J., Saul, L.K.: Nonnegative matrix factorization for real time musical analysis and sight-reading evaluation. In: 33rd IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2017-2020. Las Vegas, USA (2008)
6. Paulus, J., Virtanen, T.: Drum transcription with non-negative spectrogram factorisation. In: 13th European Signal Processing Conference, Antalya, Turkey (2005)
