In addition, other front-end representations could be employed instead of a simple
magnitude spectrum. For the task of polyphonic music transcription, considering
non-linear frequency scales (e.g., the constant-Q transform) may improve the system. In
a more general setup, we would also like to investigate the use of a wavelet transform,
possibly coupled with a modulation spectrum representation, to provide a multi-scale
analysis of the spectro-temporal features of the sounds. The extension of NMF to
tensors may also enhance the system, making it possible, for instance, to use multi-channel
information in the representation. We have also extended the proposed sparse algorithm
to deal with complex representations. This extension has not been discussed
in the paper, but it can help to consider more informative representations that account
for phase information.
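To make the idea of a non-linear frequency scale concrete, the short sketch below computes the geometrically spaced centre frequencies that characterise a constant-Q analysis, f_k = f_min * 2^(k/B), together with the resulting quality factor. The function names and the example values (27.5 Hz lowest bin, 12 bins per octave, 88 bins) are assumptions chosen for illustration; they are not taken from the system described in the paper.

```python
import math

def constant_q_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced centre frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def quality_factor(bins_per_octave):
    """Ratio of centre frequency to bandwidth; it is the same for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

# Assumed example values: lowest bin at 27.5 Hz (A0), one bin per semitone,
# 88 bins (the range of a piano keyboard).
freqs = constant_q_frequencies(27.5, 12, 88)
q = quality_factor(12)

print(f"first bin: {freqs[0]:.2f} Hz, last bin: {freqs[-1]:.2f} Hz, Q = {q:.2f}")
```

With one bin per semitone, each bin lines up with a note of the equal-tempered scale, which is one reason such a scale is attractive for transcription compared with the linearly spaced bins of a magnitude spectrum.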
Finally, we would like to further improve the robustness and the generalization
capacity of the system. Concerning robustness, a first direction may be to model information
from the encoding coefficients during template learning to improve detection
during decomposition. Alternatively, we could investigate the use of non-fixed basis
vectors that are updated to absorb noise and other undesirable sound components. Concerning
generalization, we may enhance our model to deal with adaptive event templates.
For example, second-order cone programming may be employed to consider non-fixed
templates constrained within geometric cones. A similar idea has already been
proposed in [48] for supervised classification with NMF. Other possibilities come
from the use of a hierarchical instrument basis as in [39] or, more generally, from
convex NMF techniques with convergence guarantees as proposed in [21]. Future
work should address the adaptation of these approaches to the proposed algorithms.
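One of the robustness directions mentioned above, keeping the learned templates fixed while letting additional basis vectors adapt to absorb noise, can be illustrated with the standard multiplicative updates for the Euclidean cost from [3]. This is only a minimal sketch under stated assumptions: it uses NumPy, the function name `decompose_with_free_basis` and all parameters are hypothetical, the data are synthetic, and it is not the sparse algorithm proposed in the paper.

```python
import numpy as np

def decompose_with_free_basis(V, W_fixed, n_free=1, n_iter=200, eps=1e-12, seed=0):
    """Decompose V ~ W @ H with fixed template columns plus free columns.

    V       : non-negative array (n_freq, n_frames), e.g. a magnitude spectrogram
    W_fixed : non-negative array (n_freq, k_fixed), templates that stay untouched
    n_free  : number of extra basis vectors that are updated (hypothetical noise basis)
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_fixed = W_fixed.shape[1]

    # Stack the fixed templates with randomly initialised free basis vectors.
    W = np.hstack([W_fixed, rng.random((n_freq, n_free)) + eps])
    H = rng.random((k_fixed + n_free, n_frames)) + eps

    for _ in range(n_iter):
        # Multiplicative update of all activations (Euclidean cost).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Multiplicative update restricted to the free columns of W.
        num = V @ H.T
        den = W @ (H @ H.T) + eps
        W[:, k_fixed:] *= num[:, k_fixed:] / den[:, k_fixed:]
    return W, H

# Synthetic example: three known templates plus a rank-one "noise" component.
rng = np.random.default_rng(1)
W_fixed = rng.random((20, 3))
H_true = rng.random((3, 30))
V = W_fixed @ H_true + np.outer(rng.random(20), rng.random(30))

W, H = decompose_with_free_basis(V, W_fixed, n_free=1)
print("reconstruction error:", np.linalg.norm(V - W @ H))
```

The first three rows of `H` play the role of the event activations, while the last row collects whatever energy the free basis vector has absorbed. Whether such a free component helps or instead captures part of the events of interest would have to be checked experimentally.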
Acknowledgments This work was partially funded by a doctoral fellowship from the UPMC
(EDITE). The authors would like to thank Chunghsin Yeh and Roland Badeau for their valuable help,
Emmanouil Benetos for his helpful comments on the paper, Valentin Emiya for kindly providing
the MAPS database, as well as Patrick Hoyer and Emmanuel Vincent for sharing their source code.
References
1. Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal
utilization of error estimates of data values. Environmetrics 5(2), 111-126 (1994)
2. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization.
Nature 401(6755), 788-791 (1999)
3. Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in
Neural Information Processing Systems, vol. 13, pp. 556-562. MIT Press, Cambridge (2001)
4. Sha, F., Saul, L.K.: Real-time pitch determination of one or more voices by nonnegative matrix
factorization. In: Advances in Neural Information Processing Systems, vol. 17, pp. 1233-1240.
MIT Press, Cambridge (2005)
5. Cheng, C.-C., Hu, D.J., Saul, L.K.: Nonnegative matrix factorization for real time musical
analysis and sight-reading evaluation. In: 33rd IEEE International Conference on Acoustics,
Speech and Signal Processing, pp. 2017-2020. Las Vegas, USA (2008)
6. Paulus, J., Virtanen, T.: Drum transcription with non-negative spectrogram factorisation. In:
13th European Signal Processing Conference, Antalya, Turkey (2005)