Digital Signal Processing Reference
In-Depth Information
These raw energies per semi-tone interval can then be regrouped by summing the
energies of a semi-tone across octaves, reducing the feature vector size to, e.g., 36, 24, or
12 bands. This is now exemplified for the most frequently encountered choice,
the 12-dimensional 'CHROMA' features.
6.2.2.2 CHROMA
Rather than storing and analysing the energy of each individual musical semi-tone
in order to analyse the chordal structure (a musical chord is defined as two or more
simultaneously played notes) or the key, the feature vector x can be reduced to a limited
number of octaves, down to a single one, i.e., 12 features, as for CHROMA features [48].
This may be performed by adding all bands that belong to the same semi-tone in
different octaves. Finally, the vector x is normalised by the number of merged bands.
A 12-dimensional CHROMA vector x thus provides the cumulative spectral energies
per semitone A, A#, ..., G# over all octaves:

$$x = (A, A\#, B, C, C\#, D, D\#, E, F, F\#, G, G\#)^T \tag{6.67}$$

by adding up, as a final step to the previous PCP calculation, all sub-bands
corresponding to the same relative pitch class.
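The folding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fold_to_chroma`, the `normalise` flag, and the convention that the lowest input band is an A (so that band index modulo 12 matches the ordering of Eq. (6.67)) are not taken from the text.

```python
# Pitch-class names in the order of Eq. (6.67); index 0 is A.
PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def fold_to_chroma(semitone_energies, normalise=True):
    """Fold per-semitone band energies into a 12-dimensional CHROMA vector.

    Assumption (not from the text): band 0 is an A and consecutive bands are
    one semi-tone apart, so band index modulo 12 gives the pitch class.
    """
    sums = [0.0] * 12    # accumulated energy per pitch class
    counts = [0] * 12    # number of bands merged into each pitch class
    for band, energy in enumerate(semitone_energies):
        pitch_class = band % 12
        sums[pitch_class] += energy
        counts[pitch_class] += 1
    if not normalise:
        return sums
    # Normalise by the number of merged bands, as described in the text.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Hypothetical input: three octaves (36 bands) with energy on A and C#.
energies = [0.0] * 36
energies[0] = energies[12] = energies[24] = 2.0   # A in each octave
energies[4] = 3.0                                 # C# in the lowest octave only
chroma = fold_to_chroma(energies)
```

With `normalise=False` the function returns the plain sums per pitch class. The normalised variant divides each sum by the number of bands merged into it, so a pitch class that is active in only one of three octaves contributes a third of its band energy.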
In some implementations the length of the CHROMA vector is normalised to
1 in order to obtain energy-independent CHROMA information. This is, however,
problematic for low-energy signals, as the noise (e.g., quantisation noise) present
in the signal will then dominate the CHROMA features instead of the desired harmonic
information. To avoid this problem, the CHROMA values can be forced to 0 if the
energy of the signal falls below a chosen threshold.
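A minimal sketch of this thresholded normalisation follows. Several details are assumptions rather than statements from the text: "length" is taken to be the Euclidean norm, the signal energy is approximated by the sum of the CHROMA values when no separate frame energy is supplied, and the names `normalise_chroma`, `frame_energy`, and `threshold` as well as the default threshold value are hypothetical.

```python
import math


def normalise_chroma(chroma, frame_energy=None, threshold=1e-6):
    """Scale a CHROMA vector to unit Euclidean length, or zero it if too quiet.

    Assumptions (not from the text): Euclidean norm as the vector "length";
    sum of CHROMA values as the energy estimate when frame_energy is None.
    """
    if frame_energy is None:
        frame_energy = sum(chroma)
    norm = math.sqrt(sum(v * v for v in chroma))
    # Low-energy frames would mostly carry noise, so force them to zero.
    if frame_energy < threshold or norm == 0.0:
        return [0.0] * len(chroma)
    return [v / norm for v in chroma]


# Hypothetical frames: one with clear harmonic content, one near silence.
loud = normalise_chroma([3.0, 4.0] + [0.0] * 10)
quiet = normalise_chroma([1e-9] * 12)
```

The threshold has to be chosen relative to the scale of the input energies; the default used here is arbitrary.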
6.2.2.3 CENS
CHROMA features provide only short-time information for an individual analysis
frame. CENS (CHROMA Energy-distribution Normalised Statistics) features are
suggested in [49, 50] to provide a perspective beyond individual frames. The underlying
principle resembles averaging CHROMA features over time. Yet, unlike a mere
lengthening of the window size, quantisation and temporal weighting of
harmonic information are better modelled. As the local chroma features may be too
sensitive to articulation effects and local tempo deviations, to each component
of $x = (x_1, \ldots, x_{12})$ a quantisation function Q defined as
 