Love and Up Jumped Spring) performed by three different professional saxophonists. Each piece was performed at two different tempos. For each note in the training data, its perceptual and contextual features were computed.
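The exact feature set is not enumerated at this point in the chapter; purely as an illustration of what a per-note training example might look like, here is a minimal Python sketch in which every field name is an assumption rather than the chapter's actual feature list:

from dataclasses import dataclass

# Illustrative sketch only: the fields below are assumptions, not the
# chapter's actual feature set. Perceptual features describe how the
# note was played; contextual features describe where it sits in the
# score.
@dataclass
class NoteExample:
    # perceptual features (from the audio)
    duration_ratio: float    # performed vs. score duration
    attack_level: float      # e.g., energy at note onset
    mean_energy: float
    # contextual features (from the score)
    prev_interval: int       # semitones from the previous note
    next_interval: int       # semitones to the next note
    metrical_position: float
    # supervision target
    interpreter: str         # one of the three saxophonists

example = NoteExample(1.08, 0.62, 0.55, -2, 3, 0.0, "performer_A")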
Discussion
The difference between the results obtained in the case study and the accuracy of a baseline classifier, that is, a classifier guessing at random (33% in the case of the three-interpreter classification task), indicates that the perceptual and contextual features presented contain sufficient information to identify the studied set of interpreters, and that the machine learning methods explored are capable of learning performance patterns that distinguish these interpreters. It is worth noting that every learning algorithm investigated (decision trees, SVM, ANN, k-NN, and the reported ensemble methods) produced classification accuracies significantly better than random. This supports our statement about the feasibility of training successful classifiers for the reported case study. Note, however, that this does not necessarily imply that it is feasible to train classifiers for arbitrary performers.
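To make this comparison concrete, the following minimal sketch evaluates scikit-learn stand-ins for the algorithm families named above against the 33% baseline. The feature matrix, labels, and hyperparameters are placeholders, not the chapter's actual data or settings:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: in the real study, X would hold one feature
# vector per segment (perceptual + contextual features) and y the
# interpreter labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(360, 8))
y = rng.integers(0, 3, size=360)

# The classifier families named in the discussion; hyperparameters
# here are library defaults, not the chapter's settings.
models = {
    "decision tree": DecisionTreeClassifier(),
    "SVM": SVC(),
    "ANN": MLPClassifier(max_iter=1000),
    "k-NN": KNeighborsClassifier(),
    "ensemble (random forest)": RandomForestClassifier(),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.2%} (random baseline: 33%)")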
We selected three musical segment lengths: 1-note segments, short-phrase segments (4-12 notes), and long-phrase segments (30-62 notes). As expected, evaluation using 1-note segments results in poor classification accuracies, while short-phrase and long-phrase evaluation results in accuracies well above that of a baseline classifier. Interestingly, there is no substantial difference between the accuracies for short-phrase and long-phrase evaluation, which seems to indicate that a short phrase of the piece is sufficient to identify a particular performer; that is, the identification accuracy does not increase substantially when a longer segment is considered.
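One plausible way to turn note-level decisions into segment-level identifications, and hence to see why longer segments help only up to a point, is majority voting over the notes in a segment. The chapter does not specify its aggregation mechanism here, so the following Python sketch is an assumption for illustration only:

from collections import Counter

def predict_segment(note_predictions):
    """Aggregate per-note classifier outputs into one segment-level
    label by majority vote. This aggregation scheme is an assumption
    for illustration, not the chapter's described method."""
    votes = Counter(note_predictions)
    return votes.most_common(1)[0][0]

# Hypothetical per-note outputs for a short phrase (4-12 notes):
phrase_votes = ["A", "A", "B", "A", "C", "A", "A"]
print(predict_segment(phrase_votes))  # -> "A"

Under such a scheme, a handful of notes already gives the vote enough evidence, which is consistent with short phrases performing almost as well as long ones.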
Results
There were a total of 792 notes available for each interpreter. We segmented each of the performed pieces into phrases and obtained a total of 120 short phrases and 32 long phrases for each interpreter. The lengths of the obtained short and long phrases ranged from 5 to 12 notes and from 40 to 62 notes, respectively. The expected classification accuracy of the default classifier (one that chooses one of the three interpreters at random) is 33%, measured as the percentage of correctly classified instances. In the short-phrase case, the average accuracy and the accuracy of the most successful trained classifier were 97.03% and 98.42%, respectively. In the long-phrase case, the average accuracy and the accuracy of the most successful trained classifier were 96.77% and 98.07%, respectively. For comparative purposes, we also experimented with 1-note phrases and obtained poor results: the accuracy of the most successful classifier was 44.92%. This is an expected result, since one note is clearly not sufficient for discriminating among interpreters. However, the results for short and long phrases are statistically significant, which indicates that it is indeed feasible to train successful classifiers to identify interpreters from their playing style using the considered perceptual and contextual features. It must be noted that the performances in our training data were recorded in a controlled environment in which the gain level was constant for every interpreter. Some of the features (e.g., attack level) included in the perceptual description of the notes take advantage of this property and provide very useful information in the learning process.
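As a rough check of what "statistically significant" means here, one can compare the reported accuracies against the 33% chance baseline with a one-sided binomial test. The instance count below (120 short phrases per interpreter, three interpreters) is taken from the text, but the evaluation protocol is otherwise an assumption:

from scipy.stats import binomtest

# Illustrative arithmetic: 120 short phrases x 3 interpreters = 360
# instances; the reported 97.03% average accuracy corresponds to
# roughly 349 correctly classified instances.
n_instances = 360
n_correct = round(0.9703 * n_instances)  # -> 349

# One-sided test against the 1/3 (33%) random-guessing baseline.
result = binomtest(n_correct, n_instances, p=1/3, alternative="greater")
print(f"{n_correct}/{n_instances} correct, p = {result.pvalue:.3g}")
# A p-value far below 0.05 indicates performance well above chance.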
Future Trends
Given the capabilities of current audio analysis systems, we believe expressive-content-based performer identification is a promising research area.