Digital Signal Processing Reference
Table 7.1 Conditions of a 7-level CMOS test

Score   Statement
-3      A is much worse than B
-2      A is worse than B
-1      A is slightly worse than B
 0      A and B are about the same
 1      A is slightly better than B
 2      A is better than B
 3      A is much better than B

of this section we will focus here on the evaluation of the progress during the design of
a bandwidth extension algorithm. Thus, we will compare not only the enhanced and
the original (bandlimited) signals but also signals that have been enhanced by two
different algorithmic versions.
If untrained listeners perform the test, comparison ratings (comparison mean opinion
score tests, or CMOS tests) are the most reliable option. Usually about 10 to 30 people
of different ages and genders participate in a CMOS test. In a seven-level CMOS test the
subjects are asked to compare the quality of two signals (pairs of bandlimited and
extended signals) by choosing one of the statements listed in Table 7.1.
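The scoring step of such a test can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by the authors: the function name `cmos`, the example ratings, and the normal-approximation confidence interval (factor 1.96) are all assumptions made here for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Score-to-statement mapping taken from Table 7.1.
CMOS_SCALE = {
    -3: "A is much worse than B",
    -2: "A is worse than B",
    -1: "A is slightly worse than B",
    0: "A and B are about the same",
    1: "A is slightly better than B",
    2: "A is better than B",
    3: "A is much better than B",
}


def cmos(ratings):
    """Return (mean score, half-width of an approximate 95% interval).

    The interval uses a normal approximation, which is an assumption of
    this sketch and is only rough for small listener panels.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    for r in ratings:
        if r not in CMOS_SCALE:
            raise ValueError(f"rating {r!r} is not on the 7-level scale")
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mean(ratings), half_width


# Hypothetical ratings from ten listeners (A = extended, B = bandlimited).
example_ratings = [1, 0, 2, 1, -1, 1, 0, 2, 1, 1]
score, half_width = cmos(example_ratings)
print(f"CMOS = {score:.2f} +/- {half_width:.2f}")
```

A positive mean indicates that, on average, signal A was preferred over signal B. The interval gives a rough idea of how far the mean could move with a different panel of listeners.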
This is done for both versions of the bandwidth extension system. Furthermore, all
participants are asked whether they prefer version A or version B of the bandwidth
extension. For the latter question no “equal-quality” statement should be offered to the
subjects, since offering it would drastically reduce the statistical significance when the
subjective differences between the two versions are small.
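One way to read the remark about the "equal-quality" option is through an exact two-sided binomial (sign) test on the forced-choice answers. The sketch below is an illustration under assumptions, not the analysis used in the evaluation: the function name `binomial_two_sided_p` and all vote counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test against a 50/50 preference.

    k is the number of listeners preferring version A out of n
    listeners who expressed a preference.
    """
    if not 0 <= k <= n or n == 0:
        raise ValueError("need 0 <= k <= n and n > 0")
    tail = min(k, n - k)
    # Probability of a split at least as lopsided as the observed one,
    # on one side; doubled for the two-sided test and capped at 1.
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, 2 * one_sided)


# Hypothetical panel of 20 listeners, forced choice: 16 prefer A, 4 prefer B.
p_forced = binomial_two_sided_p(16, 20)

# Same hypothetical panel with an "equal" option: 8 listeners choose it,
# so only 12 votes remain (10 for A, 2 for B).
p_with_ties = binomial_two_sided_p(10, 12)

print(p_forced, p_with_ties)
```

In this made-up example the answers that fall into the "equal" category shrink the usable sample, and the remaining votes give weaker evidence even though the preferred version is the same.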
As an example of such a subjective evaluation we present a comparison
between two bandwidth extension schemes that differ only in the method used
for estimating the vocal tract transfer function. The first version uses a neural network
as described in Section 7.5.2; the second is based on
a codebook approach as presented in Section 7.5.2. Two questions were of interest:
Do the extension schemes enhance the speech quality?
Which of the two methods produces the better results?
Before we present the results of the evaluation, some general comments are in order.
Interestingly, neither the codebook nor the neural network represents the higher
frequencies appropriately, and the neural network performs even worse than the
codebook. However, the power of the higher frequencies produced by the neural network
and the codebook is mostly lower than that of the original signal, so bothersome
artifacts are, to a certain degree, less noticeable.
Further results are presented in [14].
The participants rated the signals of the network approach with an average mark
of 0.53 (between equal and slightly better than the bandlimited signals) and the
signals resulting from the codebook scheme with 1.51 (between slightly better and