has an average of 68.13 %, with the SVM technique outperforming the PLSA algorithm by 14.73 %.
It should be pointed out that this evaluation is based on predetermined semantic view types, which favor the SVM algorithm: the semantic definitions are heavily involved in SVM training, while they are barely used in PLSA training. In the SVM method, labeled training data associated with each predefined view type are indispensable for building the classifier. On the other hand, the PLSA model training merely requires a specified number of view types, analogous to the number of clusters needed to train a K-means clustering. Thus, it is expected that the supervised SVM method performs better than the unsupervised PLSA algorithm.
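To make the contrast in training requirements concrete, the following is a minimal Python sketch of the two regimes. The feature matrix and labels are hypothetical placeholders, and K-means stands in for the PLSA fitting step here, since both require only the number of view types to be specified in advance.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.random((200, 500))          # 200 shots, 500-word visual vocabulary (toy data)
    y = rng.integers(0, 4, size=200)    # labels for 4 predefined view types

    # Supervised: labeled data for each predefined view type is indispensable.
    svm = SVC(kernel="rbf").fit(X, y)
    svm_views = svm.predict(X)

    # Unsupervised: only the number of view types must be specified.
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
    unsup_views = kmeans.labels_

The supervised branch cannot even be trained without the per-shot view labels y, whereas the unsupervised branch consumes the same features with no annotation at all.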
However, the PLSA model has the advantage of being unsupervised, so labeled data are not necessary for training. This feature makes the PLSA more suitable than the SVM for supporting the generic framework on large-scale datasets, where automatic processing and minimal human and expert intervention are essential. For evaluating our proposed framework, a trade-off in classification accuracy can be afforded if the ultimate event detection results are comparable using either the PLSA or the SVM view results.
In order to analyze the generic and scalable properties, a small-scale five-sport subset is used, consisting of {soccer, basketball, volleyball, table tennis, tennis}. The SVM and PLSA view classification performance on this small-scale dataset is presented in the 3rd/4th columns of Fig. 9.8, respectively. Relative to the small-scale data, the 14-sport baseline shows a 0.27 % performance drop in SVM and a 1.76 % improvement in PLSA. The results are thus similar, even though the 14-sport view dataset contains far more data in both variety and volume than the five-sport small-scale data.
Based on the preceding analytical results, the performance extrapolated from this current relatively large-scale dataset to a truly large-scale dataset should be maintained, especially for the PLSA method. The reasoning is twofold. First, large-scale data is normally sparse, and PLSA, as a generative model, probabilistically maps data from a high-dimensional space to a low-dimensional space; hence, the additional information brought by new data can help in finding significant representatives in the lower-dimensional space. Second, since the number of view classes is fixed at four types, more variety and volume will not affect the performance much.
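To make this dimensionality-reduction argument concrete, the following is a minimal EM sketch of PLSA fitting; the function name, toy sizes, and numerical details are our own assumptions, as the book does not specify an implementation. Each shot's histogram over a large visual vocabulary is mapped to a distribution P(z|d) over the four latent view types:

    import numpy as np

    def plsa(N, n_z=4, n_iter=50, seed=0):
        """Fit PLSA by EM on a shot-by-visual-word count matrix N (n_d x n_w).

        Returns P(z|d), the low-dimensional representation of each shot,
        and P(w|z), the per-aspect distributions over visual words.
        """
        rng = np.random.default_rng(seed)
        n_d, n_w = N.shape
        p_z_d = rng.random((n_d, n_z)); p_z_d /= p_z_d.sum(1, keepdims=True)
        p_w_z = rng.random((n_z, n_w)); p_w_z /= p_w_z.sum(1, keepdims=True)
        for _ in range(n_iter):
            # E-step: responsibilities P(z|d,w) for every (shot, word) pair.
            joint = p_z_d[:, :, None] * p_w_z[None, :, :]   # (n_d, n_z, n_w)
            joint /= joint.sum(axis=1, keepdims=True) + 1e-12
            # M-step: reweight responsibilities by the observed counts n(d, w).
            weighted = N[:, None, :] * joint
            p_w_z = weighted.sum(axis=0)
            p_w_z /= p_w_z.sum(axis=1, keepdims=True)
            p_z_d = weighted.sum(axis=2)
            p_z_d /= p_z_d.sum(axis=1, keepdims=True)
        return p_z_d, p_w_z

    # Toy usage: 200 shots over a 500-word vocabulary, four view types.
    counts = np.random.default_rng(0).integers(0, 5, size=(200, 500))
    p_z_d, _ = plsa(counts.astype(float), n_z=4)
    views = p_z_d.argmax(axis=1)   # unsupervised view assignment per shot

Each shot is then assigned a view by taking the argmax of its P(z|d) row; more data sharpens these low-dimensional distributions rather than enlarging the model.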
Additionally, a knowledge transfer property is investigated using the same five-sport dataset. An individual sport with insufficient resources {basketball, volleyball, table tennis, tennis} can be assisted by borrowing the codebook from a resource-abundant sport {soccer}. As the 5th/6th columns of Fig. 9.8 depict for these four limited-resource sports, the codebook transfer mechanism improves the SVM and PLSA results by about 2.07 % and 5.05 % on average, respectively. The margin of improvement using the PLSA is larger than that of the SVM. This result can be explained by the nature of the two techniques: PLSA is a probability-based dimensionality reduction technique, so more data provide a more thorough characterization of the low-dimensional model.
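As an illustration, here is a minimal sketch of the codebook-transfer idea, assuming local descriptors have already been extracted from each sport's frames; the arrays, sizes, and helper function below are hypothetical placeholders rather than the book's actual pipeline.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    soccer_desc = rng.random((10000, 128))   # abundant source sport (toy descriptors)
    basket_desc = rng.random((300, 128))     # limited-resource target sport

    # Build the visual codebook once, from the resource-abundant sport.
    codebook = KMeans(n_clusters=500, n_init=4, random_state=0).fit(soccer_desc)

    def bow_histogram(descriptors, codebook, n_words=500):
        """Quantize descriptors against the transferred codebook and return a
        normalized bag-of-visual-words histogram."""
        words = codebook.predict(descriptors)
        hist = np.bincount(words, minlength=n_words).astype(float)
        return hist / hist.sum()

    # The limited-resource sport reuses the soccer codebook directly.
    basket_hist = bow_histogram(basket_desc, codebook)

The limited-resource sport never fits its own codebook; it simply quantizes its descriptors against the clusters learned from the abundant soccer data, and the resulting histograms feed the SVM or PLSA stage as before.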