Game Development Reference
In-Depth Information
Many approaches aim to identify semantics relevant to content of static images
via identification of visual features. All of these approaches involve some degree
of supervision. Duygulu and Barnard [22] segmented each image and associated
the features identified within individual segments with words from a large
vocabulary. The vocabulary was then used to identify the semantics of the
image. Their evaluation on the Corel 5K dataset yielded 70% correct predictions.
Better results were achieved when Lavrenko et al. [37] employed a probabilistic
model.
Feng et al. [25] proposed an enhancement to the segmentation approach, which
exploited the co-occurrence of terms related to images (e.g., tiger and grass
occurring together more frequently than tiger and building). This also improved
output correctness, but was more tightly bound to the training set of images.
Improvements were also achieved when information about global and local features
was used together [10].
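The co-occurrence idea can be illustrated with a toy sketch. This is not the model of Feng et al. It only shows, under simplifying assumptions, how pair counts taken from training annotations might shift the choice between two candidate labels. The function names, the additive scoring rule, the `weight` parameter and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(annotations):
    """Count how often each unordered pair of terms annotates the same image."""
    counts = Counter()
    for terms in annotations:
        # Sort so that every pair is stored under one canonical key.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def rescore(candidates, context_term, counts, weight=0.1):
    """Add a co-occurrence bonus to each candidate's visual score.

    The additive rule and the weight are illustrative assumptions.
    """
    rescored = {}
    for term, score in candidates.items():
        pair = tuple(sorted((term, context_term)))
        rescored[term] = score + weight * counts[pair]
    return rescored


# Hypothetical training annotations, one list of terms per image.
training = [
    ["tiger", "grass", "sky"],
    ["tiger", "grass"],
    ["building", "sky"],
    ["tiger", "water"],
]
counts = cooccurrence_counts(training)

# Hypothetical visual scores for an ambiguous region in an image
# that has already been labelled "tiger".
candidates = {"grass": 0.40, "building": 0.45}
result = rescore(candidates, "tiger", counts)
best = max(result, key=result.get)
```

Here the visual score alone would favour "building", while the co-occurrence bonus tips the decision to "grass". The same mechanism shows why such a method is bound to its training set: pairs that never appear there receive no support.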
Various approaches use machine learning for image or image region
categorization. Techniques such as SVM [17] or the Bayes point machine [16]
perform well (precision over 90% on the Corel 5K dataset), but they are limited
to a small number of categories and lack the training sets needed to be used
effectively for the acquisition of more specific metadata.
Because image resources are non-textual, their metadata are often acquired via
analysis of their context (e.g., in the web environment), which may contain
text or already annotated resources [52, 69, 70]. The acquisition of the
semantics of multimedia content (visual or aural) may also involve OCR or speech
recognition approaches [13].
Like images, raw audio resources are extensive and syntactically complex, and
automated acquisition of their semantics is complicated. With images, we are
usually satisfied with metadata telling us about the physical features in them.
For music, the palette of metadata types is wider, comprising not only track
names, authors and publishers but also lyrics, melody, style, tonality, rhythm,
motives or even the mood the track evokes on listening. For music information
retrieval, the latter group is just as important as the first. Such metadata are
used for “querying by example”, which has proliferated next to standard textual
querying [42]. Music metadata are also much more abstract, so an approach for
their acquisition needs to perform sophisticated interpretations of the raw
music track.
Many music metadata acquisition approaches begin by transforming the raw music
stream into a more symbolic representation, such as a musical score or rhythm
transcription. The approach of Lu and Hanjalic [41] identifies audio elements
(natural semantic sound clusters, e.g., a sequence of chords). The authors point
out the similarity of these elements to words in texts (e.g., a sequence of
tones can be understood as a sequence of characters). Thus, the music track can
be mined for keywords, i.e. the most prominent audio elements. These audio
“keywords” cannot be used as normal textual keywords (for textual query
formulation). Nevertheless, they provide a basis for effective music track
comparison.
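The analogy between audio elements and words can be made concrete with a small sketch. It assumes that each track has already been reduced to a sequence of audio element labels, and it borrows TF-IDF weighting and cosine similarity from text retrieval as stand-ins. The weighting scheme, function names and sample tracks are assumptions for illustration, not the measures defined by Lu and Hanjalic.

```python
import math
from collections import Counter


def weight_elements(tracks):
    """Return a TF-IDF style weight for every audio element of every track."""
    n_tracks = len(tracks)
    doc_freq = Counter()
    for elements in tracks.values():
        doc_freq.update(set(elements))  # count each element once per track
    weights = {}
    for name, elements in tracks.items():
        term_freq = Counter(elements)
        weights[name] = {
            e: (count / len(elements)) * math.log(n_tracks / doc_freq[e])
            for e, count in term_freq.items()
        }
    return weights


def audio_keywords(vector, k=1):
    """Pick the k most prominent audio elements of one track."""
    return sorted(vector, key=vector.get, reverse=True)[:k]


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(e, 0.0) for e, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical tracks, each given as a sequence of audio element labels
# (chord names are used here purely as readable placeholders).
tracks = {
    "a": ["C", "G", "C", "Am", "C"],
    "b": ["C", "G", "C", "F", "G"],
    "c": ["Dm", "E", "Dm", "G", "Dm"],
}
weights = weight_elements(tracks)
keywords_a = audio_keywords(weights["a"])
sim_ab = cosine(weights["a"], weights["b"])
sim_ac = cosine(weights["a"], weights["c"])
```

In this toy data the element "G" occurs in every track and therefore receives zero weight, much like a stop word in text. Tracks "a" and "b" share a distinctive element and come out as more similar than "a" and "c", which reflects the comparison use case rather than textual querying.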
A different pre-processing technique was devised by Magistrali et al., who
transformed the raw music tracks into extensive XML and then RDF files. These
were then interpreted by rules expertly prepared in an ontology and transformed
to more