is based on a Naïve Bayes classifier whose model is built on feature vectors of correlated Wikipedia articles. Finally, a voting scheme combines the disambiguation results obtained by the previous techniques. Independent evaluations of each of the two tasks showed that both system components produce accurate annotations. The best keyword extraction performance is achieved with the keyphraseness statistic, which yields precision, recall, and F-measure of 53.37%, 55.90%, and 54.63%, respectively. The disambiguation procedure reaches an accuracy of up to 94%.
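To make the keyphraseness statistic and the reported evaluation measures more concrete, the following Python sketch ranks candidate phrases by keyphraseness and scores the selection against a gold standard. It assumes the usual definition of keyphraseness as the number of documents in which a phrase appears as a link divided by the number of documents in which it appears at all. The toy corpus, the input format (a set of candidate phrases and a set of linked phrases per document), and the function names `keyphraseness`, `extract_keywords`, and `precision_recall_f` are hypothetical and serve only as an illustration, not as the system described above.

```python
from collections import Counter

# Hypothetical input format: one (phrases, links) pair per training document,
# where `phrases` is the set of candidate phrases occurring in the document and
# `links` is the subset that appears as a hyperlink (i.e., was chosen as a keyword).
def keyphraseness(docs):
    """Return {phrase: (#docs where phrase is linked) / (#docs where it occurs)}."""
    occurs = Counter()
    linked = Counter()
    for phrases, links in docs:
        occurs.update(phrases)            # +1 per document containing the phrase
        linked.update(links & phrases)    # +1 per document linking the phrase
    return {p: linked[p] / occurs[p] for p in occurs}


def extract_keywords(candidates, scores, k):
    """Keep the k candidates with the highest keyphraseness (ties broken alphabetically)."""
    ranked = sorted(candidates, key=lambda p: (-scores.get(p, 0.0), p))
    return set(ranked[:k])


def precision_recall_f(predicted, gold):
    """Precision, recall, and F-measure (harmonic mean of precision and recall)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy corpus (invented for illustration only).
docs = [
    ({"naive bayes", "classifier", "the"}, {"naive bayes"}),
    ({"naive bayes", "wikipedia", "the"}, {"naive bayes", "wikipedia"}),
    ({"wikipedia", "classifier", "the"}, {"wikipedia"}),
]
scores = keyphraseness(docs)
predicted = extract_keywords({"naive bayes", "classifier", "the", "wikipedia"}, scores, k=2)
p, r, f = precision_recall_f(predicted, gold={"naive bayes", "classifier"})
print(sorted(predicted), p, r, f)
```

In a ranking-based extractor like this sketch, the number of selected phrases k (or an equivalent score threshold) is the main knob trading precision against recall, and the F-measure summarizes both in a single number.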
2.6 Mining Community-Contributed Media
The following sections give an overview of research on mining large collections of community-contributed media. Section 2.6.1 describes approaches for extracting semantics from photo tags available on Flickr, while Sect. 2.6.2 shows how Wikipedia articles can be used as a knowledge base for automatically classifying electronic documents. Finally, Sect. 2.6.3 describes two research efforts aimed at categorizing and automatically organizing large sets of video clips.
2.6.1 Semantics Extraction from Photo Tags
Photo tags are unstructured knowledge without a priori semantics, yet they can be mined efficiently to automatically extract interesting and relevant semantics. Many works have addressed this problem; they can be classified according to the taxonomy shown in Fig. 2.3.
Much research effort has been devoted to jointly analyzing Flickr tags together with photo location and time metadata. The approaches proposed in [36, 39] analyze inter-tag frequencies to discover relevant and recurrent tags within a given time period [39] or spatial region [40]. However, these approaches do not discover the semantics of specific tags.
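The idea of finding tags that are relevant within a given period of time can be illustrated with a small Python sketch. It buckets (tag, timestamp) records into fixed-length windows and weights each tag in a window with a TF-IDF-style score, so that tags concentrated in one window rank high while tags spread evenly over all windows score zero. This scoring function, the window length, the toy records, and the names `window_tag_scores` and `top_tags` are all assumptions chosen for illustration; they are not the specific methods of [36, 39, 40].

```python
import math
from collections import Counter, defaultdict


def window_tag_scores(records, window):
    """Score each tag inside each fixed-length time window.

    records: iterable of (tag, timestamp) pairs, timestamps as numbers (e.g., days).
    window:  window length in the same unit as the timestamps.
    Score = (share of the window's tag occurrences) * log(#windows / #windows using the tag),
    a TF-IDF-style weight chosen for illustration only.
    """
    per_window = defaultdict(Counter)
    for tag, ts in records:
        per_window[int(ts // window)][tag] += 1

    n_windows = len(per_window)
    # In how many (non-empty) windows does each tag occur at least once?
    window_df = Counter(tag for counts in per_window.values() for tag in counts)

    scores = {}
    for w, counts in per_window.items():
        total = sum(counts.values())
        scores[w] = {
            tag: (c / total) * math.log(n_windows / window_df[tag])
            for tag, c in counts.items()
        }
    return scores


def top_tags(scores, w, k):
    """Return up to k tags with a positive score in window w, best first."""
    ranked = sorted(scores[w].items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, s in ranked[:k] if s > 0]


# Toy data (invented): "festival" is concentrated in the first week,
# "sunset" is spread over every week.
records = [
    ("festival", 1), ("festival", 2), ("festival", 3),
    ("sunset", 2), ("sunset", 9), ("sunset", 16),
    ("beach", 10),
]
scores = window_tag_scores(records, window=7)
print(top_tags(scores, 0, 2))
```

Replacing the time-window index with a grid-cell index computed from photo coordinates would give a spatial counterpart. As noted above, frequency-based scoring of this kind surfaces tags that matter for a period or region, but by itself it says nothing about what a tag means.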
A further step toward the automatic extraction of semantics from Flickr tags was taken by analyzing the temporal and spatial distributions of each tag's usage [41]. The proposed approach extracts place and event semantics by analyzing how the user-contributed tags assigned to Flickr photos are used along the space and time dimensions. Given the temporal and spatial usage distributions of a tag, a scale-structure identification (SSI) approach clusters these distributions at multiple scales and, at each scale, measures how closely the distribution resembles a single cluster. Tags can ultimately be identified as places and/or events. The technique is based on the intuition that an event refers to a specific segment of time, while a place refers to a specific location. Hence, the usage patterns of event and place tags “burst” in specific segments of time and regions of space, respectively. In particular, the number of usage occurrences for an event tag should be much higher in a small segment of time than the number of usage occurrences of