Kennedy and Naaman [53] propose a novel application of existing audio
fingerprinting techniques to the problem of synchronizing video clips taken at the
same event, particularly concert events. Synchronization allows the generation of
important metadata about the clips and the event itself, thus enhancing the user's
browsing and viewing experience.
A set of video clips crawled from the Web and related to the same event is
assumed to be initially available. Fingerprints are generated for each of them by
spectral analysis of the audio tracks. The results of this process are then compared
for any two clips to identify matches. Both the fingerprinting techniques and the
matching algorithm are quite robust against noisy sources - as is often the case with
user-contributed media. Audio fingerprinting matches are exploited to build an
undirected graph, where each node represents a single clip and edges indicate
temporal overlap between pairs of clips. Such a graph typically includes a
few connected components, or clusters, each one corresponding to a different
portion of the captured event. Based on the clip overlap graph, the level of interest
of each cluster is estimated, and highly interesting segments of the event are
identified. In addition, cluster analysis is employed to aid the selection
of the highest quality audio tracks.
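To make the pipeline concrete, the following Python sketch builds a clip-overlap graph from a toy spectral fingerprint and extracts clusters as connected components. It is a minimal illustration, not the authors' algorithm: the fingerprint (strongest FFT bins per frame), the alignment scoring, and all thresholds are assumptions made here for clarity, and real fingerprinting schemes are considerably more robust to noise.

```python
import itertools

import networkx as nx
import numpy as np

def fingerprint(audio, frame=4096, hop=2048, peaks_per_frame=3):
    # Toy fingerprint: per frame, keep the indices of the strongest
    # frequency bins. Real systems hash landmark peak pairs instead.
    prints = []
    for start in range(0, len(audio) - frame, hop):
        spectrum = np.abs(np.fft.rfft(audio[start:start + frame]))
        top = np.argsort(spectrum)[-peaks_per_frame:]
        prints.append(frozenset(int(b) for b in top))
    return prints

def best_alignment(fp_a, fp_b, min_score=0.4, min_frames=20):
    # Slide fp_b across fp_a and keep the offset whose overlapping frames
    # agree best on their spectral peaks; reject short or weak overlaps.
    best = None
    for offset in range(-len(fp_b) + 1, len(fp_a)):
        hits, total = 0.0, 0
        for i, frame_b in enumerate(fp_b):
            j = i + offset
            if 0 <= j < len(fp_a):
                total += 1
                hits += len(frame_b & fp_a[j]) / len(frame_b)
        if total >= min_frames:
            score = hits / total
            if score >= min_score and (best is None or score > best[1]):
                best = (offset, score)
    return best

def overlap_graph(tracks):
    # Nodes are clips; an edge (with its estimated frame offset) links two
    # clips whose audio fingerprints align. Connected components are the
    # clusters corresponding to distinct portions of the event.
    fps = {name: fingerprint(audio) for name, audio in tracks.items()}
    g = nx.Graph()
    g.add_nodes_from(fps)
    for a, b in itertools.combinations(fps, 2):
        match = best_alignment(fps[a], fps[b])
        if match is not None:
            g.add_edge(a, b, offset=match[0], score=match[1])
    return g, list(nx.connected_components(g))
```

In a complete system, the per-edge offsets would be propagated along paths within each cluster to place all of its clips on a common timeline, which is what enables the synchronization described above.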
Textual information supplied by users along with the video clips is also mined,
using a tf-idf strategy, to gather descriptive tags for each cluster; these tags improve
the accuracy of search tasks and suggest metadata for unannotated video clips.
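As a rough illustration of this tagging step, the sketch below scores terms by tf-idf, treating the aggregated user text of each cluster as a single document; the function name, tokenization, and weighting details are assumptions made here rather than the authors' exact formulation.

```python
import math
from collections import Counter

def cluster_tags(cluster_docs, top_k=5):
    # cluster_docs maps a cluster id to the bag of words drawn from the
    # titles, tags, and descriptions of its clips. Each cluster is one
    # "document"; terms are ranked by tf-idf within their cluster.
    n = len(cluster_docs)
    df = Counter()  # number of clusters in which each term occurs
    for words in cluster_docs.values():
        df.update(set(words))
    tags = {}
    for cid, words in cluster_docs.items():
        tf = Counter(words)
        scored = {w: (c / len(words)) * math.log(n / df[w])
                  for w, c in tf.items()}
        tags[cid] = sorted(scored, key=scored.get, reverse=True)[:top_k]
    return tags

# Hypothetical input: word bags aggregated from each cluster's clips.
clusters = {
    "c1": "opening song crowd cheering intro".split(),
    "c2": "encore final song fireworks crowd".split(),
}
print(cluster_tags(clusters, top_k=3))
```

Terms that occur in every cluster receive an idf of zero, so cluster-specific words dominate the tag lists and the clusters become easier to tell apart in search.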
This system has been applied to a set of real user-contributed videos from three
music concerts. Initial experiments enabled fine-tuning of the audio fingerprinting
and matching algorithms to maximize matching precision.
Manual inspection showed that a large fraction of the clips left out by the system
were very short or of abnormally low quality, and thus intrinsically uninteresting.
The system's ability to identify important concert segments (typically hit songs)
has then been assessed against song rankings from the music-sharing Web site
Last.fm, with a positive outcome. A study with human subjects has also validated
the system's selection of high-quality audio.
Finally, the proposed approach to extracting textual descriptive information proved
successful in many cases, with failures mostly related to poorly annotated clips
and small clusters.
2.7 Perspective
This chapter presented and discussed several community-contributed media
collections, highlighting the main research challenges raised by online social
networking sites.
In the last few years, an increasing research effort has been devoted to studying
and understanding the growth trends of social networks and the distribution of
media content. Furthermore, the abundance of multimodal, user-generated media
has opened novel research perspectives and introduced new challenges in mining
large collections of multimedia data and effectively extracting relevant knowledge.
Much research has focused on improving the automatic understanding of such
media content.