Kennedy and Naaman [53] propose a novel application of existing audio
fingerprinting techniques to the problem of synchronizing video clips taken at the
same event, particularly concert events. Synchronization allows the generation of
important metadata about the clips and the event itself, thus enhancing the user's
browsing and viewing experience.
A set of video clips crawled from the Web and related to the same event is
assumed to be initially available. Fingerprints are generated for each of them by
spectral analysis of the audio tracks. The results of this process are then compared
for any two clips to identify matches. Both the fingerprinting techniques and the
matching algorithm are quite robust against noisy sources - as is often the case with
user-contributed media. Audio fingerprinting matches are exploited to build an
undirected graph, where each node represents a single clip and edges indicate
temporal overlap between pairs of clips. Such a graph typically includes a
few connected components, or clusters, each one corresponding to a different
portion of the captured event. Based on the clip overlap graph, the level of interest
of each cluster is estimated, and highly interesting segments of the event are
identified. In addition, cluster analysis is employed to aid the selection
of the highest quality audio tracks.
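To make the pipeline concrete, the following Python sketch builds a clip-overlap graph from a toy spectral fingerprint and extracts clusters as connected components. It is a minimal illustration, not the authors' algorithm: the fingerprint (strongest FFT bins per frame), the alignment scoring, and all thresholds are assumptions made here for clarity, and real fingerprinting schemes are considerably more robust to noise.

```python
import itertools

import networkx as nx
import numpy as np

def fingerprint(audio, frame=4096, hop=2048, peaks_per_frame=3):
    # Toy fingerprint: per frame, keep the indices of the strongest
    # frequency bins. Real systems hash landmark peak pairs instead.
    prints = []
    for start in range(0, len(audio) - frame, hop):
        spectrum = np.abs(np.fft.rfft(audio[start:start + frame]))
        top = np.argsort(spectrum)[-peaks_per_frame:]
        prints.append(frozenset(int(b) for b in top))
    return prints

def best_alignment(fp_a, fp_b, min_score=0.4, min_frames=20):
    # Slide fp_b across fp_a and keep the offset whose overlapping frames
    # agree best on their spectral peaks; reject short or weak overlaps.
    best = None
    for offset in range(-len(fp_b) + 1, len(fp_a)):
        hits, total = 0.0, 0
        for i, frame_b in enumerate(fp_b):
            j = i + offset
            if 0 <= j < len(fp_a):
                total += 1
                hits += len(frame_b & fp_a[j]) / len(frame_b)
        if total >= min_frames:
            score = hits / total
            if score >= min_score and (best is None or score > best[1]):
                best = (offset, score)
    return best

def overlap_graph(tracks):
    # Nodes are clips; an edge (with its estimated frame offset) links two
    # clips whose audio fingerprints align. Connected components are the
    # clusters corresponding to distinct portions of the event.
    fps = {name: fingerprint(audio) for name, audio in tracks.items()}
    g = nx.Graph()
    g.add_nodes_from(fps)
    for a, b in itertools.combinations(fps, 2):
        match = best_alignment(fps[a], fps[b])
        if match is not None:
            g.add_edge(a, b, offset=match[0], score=match[1])
    return g, list(nx.connected_components(g))
```

In a complete system, the per-edge offsets would be propagated along paths within each cluster to place all of its clips on a common timeline, which is what enables the synchronization described above.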
Textual information supplied by users along with the video clips is also mined,
using a tf-idf strategy, to gather descriptive tags for each cluster; these tags improve
the accuracy of search tasks and suggest metadata for unannotated video clips.
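As a rough illustration of this tagging step, the sketch below scores terms by tf-idf, treating the aggregated user text of each cluster as a single document; the function name, tokenization, and weighting details are assumptions made here rather than the authors' exact formulation.

```python
import math
from collections import Counter

def cluster_tags(cluster_docs, top_k=5):
    # cluster_docs maps a cluster id to the bag of words drawn from the
    # titles, tags, and descriptions of its clips. Each cluster is one
    # "document"; terms are ranked by tf-idf within their cluster.
    n = len(cluster_docs)
    df = Counter()  # number of clusters in which each term occurs
    for words in cluster_docs.values():
        df.update(set(words))
    tags = {}
    for cid, words in cluster_docs.items():
        tf = Counter(words)
        scored = {w: (c / len(words)) * math.log(n / df[w])
                  for w, c in tf.items()}
        tags[cid] = sorted(scored, key=scored.get, reverse=True)[:top_k]
    return tags

# Hypothetical input: word bags aggregated from each cluster's clips.
clusters = {
    "c1": "opening song crowd cheering intro".split(),
    "c2": "encore final song fireworks crowd".split(),
}
print(cluster_tags(clusters, top_k=3))
```

Terms that occur in every cluster receive an idf of zero, so cluster-specific words dominate the tag lists and the clusters become easier to tell apart in search.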
This system has been applied to a set of real user-contributed videos from three
music concerts. Initial experiments enabled fine-tuning of the audio fingerprinting
and matching algorithms to maximize matching precision.
Manual inspection showed that a large fraction of the clips left out by the system
were very short or of abnormally low quality, and thus intrinsically uninteresting.
The system's ability to identify important concert segments (typically hit songs)
has then been assessed against song rankings from the music-sharing Web site
Last.fm, with a positive outcome. A study with human subjects has also validated
the system's selection of high-quality audio.
Finally, the proposed approach to extracting textual descriptive information proved
successful in many cases, with failures mostly related to poorly annotated clips
and small clusters.
2.7 Perspective
This chapter presented and discussed several community-contributed media
collections, highlighting the main research challenges raised by online social
networking sites.
In the last few years, an increasing research effort has been devoted to studying
and understanding the growth trends of social networks and the distribution of
media content. Furthermore, the abundance of multimodal, user-generated media
has opened novel research perspectives and introduced new challenges in mining
large collections of multimedia data and effectively extracting relevant knowledge.
Much research has focused on improving the automatic understanding of such
media content.