Addressing the Opportunity (Video Search Engines)

Realizing that inexpensive storage, ubiquitous broadband Internet access, low-cost digital cameras, and nimble video editing tools would result in a flood of unorganized video content, researchers have been developing video search technologies for a number of years. Recent trends in digital video creation and delivery technology have brought the need for such tools to the forefront. The computing technologies contributing to this flood are also available to tool builders, who can use them to provide a lifeline to Web video viewers. Once-impractical media analysis technologies are being applied to large archives of video content to extract metadata that aids search. The social aspect (e.g., incorporating the popularity of pages into rank calculations), initially overlooked and in fact largely irrelevant for lack of critical mass, provided a breakthrough for text search engine technology. For video, the exploitation of user tagging and recommendation engines is similarly providing a much-needed boost for search.

While great advances in video search have been made and today’s video search engines provide a valuable service to users, the task of extracting information from video for retrieval applications remains challenging and provides opportunities for innovation. This topic aims first to describe the current state of video search engine technology and second to inform those with the requisite technical skills of the opportunities to contribute to the development of this field.


Today’s Web search engines have greatly improved the accessibility, and therefore the value, of the Web. The top portals prominently feature search capabilities, and most have gone beyond text search to include image search and even video search, though the latter on a limited basis. A number of smaller companies have begun to offer more sophisticated media search features. Academic research groups have been actively developing algorithms and prototypes in this area for over a decade, incorporating and advancing previously existing constituent technologies.

Technology evolution has set the stage for rapid growth of video search engines: research and prototyping have been underway for several years, broadband access is ubiquitous, and streaming media protocols and encoding standards are mature. Disk and processor cost reductions are making it possible to store and index large volumes of digital media and to create indexed on-line archives. Market forces such as the emergence of IPTV and mobile video services and the growing acceptance of digital rights management technologies are fueling these trends.

Most media search systems rely on available metadata or contextual information in text form. Surrounding text or anchor text from links to the media is also used to infer something about its content and, in some cases, RSS feed descriptors point to media and include descriptive metadata. While these information sources are valuable and should be exploited, they are limited because they are typically brief, high-level, and subjective.

Therefore, the current focus of media indexing research is to develop algorithms that exploit the media content itself as much as possible to augment available metadata. In some cases, the media may contain associated text streams such as closed captions or song lyrics. By extracting and operating on these streams, a textual representation of the dialog is obtained, and existing text information retrieval methods can then be applied to retrieve relevant media. Speech recognition can be employed to create an approximate transcription, and techniques such as video optical character recognition can also be used to generate a textual representation of the media content. Although these technologies are inherently error-prone, they have been used with success for indexing applications. Advanced speech retrieval systems use phonetic search to deal with the “out of vocabulary” problem and maintain alternative hypotheses in the form of lattices to boost recall.
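To make the caption-based approach concrete, the following is a minimal sketch of indexing time-stamped caption text and retrieving the matching offsets within each video. The input format (tuples of video identifier, start time in seconds, and text), the function names, and the sample data are assumptions made for illustration; they are not taken from any particular system.

```python
import re
from collections import defaultdict

WORD_PATTERN = re.compile(r"[a-z0-9']+")

def build_caption_index(captions):
    """Map each lowercase word to the (video_id, start_seconds) segments containing it.

    `captions` is an iterable of (video_id, start_seconds, text) tuples,
    a hypothetical stand-in for parsed closed-caption data.
    """
    index = defaultdict(list)
    for video_id, start, text in captions:
        # Use a set so a word repeated within one segment is recorded once.
        for word in set(WORD_PATTERN.findall(text.lower())):
            index[word].append((video_id, start))
    return index

def search(index, query):
    """Return the segments that contain every query word, sorted by video and time."""
    words = WORD_PATTERN.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)

# Illustrative data only.
captions = [
    ("news_01", 0.0, "Good evening, here are tonight's headlines."),
    ("news_01", 12.5, "The city council approved the new budget."),
    ("sports_07", 3.0, "The home team approved of the new coach."),
]
index = build_caption_index(captions)
results = search(index, "new budget")
```

Because each hit carries a start time, a player could seek directly to the relevant point in the video rather than to the beginning. The same structure would accept speech recognition or video OCR output in place of captions, although the errors in those sources would lower precision and recall.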

Media retrieval that goes beyond the textual media component is more complex because the basic media features are not well defined and may not scale well to large archives. Further, formulating queries may not be as simple as typing a keyword. However, systems have been designed that, for example, retrieve images similar to a given image (query by example) or retrieve images based on a specification of color or shape. For navigating video retrieval results, techniques such as video skimming or mosaicing have been proposed.
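As a rough illustration of query by example, the sketch below ranks archive images by the similarity of their color histograms to that of a query image. Coarse color histograms compared by histogram intersection are one simple choice among many possible features and measures. The image representation (a list of RGB tuples), the function names, and the sample data are all assumptions made for this example.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized color histogram for a list of (r, g, b) pixels with 0-255 channels."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantize each channel to a bin number in [0, bins_per_channel - 1].
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def query_by_example(query_pixels, archive, top_k=3):
    """Return up to top_k (name, score) pairs, most similar first."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(query_hist, color_histogram(pixels)), name)
        for name, pixels in archive.items()
    ]
    # Sort by descending score, breaking ties by name for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored[:top_k]]

# Illustrative data only: tiny "images" given as flat pixel lists.
archive = {
    "sunset": [(250, 120, 30)] * 8 + [(20, 20, 60)] * 2,
    "forest": [(30, 140, 40)] * 10,
    "beach": [(240, 200, 150)] * 6 + [(250, 120, 30)] * 4,
}
query = [(245, 110, 20)] * 10
ranking = query_by_example(query, archive)
```

This version recomputes every archive histogram for each query, which is precisely the kind of cost that does not scale. A practical system would compute the features once at indexing time and store them.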

This topic takes a practical approach with the goal of bringing researchers up to date on the state of the art in multimedia search technologies and systems. Part of the presentation will follow a logical flow from content acquisition through analysis to extract index data, data representation, media archival, and retrieval, and finally to rendering results in a Web-based environment. Each of these major functional components will be outlined, and particular emphasis will be given to automated content analysis techniques, since these are critical for operating video search engines at scale and present ongoing research challenges. To give readers an understanding of the issues involved, individual media processing algorithms operating on text, audio, and video will be addressed, including text alignment, case restoration, entity extraction, speech recognition, speaker segmentation, and video shot boundary detection. Additionally, the value of operating on multiple media components simultaneously will be illustrated by examining multimodal segmentation techniques. The role of media segmentation in improving relevance ranking for long-form content will be discussed.
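Of the algorithms listed above, video shot boundary detection is perhaps the easiest to illustrate in a few lines. The sketch below declares a cut wherever the gray-level histograms of consecutive frames differ by more than a threshold. This is a simple baseline chosen for illustration, not necessarily the method used by any system discussed here. The frame representation (a flat list of 0-255 intensities), the bin count, and the threshold are assumptions.

```python
def gray_histogram(frame, bins=8):
    """Normalized intensity histogram for a frame given as a flat list of 0-255 values."""
    hist = [0.0] * bins
    for value in frame:
        hist[value * bins // 256] += 1.0
    count = len(frame)
    return [h / count for h in hist] if count else hist

def detect_shot_boundaries(frames, threshold=0.5):
    """Return each index i where a cut is declared between frame i - 1 and frame i."""
    boundaries = []
    previous = None
    for i, frame in enumerate(frames):
        current = gray_histogram(frame)
        if previous is not None:
            # Half the L1 distance between normalized histograms lies in [0, 1].
            difference = 0.5 * sum(abs(a - b) for a, b in zip(current, previous))
            if difference > threshold:
                boundaries.append(i)
        previous = current
    return boundaries

# Illustrative data only: four-pixel "frames" from a dark shot and a bright shot.
dark = [20, 25, 30, 22]
bright = [220, 230, 225, 240]
frames = [dark, dark, [22, 24, 31, 20], bright, bright, dark]
cuts = detect_shot_boundaries(frames)
```

A single fixed threshold of this kind responds only to abrupt cuts. Gradual transitions such as fades and dissolves spread the change over many frames and would need a different test, for example comparing frames across a longer window.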

In addition to media processing, index representation issues using XML and media archival systems will be presented. The relation between indexing, summarization and media adaptation for mobile devices will be discussed. Challenges encountered when building Web-based user interfaces for browsing indexed streaming media will be addressed.

In parallel with the functional discussion, a historical perspective will be provided, and relevant work will be cited from both academic and industrial sources. Background information such as digital media encoding and streaming standards and information retrieval will be given to allow the topic to stand on its own.

Application areas vary widely, and the applicability of a given media search technique is often limited to certain domains. For example, video from Web cams is quite different from broadcast television content. The topic will make this clear, pointing out techniques that are suitable for different levels of structure or different quality levels of the source material.

Practical issues will be brought to light through the presentation of detailed case studies, including a system supporting rapid content queries on a 50,000-hour video archive spanning 10 years of broadcast television and Internet video.
