
Retrieval (Video Search Engine Systems)

The Extensible Markup Language (XML) is well suited for representing the extracted metadata, along with any metadata that accompanies the source asset and any metadata added over the life of the asset while it is in the repository. While many metadata systems can be used for high-level metadata, there is really only one standard intended for representing media features for content-based indexing applications: MPEG-7. For audio indexing, [Kim05] describes representing spoken content descriptors in MPEG-7, as well as low-level audio features and their use in classification and similarity metrics.
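As a rough illustration of how source, extracted, and later-added metadata can share one XML record, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`asset`, `source`, `extracted`, `annotations`, `note`) and the sample values are hypothetical and are not taken from the MPEG-7 schema or from any particular system.

```python
import xml.etree.ElementTree as ET


def build_asset_metadata(asset_id, source_fields, extracted_fields):
    """Build one XML record holding source and extracted metadata.

    The element layout is an assumption made for this example only.
    """
    root = ET.Element("asset", attrib={"id": asset_id})
    source = ET.SubElement(root, "source")
    for name, value in source_fields.items():
        ET.SubElement(source, name).text = str(value)
    extracted = ET.SubElement(root, "extracted")
    for name, value in extracted_fields.items():
        ET.SubElement(extracted, name).text = str(value)
    return root


def add_annotation(root, author, text):
    """Append metadata added later in the asset's life.

    Earlier fields are left untouched; new notes accumulate in their own section.
    """
    annotations = root.find("annotations")
    if annotations is None:
        annotations = ET.SubElement(root, "annotations")
    note = ET.SubElement(annotations, "note", attrib={"author": author})
    note.text = text
    return note


# Hypothetical sample asset.
record = build_asset_metadata(
    "vid-0001",
    {"title": "Evening News", "duration_seconds": 1800},
    {"shot_count": 212, "dominant_language": "en"},
)
add_annotation(record, "archivist", "Rights cleared for web distribution.")

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because new annotations are appended as child elements, the record can grow over the life of the asset without changing how the original fields are read.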

For efficient handling by the operating system and streaming media delivery systems, several files may be used for each asset: one for the encoded media representation for distribution, a JPEG thumbnail file for browsing, and a metadata file in XML format. Of course, other optimizations are possible. For example, a system can be designed to minimize the number of individual files by embedding metadata within the stream. The thumbnail image can even be embedded as a digital item in the stream, or key images can be extracted from the video dynamically. The commonly encountered trade-off between compute and storage resources applies here as well, and schemes that create and cache temporary files as needed may be efficient solutions. For example, consider a video sharing site that uses Flash as the primary distribution format but also supports downloading MPEG-4 versions for use with portable media players. The service can transcode all content in advance for rapid response at the expense of storage, or transcode on demand in response to user requests. In the latter case, for popular videos that receive many transcode requests, transcoding need only be performed once; a cached version of the file then services subsequent requests, reducing system compute load.
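The transcode-on-demand scheme described above can be sketched as a small cache keyed by asset and target format. This is only an illustration: `TranscodeCache` and `fake_transcode` are hypothetical names, the "transcoder" is a stand-in that returns placeholder bytes rather than invoking a real encoder, and a production cache would store files on disk and evict entries when storage runs low.

```python
class TranscodeCache:
    """Transcode on first request, then serve the cached result."""

    def __init__(self, transcode_fn):
        self._transcode_fn = transcode_fn
        self._cache = {}          # (asset_id, target_format) -> encoded bytes
        self.transcode_calls = 0  # counts how often the expensive step ran

    def get(self, asset_id, target_format):
        key = (asset_id, target_format)
        if key not in self._cache:
            # Cache miss: pay the compute cost once and keep the result.
            self.transcode_calls += 1
            self._cache[key] = self._transcode_fn(asset_id, target_format)
        return self._cache[key]


def fake_transcode(asset_id, target_format):
    # Stand-in for a real encoder; returns placeholder bytes only.
    return f"{asset_id}.{target_format}".encode()


cache = TranscodeCache(fake_transcode)
first = cache.get("vid-0001", "mp4")   # triggers transcoding
second = cache.get("vid-0001", "mp4")  # served from the cache
print(cache.transcode_calls)
```

Transcoding everything in advance corresponds to filling this cache for every asset before any request arrives, which trades storage for response time.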

The XML metadata representation is ideal for transferring information between systems, and for archival storage where additional metadata may be added over the life cycle of the asset. However, for performance reasons, many systems translate the XML into a traditional database management system (DBMS), at least for high-level metadata. Generating index structures for more fine-grained rich media metadata is a research topic in itself, although efficient solutions can be achieved for certain cases. Traditional database systems are optimized to respond to queries on multiple fields and return exact matches. A match is well defined, and may be extended to include a range or to support some invariance such as ignoring text case. These systems can be successfully used for tag-based multimedia retrieval, but for content-based retrieval we must extend this capability to support similarity search. Note that what we want is semantic similarity, which may be subjective and, at any rate, can only be approximated algorithmically today. Many multimedia DBMS store features as blobs (binary large objects), perhaps with an application-specific similarity metric defined on them. The general problem of constructing indices for rapid retrieval based on high-dimensional features is a fertile area of research. For example, [Sant02] explores schema design based on feature substructure to facilitate k-NN and range searches, and Lu [Lu98] discusses performance metrics for multimedia database systems and shows that commonly used metrics may sometimes provide conflicting indications of system performance. For the interested reader, Lu [Lu99] covers many aspects of multimedia database system design and [Sub98] gives an example of including a movie in a traditional database.

Large-scale video search implementations can tax even well-designed traditional database systems, and lightweight, efficient approaches designed to cope with scale [Greer99] can be effective solutions, particularly when extended with support for heterogeneous data represented in XML [Amer02]. It is important to bear in mind that video archiving with mainly textual queries is not a typical DBMS task; large swaths of the traditional database infrastructure, such as ensuring transactional consistency, are not required. For Web search, deleting a record is so infrequent that it is almost not a requirement to implement.
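To make the contrast between exact-match queries and similarity search concrete, the sketch below stores high-level metadata in ordinary columns and a feature vector as a blob, using Python's standard `sqlite3` module. Everything specific here is an assumption for illustration: the table layout, the three-dimensional feature vectors, and the choice of Euclidean distance as the similarity metric. The k-NN search is a brute-force scan, not one of the index structures discussed in the cited work.

```python
import math
import sqlite3
import struct


def pack_features(vector):
    # Serialize a feature vector as little-endian 32-bit floats (a blob).
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_features(blob):
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def euclidean(a, b):
    # Application-specific similarity metric; Euclidean distance is assumed here.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets ("
    " id TEXT PRIMARY KEY,"
    " title TEXT COLLATE NOCASE,"  # exact match that ignores text case
    " duration_seconds INTEGER,"
    " features BLOB)"
)

# Hypothetical sample data.
rows = [
    ("vid-0001", "Evening News", 1800, [0.9, 0.1, 0.0]),
    ("vid-0002", "Morning Show", 3600, [0.8, 0.2, 0.1]),
    ("vid-0003", "Nature Film", 5400, [0.0, 0.1, 0.9]),
]
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?)",
    [(i, t, d, pack_features(f)) for i, t, d, f in rows],
)


def exact_match(title, min_duration, max_duration):
    """Traditional query: exact title (case-insensitive) plus a duration range."""
    cur = conn.execute(
        "SELECT id FROM assets"
        " WHERE title = ? AND duration_seconds BETWEEN ? AND ?"
        " ORDER BY id",
        (title, min_duration, max_duration),
    )
    return [row[0] for row in cur]


def k_nearest(query, k):
    """Similarity query: scan every blob and keep the k closest vectors."""
    scored = [
        (euclidean(query, unpack_features(blob)), asset_id)
        for asset_id, blob in conn.execute("SELECT id, features FROM assets")
    ]
    scored.sort()
    return [asset_id for _, asset_id in scored[:k]]


print(exact_match("evening news", 1000, 2000))
print(k_nearest([1.0, 0.0, 0.0], 2))
```

The exact-match query can use the database's own indices, whereas the similarity query has to read and decode every blob. That linear cost is the reason indexing high-dimensional features remains an active research area.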
