Content Processing (Video Search Engine Systems)

Newly acquired media is available in a state that facilitates subsequent operations such as transcoding and metadata extraction. Transcoding engines may operate independently on the content, with the goals of preparing it for delivery (perhaps via streaming) and normalizing it to a single format suitable for archiving. We can consider the path for the media separately from the path for the metadata. The media is placed on user-facing media servers or origin servers for content distribution. The metadata goes to the index and to storage for use in browsing (Fig. 4.4), and the two are tied together by content identifiers.

Fig. 4.4. Metadata augmentation via automated content processing.
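
The split between the media path and the metadata path can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the `IngestResult` record, the `process_asset` function, the format names, and the `media://` locations. No real transcoding is performed; the point is only that both paths carry the same content identifier.

```python
from dataclasses import dataclass, field


@dataclass
class IngestResult:
    """Hypothetical record of where one asset's media and metadata ended up."""
    content_id: str
    media_locations: dict = field(default_factory=dict)  # format name -> location
    index_record: dict = field(default_factory=dict)     # metadata sent to the index


def process_asset(content_id, source_uri, metadata, formats=("stream", "archive")):
    """Split one ingested asset into a media path and a metadata path."""
    result = IngestResult(content_id=content_id)

    # Media path: one rendition per target format. Transcoding itself is
    # out of scope here; we only record a made-up location per rendition.
    for fmt in formats:
        result.media_locations[fmt] = f"media://{fmt}/{content_id}"

    # Metadata path: the index record repeats the content identifier so a
    # search hit can be resolved back to the media on the delivery servers.
    result.index_record = {"content_id": content_id, "source": source_uri, **metadata}
    return result


result = process_asset("asset-0001", "file:///ingest/clip.mpg", {"title": "Evening news"})
print(result.media_locations["stream"])
print(result.index_record["content_id"])
```

A browsing front end would query the index, read `content_id` from the matching record, and use it to look up the rendition to play.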

Asset Management

We have presented a one-directional data flow model for content ingest and processing, but we did allude to alternatives. Social tagging, for one, is architected so that the asset is not simply posted into the database and then read out for consumers. The asset metadata continues to grow through repeated annotation by end users. In a sense the viewers become authors: they alter the content and add value for subsequent viewers, so there is a feedback loop for additional metadata annotation. An interesting example of this phenomenon is dotsub.com, where users translate Web media and create subtitles that are made available via a Flash player. One may even consider the act of viewing an asset as a source of additional metadata about it: systems can log the number of views, fast forwards, etc. for each asset.

Similarly, we may consider systems where the annotators are not casual Web users seeking entertainment but professional annotators logging content, perhaps with specific business purposes in mind. The architecture is similar in that an asset is entered directly into the database and the metadata is added later. For these systems, content management system (CMS) architectures may be employed, in which a bus connects various Web-service-enabled components and workflows are defined for a range of applications (Fig. 4.5). The terms Digital Asset Management (DAM) and Media Asset Management (MAM) are also used to indicate a more specific type of CMS.

Additional automated post-processing operations, such as importing transcripts or subtitles that may not be available at the time of ingest, can be implemented as independent execution threads that read the asset record, process it to create additional metadata, and rewrite it to update the record. This decoupled processing is used where real-time processing is not practical given existing hardware resources, and where content is not arriving at a continuous rate.
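
The read, process, rewrite cycle of a decoupled post-processing thread can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular system: the in-memory `asset_db` dictionary stands in for the asset database, and `record_view`, `post_process_worker`, and the field names are hypothetical.

```python
import queue
import threading

# In-memory stand-in for the asset database (an assumption for this sketch).
asset_db = {"asset-0001": {"title": "Evening news", "views": 0}}
db_lock = threading.Lock()
pending = queue.Queue()  # late-arriving transcripts waiting to be imported


def record_view(content_id):
    """Viewer activity keeps updating the record while post-processing runs."""
    with db_lock:
        asset_db[content_id]["views"] += 1


def post_process_worker():
    """Independent thread: read the record, derive metadata, write it back."""
    while True:
        job = pending.get()
        if job is None:  # sentinel: no more work
            break
        content_id, transcript = job
        with db_lock:  # read
            already_done = "transcript" in asset_db[content_id]
        if already_done:
            continue
        # process: derive the additional metadata outside the lock
        new_fields = {
            "transcript": transcript,
            "transcript_words": len(transcript.split()),
        }
        with db_lock:  # rewrite: touch only the new fields
            asset_db[content_id].update(new_fields)


worker = threading.Thread(target=post_process_worker)
worker.start()
pending.put(("asset-0001", "good evening and welcome"))
record_view("asset-0001")
pending.put(None)
worker.join()
print(asset_db["asset-0001"])
```

Writing back only the newly derived fields, rather than the whole record, keeps the worker from overwriting counters such as `views` that end users update in the meantime.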


Fig. 4.5. Services oriented architecture for content processing.
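
The idea of a bus connecting components, with workflows defined per application, can be sketched in a few lines of Python. This is a toy model under loud assumptions: in a real deployment the components would be Web services reached over the network, whereas here plain functions stand in for them, and `ServiceBus`, the service names, and the `WORKFLOWS` table are all hypothetical.

```python
class ServiceBus:
    """Toy bus: components register under a name; a workflow is a list of names."""

    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        self._services[name] = handler

    def run_workflow(self, steps, asset):
        # Pass the asset record through each named component in order.
        for step in steps:
            asset = self._services[step](asset)
        return asset


# Stand-ins for Web-service-enabled components (no real media processing).
def transcode(asset):
    return {**asset, "renditions": ["stream", "archive"]}


def pick_thumbnail(asset):
    return {**asset, "thumbnail": f"{asset['content_id']}.jpg"}


def import_subtitles(asset):
    return {**asset, "subtitles": "pending"}


bus = ServiceBus()
bus.register("transcode", transcode)
bus.register("thumbnail", pick_thumbnail)
bus.register("subtitles", import_subtitles)

# Different applications reuse the same components in different workflows.
WORKFLOWS = {
    "web_video": ["transcode", "thumbnail"],
    "broadcast_archive": ["transcode", "subtitles"],
}

asset = bus.run_workflow(WORKFLOWS["web_video"], {"content_id": "asset-0001"})
print(asset)
```

Because workflows are data rather than code, adding a new application means adding an entry to the table, not modifying the components.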

The range of options for metadata extraction operations on media is vast and still growing as new media analysis algorithms are developed. These will be discussed in detail in later topics, but will typically involve demultiplexing streams, decoding and performing computationally intensive operations. Also, with any system that handles video, data bandwidth is a concern and careful system design is required to minimize data access and transport throughout the system. Unfortunately, it is challenging to design a flexible system that can operate at scale to support comprehensive video search. As a result, many search engines today deal solely with high-level metadata, and the extent of their media processing is limited to transcoding and representative image selection.
