
Factors Concerning Scalability (Video Search Engine Systems)

Introduction

In the process of designing a content indexing and retrieval system, several factors influence its scalability, and basic design choices can have a great effect on the cost of supporting a given user base. Of course, the fundamental decision to refer video playback requests back to the originator, as opposed to keeping a local transcoded copy of the video, results in an entirely different class of service with commensurate operating costs. The following list of scalability factors assumes a contributed-content model with local storage, but most of the basic factors apply to a wide range of video search applications:

Acquisition

• Content arrival rate and variability: On average, how many assets are posted to the system for indexing in a given time interval? How does this rate vary over the course of a day or week? What is the peak arrival rate during busy hours?

• Average content duration: The duration of the content (posted video clips) affects the processing time.

• Aggregate incoming content bit rate: Low bit rate clips reduce the required incoming bandwidth, but the resulting lower quality negatively affects content-based indexing performance and transcoded video quality. Google Video suggests using the highest quality available to typical consumers (5 Mbps MPEG-2 or 2 Mbps MPEG-4) [Goo07].
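The acquisition factors above combine multiplicatively into an aggregate ingest load. The Python sketch below shows that arithmetic under simple assumptions: uploads are spread evenly over the hour, and the `IngestProfile` class, its field names, and the example figures (600 clips per hour, a 3x busy-hour peak, 4-minute clips at 2 Mbps) are hypothetical values chosen for illustration, not measurements from any real service.

```python
from dataclasses import dataclass


@dataclass
class IngestProfile:
    """Hypothetical acquisition workload; every value is an illustrative assumption."""
    assets_per_hour: float      # mean content arrival rate
    peak_to_mean: float         # busy-hour arrival rate divided by the mean rate
    mean_duration_s: float      # average clip duration in seconds
    upload_bitrate_mbps: float  # average bit rate of contributed media


def ingest_bandwidth_mbps(p: IngestProfile, peak: bool = False) -> float:
    """Aggregate incoming bit rate, assuming uploads are spread evenly over the hour.

    Each asset carries duration * bit rate megabits, so megabits per hour
    divided by 3600 seconds gives the sustained bandwidth in Mbps.
    """
    rate = p.assets_per_hour * (p.peak_to_mean if peak else 1.0)
    megabits_per_hour = rate * p.mean_duration_s * p.upload_bitrate_mbps
    return megabits_per_hour / 3600.0


def media_hours_per_day(p: IngestProfile) -> float:
    """Hours of media arriving per day at the mean arrival rate."""
    return p.assets_per_hour * 24 * p.mean_duration_s / 3600.0


if __name__ == "__main__":
    # Assumed workload: 600 clips/hour, 3x busy-hour peak, 4-minute clips at 2 Mbps.
    profile = IngestProfile(assets_per_hour=600, peak_to_mean=3.0,
                            mean_duration_s=240, upload_bitrate_mbps=2.0)
    print(f"mean ingest: {ingest_bandwidth_mbps(profile):.0f} Mbps")
    print(f"peak ingest: {ingest_bandwidth_mbps(profile, peak=True):.0f} Mbps")
    print(f"media per day: {media_hours_per_day(profile):.0f} h")
```

With these assumed inputs, the mean ingest is 80 Mbps, rising to 240 Mbps in the busy hour, and 960 hours of media arrive per day. The point is the shape of the calculation; a real service would substitute its own measured arrival statistics.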

Processing

• Real-time factor: As mentioned above, indexing operations can vary widely, from simple metadata ingest to sophisticated feature extraction and content analysis. Transcoding is also considered a media processing operation. Once a particular palette of processing operations is selected, each one involves tradeoffs between accuracy and speed. A key performance metric is the real-time factor, the ratio of wall-clock processing time to media duration; in other words, how long does it take a typical server to process content of a given duration?

• Latency constraints: The number of processing servers required is a function not only of the real-time factor and the content arrival rate and its variance, but also of the maximum allowable delay from content acquisition to publication. Continuous content acquisition and processing applications such as broadcast monitoring are characterized by a fixed system latency, but for user-contributed or syndicated sources, the service can be provisioned for the average content arrival rate. Users who post content during busy periods will then experience longer processing delays, and the perceived quality of service is a function of the maximum and typical processing delays that users experience. Note that international services may have too few quiet periods to exploit to “catch up” (drain the processing queues). Processing priority schemes can also improve the overall user experience at the expense of a small number of users. For example, instead of a first-come, first-served (FIFO) policy, short content can be processed before longer content, or new content from a traditionally popular source can be given higher priority.
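To make the interaction between real-time factor, arrival rate, latency, and scheduling policy concrete, the sketch below first estimates a server count and then simulates a non-preemptive processing queue under two policies: first-come, first-served and shortest-content-first. It is a simplified model, not a description of any particular system: it assumes processing time equals media duration times a fixed real-time factor and that all servers are identical. The function names and the `target_util` headroom parameter are hypothetical.

```python
import heapq
import math
from typing import Callable, List, Tuple

# A job is (arrival time in seconds, media duration in seconds).
Job = Tuple[float, float]


def servers_needed(media_s_per_hour: float, rtf: float,
                   target_util: float = 0.8) -> int:
    """Servers needed to keep average utilization at or below target_util.

    rtf is the real-time factor: wall-clock processing time / media duration.
    """
    busy_s_per_hour = media_s_per_hour * rtf
    return math.ceil(busy_s_per_hour / (3600.0 * target_util))


def mean_delay(jobs: List[Job], rtf: float, servers: int,
               priority: Callable[[Job], float]) -> float:
    """Mean acquisition-to-publication delay under a non-preemptive policy.

    Whenever a server frees up, the waiting job with the smallest
    priority(job) value is started; ties go to the earlier-admitted job.
    """
    if not jobs:
        return 0.0
    pending = sorted(jobs)              # ordered by arrival time
    free_at = [0.0] * servers           # min-heap of server idle times
    ready: List[Tuple[float, int, Job]] = []
    i, seq, total = 0, 0, 0.0
    while i < len(pending) or ready:
        t = heapq.heappop(free_at)
        # Nothing is waiting: the server idles until the next arrival.
        if not ready and pending[i][0] > t:
            t = pending[i][0]
        # Admit every job that has arrived by time t.
        while i < len(pending) and pending[i][0] <= t:
            heapq.heappush(ready, (priority(pending[i]), seq, pending[i]))
            seq += 1
            i += 1
        _, _, (arrival, duration) = heapq.heappop(ready)
        finish = t + duration * rtf
        total += finish - arrival
        heapq.heappush(free_at, finish)
    return total / len(jobs)


if __name__ == "__main__":
    # Assume 40 media hours arrive per hour and a real-time factor of 0.5.
    print("servers:", servers_needed(40 * 3600, rtf=0.5))
    # A long upload arrives just behind a short one, followed by two short clips.
    jobs = [(0, 60), (1, 3600), (2, 60), (3, 60)]
    fifo = mean_delay(jobs, rtf=0.5, servers=1, priority=lambda j: j[0])
    sjf = mean_delay(jobs, rtf=0.5, servers=1, priority=lambda j: j[1])
    print(f"FIFO mean delay: {fifo:.0f} s, shortest-first: {sjf:.0f} s")
```

In this toy queue, FIFO yields a mean delay of 1401 s and shortest-first 516 s: the two short clips no longer wait behind the hour-long upload, which in turn finishes 60 s later. That is the trade the text describes, a better experience for most users at the expense of a few.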

Storage

• Archival media bit rate and encoding parameters: For storage, by far the most important factor is the size of the bulk media, which is governed by the transcoded bit rate or, if the original content is preserved, by the policies on uploaded media such as allowable bit rate and duration. Many systems effectively archive the distribution format; that is, the distribution format is created once and maintained as the primary source in the archive. In this case, we must account for the bit overhead of supporting random access (such as short GoPs), forward error correction, or bit rate scalability.

• Alternative media representations: Are transcoded versions of each asset to be created to support playback on a wide range of devices? For the best user experience across all of today's devices, multiple streams are required.

• Browsable representations: Key frames used to create visual interfaces for users are the next most significant storage cost. In the extreme, it is possible to omit these altogether and perhaps maintain a single icon image for each series of programs, but this greatly detracts from the overall user experience. Typical options include:

- a single thumbnail for each asset;

- key frames sampled uniformly or based on the video content;

- visual summaries that include motion video.

For each of these options, the spatial resolution of the representative images is an important design tradeoff between storage and quality of user experience. A similar tradeoff exists for the number of key frames retained for each asset. Longer video summaries may help users identify relevant videos in query results, and additional key frames increase the precision with which users can visually position long-form content for playback. To save long-term storage space, it is possible to extract these representations dynamically from the bulk media as the visual interface is rendered, perhaps with caching, but this additional complexity is not typically justified. However, visual browsing within a single asset, using a more capable media player to render visual navigation points, is practical.

• Content description bit rate: In many existing systems, such as RSS aggregation search engines, the number of bytes used in the XML representation of the metadata is negligible compared with the media, since only global, high-level tags are used. However, as more timed metadata such as topic titles and transcripts is included, the size of the description grows in proportion to the length of the content, and we may speak of a content description bit rate. While textual descriptions compress effectively (using MPEG-7 BiM, for example), lower-level media features such as phonemes or lattices may not compress as readily. It is even possible for the content descriptions to exceed the size of the content (for example, if phonetic lattices are used for searching 8 kHz telephone calls).

• Index storage: The index is derived from the content descriptions, whether in the form of an inverted text index or other structures for efficient retrieval of binary features, and its size eventually becomes an issue as the volume of content grows. The index is generally maintained in a combination of high-performance storage and memory, so controlling the index size is a key driver of system performance.
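Because the storage factors above scale with media duration, it is convenient to budget storage per hour of archived content. The sketch below does that arithmetic; the function names and all example parameters (two renditions at 1000 and 300 kbps, one 20 kB key frame every 10 seconds, a 2 kbps content description) are hypothetical assumptions. Index size is left out because it depends heavily on the chosen index structure.

```python
from typing import Dict, Sequence


def bytes_per_hour(bitrate_kbps: float) -> float:
    """Bytes needed to store one hour of a stream at a constant bit rate."""
    return bitrate_kbps * 1000 * 3600 / 8


def storage_per_media_hour(renditions_kbps: Sequence[float],
                           keyframes_per_hour: int,
                           keyframe_bytes: int,
                           description_kbps: float) -> Dict[str, float]:
    """Rough storage budget (bytes) for one hour of archived content.

    All inputs are assumptions to be replaced with a system's own figures.
    """
    budget = {
        # Every transcoded rendition is stored in full.
        "media": sum(bytes_per_hour(r) for r in renditions_kbps),
        # Browsable representations: key frames retained per hour.
        "keyframes": float(keyframes_per_hour * keyframe_bytes),
        # Timed metadata expressed as a content description bit rate.
        "description": bytes_per_hour(description_kbps),
    }
    budget["total"] = sum(budget.values())
    return budget


if __name__ == "__main__":
    budget = storage_per_media_hour(renditions_kbps=[1000, 300],
                                    keyframes_per_hour=360,
                                    keyframe_bytes=20_000,
                                    description_kbps=2.0)
    for part, size in budget.items():
        print(f"{part:>11}: {size / 1e6:8.1f} MB ({size / budget['total']:.1%})")
```

Under these assumptions, the two media renditions account for about 585 MB of the roughly 593 MB needed per hour, consistent with the point that bulk media dominates storage. A low-level description such as a phonetic lattice would raise `description_kbps` and could shift that balance.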

Retrieval

• Peak simultaneous users: As in any Web application, the primary metric determining the scalability of the retrieval subsystem is the number of simultaneous users at peak usage times. Since replication with load balancing can be used, this may be further specified as the number of users supported per server.

• User activity duty cycle: The retrieval user interface can be designed so that, within a session, the user spends varying amounts of time querying, browsing, and viewing video (see Fig. 4.6, User activities during video search and contribution), and these states consume different sets of system resources. For example, viewing video puts no load on the query engine but taxes the media delivery subsystem. On the other hand, if appropriate context is provided for query results, playback requests for unwanted videos can be minimized.

• Output user interface bandwidth: Rich media user interfaces heavy with still images or Flash animations can produce large “pages”, that is, a large amount of interactive content sent outbound from the service.

• Output media bandwidth: By far the largest consumer of outbound bandwidth is replay of the media itself. At its core, video search is essentially a VoD application, and optimizations for system resource utilization (caching of popular content based on Zipf distributions, etc.) have been well studied [Chang97, Yu06]. In addition to the codec selection and transcoded bit rate, other transcoding parameters, such as VBR encoding or short GoPs to improve random access, affect the tradeoff between video quality and output bandwidth.

• Rate of stream control requests: Streaming systems supporting rapid start may burst data at a rate higher than the average bit rate at startup or after seek requests. If the service is designed such that users rapidly request many videos, as opposed to passively watching long-form video, both the load on the server and the total output bandwidth increase.
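Two of the retrieval factors lend themselves to quick estimates: the share of requests that a cache of the most popular titles can absorb, and the peak outbound media bandwidth including fast-start bursts. The sketch below assumes request popularity follows a pure Zipf law with exponent `s` (real catalogs only approximate this); the function names, the `burst_factor` parameter, and the example figures are hypothetical.

```python
def zipf_hit_ratio(catalog_size: int, cached: int, s: float = 1.0) -> float:
    """Fraction of requests served from a cache holding the `cached` most
    popular titles, assuming the title of popularity rank r receives
    requests in proportion to r**-s (a Zipf law)."""
    weights = [rank ** -s for rank in range(1, catalog_size + 1)]
    return sum(weights[:cached]) / sum(weights)


def peak_output_gbps(concurrent_viewers: int, stream_kbps: float,
                     burst_factor: float = 1.0) -> float:
    """Outbound media bandwidth at peak. A burst_factor above 1 models
    fast-start bursting above the average bit rate after play or seek."""
    return concurrent_viewers * stream_kbps * burst_factor / 1e6


if __name__ == "__main__":
    # Assumed: cache the top 1% of a 100,000-title catalog.
    print(f"hit ratio: {zipf_hit_ratio(100_000, 1_000):.2f}")
    # Assumed: 10,000 concurrent viewers at 500 kbps, 20% extra for bursts.
    print(f"peak output: {peak_output_gbps(10_000, 500, burst_factor=1.2):.1f} Gbps")
```

Under a Zipf law, a cache holding a small fraction of the catalog serves a disproportionate share of requests, which is why popularity-based caching suits VoD-style delivery. The exponent and catalog size would need to be fit to the service's own request logs.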
