Media Syndication: RSS Content Description (Video Data Sources and Applications)

Content Syndication

RSS (Really Simple Syndication) descriptors were developed for syndicating HTML news feeds but have evolved into a de facto metadata standard for Web media. RSS allows content producers to express high-level metadata about their content in a standardized way, so that other sites (aggregators) can display content from many different providers through a uniform user interface. Since Web revenue models depend on ad impressions, it may seem counterintuitive that a content provider would wish to syndicate its content, since this would allow aggregators to become Web destinations and eat into the ad revenue stream. However, the primary purpose of RSS is to announce content and to entice users to visit the origin site. The content descriptors normally include a URL (rendered as a clickable link by aggregators) that lets users retrieve the content. Importantly, the content itself is not displayed on the aggregator’s site, where the aggregator would benefit from ad impressions, perhaps in a peripheral frame around the content display. Rather, the user is directed back to the content origin site, which is designed to promote other content from the same site, thereby increasing dwell time and ad impressions. It is therefore in the best interest of content providers to publish these RSS content descriptions and to promote them to aggregators to maximize viewership.

RSS is designed for recurring content or series and is organized as “channels” of “items.” Note that an RSS “channel” is similar to a television series or program, not a television “channel”; RSS “items” are similar to television “episodes.” For example, The New York Times does not have a single RSS representation of its own, but rather several RSS feeds, such as “technology news,” “sports,” and “world news.”
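The channel and item organization can be illustrated with a minimal sketch that reads a feed the way an aggregator might. The feed text, titles, and example.com URLs below are invented for illustration, and read_channel is a hypothetical helper name; only the Python standard library is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed invented for illustration; titles and URLs are not real.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Technology News</title>
    <link>http://www.example.com/technology/</link>
    <description>Technology headlines from an example publisher.</description>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/technology/first.html</link>
      <description>Short teaser for the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://www.example.com/technology/second.html</link>
      <description>Short teaser for the second story.</description>
    </item>
  </channel>
</rss>"""


def read_channel(xml_text):
    """Return the channel title and a (title, link) pair for each item."""
    channel = ET.fromstring(xml_text).find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return channel.findtext("title"), items


channel_title, items = read_channel(SAMPLE_FEED)
print(channel_title)
for title, link in items:
    # An aggregator would render each link as a clickable headline
    # that sends the user back to the origin site.
    print(f"  {title} -> {link}")
```

Note that the aggregator only ever handles the title, teaser, and link; the story itself stays on the origin site.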


Media Enclosures

Another important aspect of RSS is that, in addition to the dynamic online mode of consumption described above, it supports a download model. A user can subscribe to an RSS feed in a reader application, a Web browser, or an e-mail client. The reader automatically downloads new content to local storage as it becomes available and manages this content store, deleting older content as desired. This enables offline content consumption and is well suited to mobile applications where a connection may not always be available.
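The bookkeeping behind this download model can be sketched as follows. This is a simplified illustration under stated assumptions, not the behavior of any particular reader: LocalStore, max_episodes, and the GUID strings are hypothetical, episodes are identified by their item GUIDs, the retention policy is simply "keep the newest N" (with N at least 1), and no network or file access is performed.

```python
from dataclasses import dataclass, field


@dataclass
class LocalStore:
    """Hypothetical model of a feed reader's local content store."""
    max_episodes: int = 3                         # assumed to be >= 1
    stored: list = field(default_factory=list)    # GUIDs on disk, oldest first
    seen: set = field(default_factory=set)        # every GUID already handled

    def sync(self, feed_guids):
        """Process one poll of the feed (GUIDs listed oldest first).

        Returns (to_download, to_delete). A real reader would fetch the
        enclosure for each GUID in to_download and remove the local file
        for each GUID in to_delete.
        """
        new = [g for g in feed_guids if g not in self.seen]
        self.seen.update(new)
        # Skip back-catalog episodes that would be pruned immediately.
        to_download = new[-self.max_episodes:]
        self.stored.extend(to_download)
        excess = max(0, len(self.stored) - self.max_episodes)
        to_delete = self.stored[:excess]
        del self.stored[:excess]
        return to_download, to_delete


store = LocalStore(max_episodes=2)
first_poll = store.sync(["ep1", "ep2", "ep3"])
second_poll = store.sync(["ep1", "ep2", "ep3", "ep4"])
print(first_poll)    # (['ep2', 'ep3'], [])
print(second_poll)   # (['ep4'], ['ep2'])
```

Tracking seen GUIDs separately from stored ones keeps the reader from fetching an episode again after it has been pruned.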

For connected applications, RSS provides an alternative to streaming. Content is downloaded in the background and then played out from local storage. This enables “trickle” content distribution even where the connection bandwidth is less than the media bit rate. The RSS reader effectively manages an edge cache for the user, providing instant access to high-quality content, unaffected by networking impairments due to load, packet loss, etc. Today’s connected DVRs (e.g., TiVo®) and even displays (e.g., Sony Bravia® Internet Video Link®) contain feed readers and local storage, moving RSS beyond the desktop to the set-top. This mechanism can offer an efficient alternative to broadcast distribution of serial television content, particularly niche content, reserving the high-performance IPTV networks for live content such as sports. Basically, any content that a user watches from a DVR can be delivered via RSS or other managed download at higher quality and at much lower network engineering cost (no real-time quality-of-service guarantees are required). The only downside is delay: the user must identify in advance the content to which they wish to subscribe. Although the typical RSS feed uses HTTP over TCP to transport the media, it is possible to use peer-to-peer (P2P) content distribution, in which case the origin URL refers to a torrent seed, for example.
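The trade-off in “trickle” distribution is simple arithmetic, sketched below with illustrative numbers that are assumptions rather than measurements. The function name is hypothetical, and the model ignores protocol overhead and bandwidth variation.

```python
def trickle_download_seconds(duration_s, media_kbps, link_kbps):
    """Time to fetch a whole file of size duration * media bit rate."""
    return duration_s * media_kbps / link_kbps


# Assumed example: a 30-minute episode encoded at 2000 kbit/s,
# delivered over a 500 kbit/s connection.
duration_s, media_kbps, link_kbps = 30 * 60, 2000, 500

can_stream = link_kbps >= media_kbps
download_s = trickle_download_seconds(duration_s, media_kbps, link_kbps)

print(can_stream)          # False: real-time streaming would stall
print(download_s / 3600)   # 2.0 hours of background download
```

Under these assumptions the episode cannot be streamed in real time, but two hours of background transfer makes it available for uninterrupted playback at full quality.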

The RSS 2.0 XML syntax [Win03] is easy for developers and content creators to use, and the typical high-level metadata of title, date, and description are readily available (see Fig. 2.10 for an example). In addition to the media, RSS includes the specification of a channel icon to represent the content in user interfaces. The XML namespace mechanism allows RSS content descriptions to support additional applications and metadata, such as geospatial coordinates (e.g., GeoRSS and W3C Geo) or traditional bibliographic metadata such as the Dublin Core (see Table 2.5). Unfortunately, this extensibility has led to some added complexity, incompatibilities, and redundancy in the metadata specifications in use. The Atom format was proposed as an improvement and partial solution to address these incompatibilities, but until such time as RSS sunsets, the result is yet another syndication format on the landscape.

Fig. 2.10. An RSS 2.0 sample with MPEG-4 enclosures intended for the Sony PlayStation Portable (Rocketboom).

Table 2.5. A stack view of RSS protocols.

iTunes | MediaRSS | DCTerms | etc.
RSS 2.0
XML 1.0
Encoding (e.g. UTF-8)
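The layering in Table 2.5 can be seen in how a parser reads extension metadata: core RSS 2.0 elements have no namespace, while each extension module is addressed through its own namespace URI. The following is a minimal sketch using the Dublin Core elements namespace and GeoRSS-Simple; the feed content, names, and coordinates are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the extension modules used in this sketch.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",    # Dublin Core elements
    "georss": "http://www.georss.org/georss",    # GeoRSS-Simple
}

# Hypothetical feed invented for illustration.
SAMPLE_EXTENDED_FEED = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example Field Reports</title>
    <link>http://www.example.com/reports/</link>
    <description>Hypothetical channel with extension metadata.</description>
    <item>
      <title>Harbor report</title>
      <link>http://www.example.com/reports/harbor.html</link>
      <dc:creator>A. Reporter</dc:creator>
      <georss:point>40.70 -74.01</georss:point>
    </item>
  </channel>
</rss>"""

item = ET.fromstring(SAMPLE_EXTENDED_FEED).find("channel/item")

# Core RSS 2.0 element: no namespace needed.
title = item.findtext("title")

# Extension elements: resolved through the prefix-to-URI mapping.
creator = item.findtext("dc:creator", namespaces=NS)
lat, lon = (float(v) for v in
            item.findtext("georss:point", namespaces=NS).split())

print(title, creator, lat, lon)
```

A reader that does not recognize a namespace can simply ignore those elements, which is what makes the extensions optional layers on top of RSS 2.0.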

Podcasts

RSS and Atom include enclosure tags to refer to media other than text, and Apple chose RSS 2.0 for its Podcast format (Fig. 2.11). The widespread success of iTunes and the iPod® personal media player has resulted in the unprecedented deployment of easy-to-use media download management capabilities. While other systems for organizing personal media and other personal media players predated Apple’s iTunes®, the single-vendor environment has enabled a reliable, consistent platform for content delivery to mobile devices via download. The standard is open and has been implemented by many consumer electronics device manufacturers. For example, the Sony PSP® includes a WiFi interface and supports automated, unattended syncing, eliminating the requirement that users dock and sync their devices to get new content.

Fig. 2.11. A segment showing metadata from an RSS 2.0 Podcast.
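As a rough illustration of how a podcast client might read enclosure metadata, the sketch below extracts the enclosure URL, size, and MIME type from each item, together with one element from Apple's iTunes namespace. The sample feed is hypothetical and is not the content of Fig. 2.11; list_enclosures is an assumed helper name.

```python
import xml.etree.ElementTree as ET

ITUNES = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

# Hypothetical podcast feed invented for illustration.
SAMPLE_PODCAST = """<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Video Podcast</title>
    <link>http://www.example.com/podcast/</link>
    <description>Hypothetical podcast channel.</description>
    <item>
      <title>Episode 12</title>
      <guid>http://www.example.com/podcast/ep12</guid>
      <pubDate>Mon, 06 Oct 2008 09:00:00 GMT</pubDate>
      <enclosure url="http://www.example.com/media/ep12.mp4"
                 length="48234496" type="video/mp4"/>
      <itunes:duration>12:30</itunes:duration>
    </item>
    <item>
      <title>Text-only announcement</title>
      <guid>http://www.example.com/podcast/note</guid>
    </item>
  </channel>
</rss>"""


def list_enclosures(xml_text):
    """Return one dict per item that carries an enclosure."""
    episodes = []
    for item in ET.fromstring(xml_text).iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # items without media are skipped
        episodes.append({
            "title": item.findtext("title"),
            "url": enclosure.get("url"),
            "bytes": int(enclosure.get("length", "0")),
            "type": enclosure.get("type"),
            "duration": item.findtext("itunes:duration", namespaces=ITUNES),
        })
    return episodes


episodes = list_enclosures(SAMPLE_PODCAST)
print(episodes)
```

The length and type attributes let a client decide whether a file fits on the device and whether it can be played before any media bytes are transferred.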

RSS for Content Ingest

Podcast content is typically free of DRM and uses open standards such as HTTP, XML and MP3. These attributes, combined with the metadata and scheduled-update support, provide near-ideal conditions for media search engines to ingest the content. The RSS descriptions offer great efficiencies for spiders over traditional crawling. While engines may not have the rights to store and redistribute the media streams, it is widely accepted for search engines to provide indices and direct users back to the origin. RSS feeds can point to a large collection of archived serial content, so search engines can quickly ingest and create indexed archives going back into the past in a controlled manner. The feed organizes the collection of media files into a cohesive unit with common metadata for the series. A “crawler” (actually a “feed reader”) can download the small XML description and determine whether there is any new content to download and ingest for indexing; there is no need to hunt around a directory tree searching for new media files. Also, since RSS items include the publication time, search engines can make informed estimates of the next time that content will be published and check for new content only at that time. (RSS includes a “time to live” parameter, the ttl element, indicating the maximum cache time, but it is generally more reliable to predict the next content publication time heuristically from the frequency of past publications.)
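One possible form of such a heuristic is sketched below: take the median gap between past pubDate values and add it to the most recent publication time. The choice of the median, the function name predict_next_check, and the sample dates are assumptions made for illustration; a production crawler would likely also bound the interval and handle irregular schedules.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime
from statistics import median


def predict_next_check(pub_dates):
    """Estimate the next publication time from RFC 822 pubDate strings.

    Returns None when fewer than two dates are available, in which case
    a crawler could fall back to the feed's ttl value or a default.
    """
    times = sorted(parsedate_to_datetime(d) for d in pub_dates)
    if len(times) < 2:
        return None
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    return times[-1] + timedelta(seconds=median(gaps))


# Hypothetical weekly series.
weekly = [
    "Mon, 06 Oct 2008 09:00:00 GMT",
    "Mon, 13 Oct 2008 09:00:00 GMT",
    "Mon, 20 Oct 2008 09:00:00 GMT",
]
next_check = predict_next_check(weekly)
print(next_check.isoformat())   # 2008-10-27T09:00:00+00:00
```

The median is used rather than the mean so that a single long hiatus in the archive does not distort the estimated publishing interval.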

Table 2.6. Supported audio and video enclosure types for common RSS feeds.

Feed | Supported enclosure types (and file extensions)
Audio Podcast | audio/mpeg (.mp3), audio/x-m4a (.m4a)
Video Podcast | video/mp4 (.mp4), video/x-m4v (.m4v), video/quicktime (.mov)
MediaRSS | any

MediaRSS

As Table 2.6 indicates, there are many formats that cannot be included in standard Podcasts. Also, it is common practice for sites to offer content in multiple formats and multiple bitrates. Yahoo’s MediaRSS addresses some of these shortcomings; in particular, multiple enclosures are supported to offer different representations (different formats, bitrates) of the content. Yahoo’s video search engine suggests that content providers use MediaRSS to publish their media for ingestion by the search engine. The MediaRSS specification goes beyond global metadata to include elements with media timestamps. This capability allows for multiple thumbnail images and text that includes a temporal component to support captions.
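To illustrate how a client might choose among multiple representations, the sketch below reads the media:content elements of one item and picks the highest bit rate that fits an assumed bandwidth budget and a set of playable types. The feed, URLs, and pick_representation helper are hypothetical; the bitrate attribute is read as kilobits per second.

```python
import xml.etree.ElementTree as ET

MEDIA = {"media": "http://search.yahoo.com/mrss/"}

# Hypothetical MediaRSS feed invented for illustration.
SAMPLE_MRSS = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example News Clips</title>
    <link>http://www.example.com/clips/</link>
    <description>Hypothetical MediaRSS channel.</description>
    <item>
      <title>Evening update</title>
      <media:group>
        <media:content url="http://www.example.com/clips/update_300k.flv"
                       type="video/x-flv" bitrate="300"/>
        <media:content url="http://www.example.com/clips/update_700k.mp4"
                       type="video/mp4" bitrate="700"/>
        <media:content url="http://www.example.com/clips/update_1500k.mp4"
                       type="video/mp4" bitrate="1500"/>
      </media:group>
    </item>
  </channel>
</rss>"""


def pick_representation(item, max_kbps, accepted_types):
    """Return the URL of the best playable representation, or None."""
    candidates = [
        content for content in item.iterfind(".//media:content", MEDIA)
        if content.get("type") in accepted_types
        and float(content.get("bitrate", "0")) <= max_kbps
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda c: float(c.get("bitrate", "0")))
    return best.get("url")


item = ET.fromstring(SAMPLE_MRSS).find("channel/item")
chosen = pick_representation(item, 1000, {"video/mp4"})
too_slow = pick_representation(item, 200, {"video/mp4"})
print(chosen)     # http://www.example.com/clips/update_700k.mp4
print(too_slow)   # None
```

With a single RSS 2.0 enclosure the publisher would have to make this choice on behalf of every client; listing several representations moves the decision to the device that knows its own capabilities.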

The simplicity of RSS, combined with its extensibility, has resulted in widespread adoption of the format on the Internet. RSS feeds are used for both amateur content (blogs) and professional content (e.g., TV news clips or radio programs).
