Content Representation Standards (Interactive 3D Content Standards) Part 1

Abstract

Development of interactive 3D network applications requires standards for representing 3D content as well as metadata standards for describing it. This chapter presents selected standards for content representation and content description, in particular the VRML, X3D, and MPEG-4 ISO/IEC standards. Other standards, such as U3D, COLLADA, and 3D XML, are also discussed and compared. The chapter also presents metadata standards suitable for describing interactive 3D content.

A number of standards have been developed for platform-independent representation of 3D/VR content, permitting the exchange of content between applications and its distribution over the network. These standards differ in the content features they can describe and in their encoding methods, which makes some of them more suitable for content exchange and others for content publishing. In this section, selected standards suitable for content publishing are presented.

The most versatile content publishing standards are VRML [27, 28], X3D [37], and MPEG-4 [29, 41], all approved by ISO/IEC. These standards enable publishing or broadcasting synthetic interactive multimedia 3D content and accessing the content on various platforms in different architectural setups. The U3D ECMA standard [51] enables embedding and presentation of 3D models within PDF documents. Content exchange formats, which also enable content visualization in Web browsers, include COLLADA [5, 14] and 3D XML [17]. All these standards are further described in the following sections.

Virtual Reality Modeling Language

The Virtual Reality Modeling Language (VRML) is a textual file format for representing and publishing interactive 3D multimedia content [4, 11, 27, 28]. VRML is capable of representing both static and animated 3D and multimedia objects with hyperlinks to other media such as text, sounds, movies, and images.


Fig. 2.1 VRML scenes displayed in a Web browser

VRML browsers, as well as authoring tools for creating VRML files, are available for different platforms. Currently, the most popular browsers include ParallelGraphics Cortona3D [15], Bitmanagement BS Contact [9], the open-source Xj3D [65], and instantreality [24]. Figure 2.1 shows two examples of 3D VRML models displayed in a Web browser equipped with a VRML browser plug-in (3D models courtesy of the National Museum of Agriculture and Agricultural Food Industry in Szreniawa [56]).

VRML has been developed by the Web3D Consortium [60]. The first release of the VRML specification (VRML 1.0) [7] was created in 1994 based on the Open Inventor data exchange format [62, 63]. That version of the specification allowed only static virtual scenes. The second release, VRML 2.0, added support for animation, interaction, and scripting. In December 1997, VRML 2.0 with minor corrections was formally released as the International Standard ISO/IEC 14772:1997 [27]. This version is commonly known as VRML97. The specification consists of two parts: Part 1 (ISO/IEC 14772-1) defines the base functionality and text encoding (UTF-8) for VRML, and Part 2 (ISO/IEC 14772-2) defines the base functionality and language bindings for the VRML External Authoring Interface (EAI). In 2003, Amendment 1 to the specification (ISO/IEC 14772-1:1997/Amd. 1:2003) was formally approved [28]. The amendment adds modifications that improve interoperability among VRML implementations, as well as support for geographical objects (GeoVRML) and NURBS (Non-Uniform Rational B-Spline) nodes. VRML has since been officially superseded by X3D (Sect. 2.1.2); however, the original VRML97 specification is still widely used among developers.

VRML was designed for use on the Internet, intranets, and local client systems. It was intended to be a universal publishing and interchange format for 3D graphics and multimedia. VRML is in some sense analogous to HTML: it is a multi-platform language for publishing 3D content on the World Wide Web, and it also supports hyperlinks. Entities in a VRML virtual scene can be linked to other scenes and to other media such as text, sounds, movies, and images.

VRML describes multimedia content in an abstract way, without defining any physical devices or other implementation-dependent concepts (e.g., screen resolution or input devices). Each VRML file describes a single virtual scene. The scene may be the whole “virtual world,” a part of it, a single virtual object, or a part of a virtual object. One virtual scene may play different roles in different contexts. Each VRML file establishes a coordinate space for all objects defined and included in this file, defines and arranges in this space a set of 3D and multimedia objects, and can specify hyperlinks to other VRML or non-VRML Web resources.

VRML files describe contents of 3D scenes and 3D objects using a hierarchical structure called a scene graph. Elements of the scene graph are called nodes. VRML defines a multitude of different types of nodes. These types include geometry primitives such as box, sphere, cone, indexed-face-set, and text, appearance properties such as material, image texture, and movie texture, sounds and sound properties, and various types of grouping nodes such as group, transformation, inline, and level-of-detail.
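To make the scene-graph structure concrete, the following Python sketch models a node as a type name plus a dictionary of field values and serializes the resulting hierarchy into VRML97-style text. The `Node` class and the `to_vrml`/`format_value` helpers are hypothetical names introduced for illustration; the serializer covers only a simplified subset of the VRML97 syntax (for example, it does not escape strings or support DEF/USE).

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """A scene-graph node: a VRML node type plus its field values (hypothetical model)."""
    type_name: str
    fields: dict[str, Any] = field(default_factory=dict)


def format_value(value: Any, indent: int) -> str:
    """Render one field value in a simplified VRML97-style text form."""
    if isinstance(value, Node):  # SFNode: a single child node
        return to_vrml(value, indent)
    if isinstance(value, list):  # MFNode: a bracketed list of child nodes
        if not value:
            return "[]"
        pad = "  " * (indent + 1)
        items = "\n".join(pad + to_vrml(child, indent + 1) for child in value)
        return "[\n" + items + "\n" + "  " * indent + "]"
    if isinstance(value, tuple):  # vectors and colors, e.g. SFVec3f or SFColor
        return " ".join(str(v) for v in value)
    if isinstance(value, bool):  # SFBool is written as TRUE/FALSE
        return "TRUE" if value else "FALSE"
    if isinstance(value, str):  # SFString (no escaping in this sketch)
        return f'"{value}"'
    return str(value)  # numbers: SFInt32, SFFloat, SFTime


def to_vrml(node: Node, indent: int = 0) -> str:
    """Serialize a node and, recursively, the subgraph below it."""
    if not node.fields:
        return f"{node.type_name} {{ }}"
    pad = "  " * (indent + 1)
    body = "\n".join(
        f"{pad}{name} {format_value(value, indent + 1)}"
        for name, value in node.fields.items()
    )
    return f"{node.type_name} {{\n{body}\n{'  ' * indent}}}"


# A red sphere placed one unit above the origin by a Transform grouping node.
scene = Node("Transform", {
    "translation": (0, 1, 0),
    "children": [
        Node("Shape", {
            "appearance": Node("Appearance", {
                "material": Node("Material", {"diffuseColor": (1, 0, 0)}),
            }),
            "geometry": Node("Sphere", {"radius": 1.5}),
        }),
    ],
})

print("#VRML V2.0 utf8\n" + to_vrml(scene))
```

Nesting `Node` objects inside fields mirrors the way grouping nodes such as Transform hold their children, which is what gives the scene its hierarchical structure.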

Properties of nodes are defined in their fields. The VRML specification defines a set of permitted field types, including both single-valued (SF) and multi-valued (MF) types. Basic field types include integers, floats, Boolean values, strings, time stamps, color values, rotations, images, different types of vectors, and nodes.
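The distinction between single-valued (SF) and multi-valued (MF) field types can be illustrated with a small encoder that writes Python values in VRML97 text syntax. This is a sketch under simplifying assumptions: it covers only a handful of field types, the helper names are hypothetical, and it does not check whether a given MF type actually exists in VRML97 (VRML97 has no MFBool, for instance).

```python
# Number of components for each fixed-size single-valued (SF) type.
_VECTOR_SIZES = {"SFVec2f": 2, "SFVec3f": 3, "SFColor": 3, "SFRotation": 4}


def encode_sf(field_type: str, value) -> str:
    """Encode one single-valued (SF) field value as VRML97 text."""
    if field_type == "SFBool":
        return "TRUE" if value else "FALSE"
    if field_type == "SFString":
        # Inside quotes, backslash and double quote are escaped with a backslash.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    if field_type in ("SFInt32", "SFFloat", "SFTime"):
        return str(value)
    if field_type in _VECTOR_SIZES:
        size = _VECTOR_SIZES[field_type]
        if len(value) != size:
            raise ValueError(f"{field_type} needs {size} components")
        if field_type == "SFColor" and not all(0 <= c <= 1 for c in value):
            raise ValueError("SFColor components must lie in [0, 1]")
        return " ".join(str(c) for c in value)
    raise ValueError(f"unsupported field type: {field_type}")


def encode_field(field_type: str, value) -> str:
    """Encode an SF value directly, or an MF value as a bracketed list of SF values."""
    if field_type.startswith("MF"):
        sf_type = "SF" + field_type[2:]
        items = ", ".join(encode_sf(sf_type, v) for v in value)
        return f"[ {items} ]" if items else "[]"
    return encode_sf(field_type, value)


print(encode_field("SFRotation", (0, 1, 0, 1.57)))               # 0 1 0 1.57
print(encode_field("MFString", ['a "quoted" word', "plain"]))    # [ "a \"quoted\" word", "plain" ]
```

In VRML97, commas are treated as whitespace, so the separators emitted for MF values are optional; they are included here only for readability.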

VRML uses a dataflow-based event-passing mechanism for communication between nodes in the scene graph. Each node type defines the names and types of the events that its instances may generate or receive. ROUTE statements create paths for events from event generators to event receivers. A node that receives an event may in turn send events of its own, leading to event cascades.
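The routing mechanism can be sketched as a small dataflow graph in Python. The `EventGraph` class, its methods, and the node names `Timer`, `Mover`, and `Ball` are hypothetical; each `add_route` call plays the role of a ROUTE statement such as `ROUTE Timer.fraction_changed TO Mover.set_fraction`. The loop guard (each eventOut fires at most once per cascade) is a simplification loosely modeled on the loop-breaking rule of the VRML97 specification, not a faithful implementation of it.

```python
from collections import deque
from typing import Any, Callable

# Hypothetical handler type: receives an event value, returns (eventOut, value) pairs.
Handler = Callable[[Any], list[tuple[str, Any]]]


class EventGraph:
    """Minimal dataflow model of routes between eventOuts and eventIns."""

    def __init__(self) -> None:
        self.routes: dict[tuple[str, str], list[tuple[str, str]]] = {}
        self.handlers: dict[tuple[str, str], Handler] = {}

    def add_route(self, src: str, event_out: str, dst: str, event_in: str) -> None:
        """Equivalent in spirit to: ROUTE src.event_out TO dst.event_in"""
        self.routes.setdefault((src, event_out), []).append((dst, event_in))

    def on(self, node: str, event_in: str, handler: Handler) -> None:
        """Register how a node reacts to an incoming event."""
        self.handlers[(node, event_in)] = handler

    def send(self, node: str, event_out: str, value: Any) -> list[tuple[str, str, Any]]:
        """Emit one event, run the resulting cascade, and return the delivered events."""
        delivered = []
        fired = set()  # loop guard: each eventOut fires at most once per cascade
        queue = deque([(node, event_out, value)])
        while queue:
            src, out, val = queue.popleft()
            if (src, out) in fired:
                continue
            fired.add((src, out))
            for dst, ein in self.routes.get((src, out), []):
                delivered.append((dst, ein, val))
                handler = self.handlers.get((dst, ein))
                if handler is not None:
                    for next_out, next_val in handler(val):
                        queue.append((dst, next_out, next_val))
        return delivered


graph = EventGraph()
graph.add_route("Timer", "fraction_changed", "Mover", "set_fraction")
graph.add_route("Mover", "value_changed", "Ball", "set_translation")
# The hypothetical Mover node moves the ball along x from 0 to 10.
graph.on("Mover", "set_fraction", lambda f: [("value_changed", (10 * f, 0, 0))])

for event in graph.send("Timer", "fraction_changed", 0.5):
    print(event)
# ('Mover', 'set_fraction', 0.5)
# ('Ball', 'set_translation', (5.0, 0, 0))
```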

A special group of VRML nodes, called sensors, provides the basic mechanism for user interaction and animation in virtual scenes. VRML defines several types of sensors: a time sensor, a proximity sensor, a visibility sensor, and a variety of pointing device sensors (anchor, cylinder sensor, plane sensor, sphere sensor, and touch sensor). All sensors except the time sensor generate events in response to user actions. They can be connected via ROUTE statements to other nodes in the scene to implement interactivity. The time sensor is a special node that generates events as time passes; it drives all kinds of animations in virtual scenes.
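The time sensor's output can be illustrated with the formula VRML97 uses for the `fraction_changed` event of a TimeSensor: the time elapsed since `startTime` divided by `cycleInterval`, keeping only the fractional part, with 1.0 reported at the end of each cycle. The Python function below is a sketch that ignores the `stopTime` and `enabled` fields and the sensor's other outputs; the function name is hypothetical, and it returns `None` while the modeled sensor is inactive.

```python
import math
from typing import Optional


def time_sensor_fraction(now: float, start_time: float, cycle_interval: float,
                         loop: bool = False) -> Optional[float]:
    """Fraction of the current cycle at time `now`, or None if the sensor is inactive."""
    if cycle_interval <= 0:
        raise ValueError("cycleInterval must be greater than 0")
    if now < start_time:
        return None  # not started yet
    if not loop and now > start_time + cycle_interval:
        return None  # a non-looping sensor stops after one cycle
    temp = (now - start_time) / cycle_interval
    f = temp - math.floor(temp)  # fractional part of the elapsed cycles
    if f == 0.0 and now > start_time:
        return 1.0  # the end of a cycle is reported as 1.0, not 0.0
    return f


print([time_sensor_fraction(t, 0.0, 2.0) for t in (0.0, 0.5, 1.0, 2.0, 2.5)])
# [0.0, 0.25, 0.5, 1.0, None]
```

Routing this fraction into an interpolator's `set_fraction` input is the usual way an animation is driven.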

VRML allows scene behaviors to be programmed with a special Script node. Script nodes can be inserted between event generators and event receivers. A script is a program that is executed every time the Script node receives an input event, and it can generate output events during its execution.
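VRML97's ECMAScript binding handles each eventIn of a Script node with a function of the same name that receives the event value and its timestamp, and the script generates output events by assigning to its eventOuts. The Python sketch below mimics that pattern; the `ScriptNode` base class, the `HighlightScript` example, and the `set_over`/`color_changed` event names are hypothetical.

```python
from typing import Any, Optional


class ScriptNode:
    """Base class: each eventIn is a method taking (value, timestamp)."""

    def __init__(self) -> None:
        self._outputs: list[tuple[str, Any]] = []

    def emit(self, event_out: str, value: Any) -> None:
        """Record an output event generated during the current execution."""
        self._outputs.append((event_out, value))

    def receive(self, event_in: str, value: Any, timestamp: float) -> list[tuple[str, Any]]:
        """Run the handler for one input event and return the events it generated."""
        handler = getattr(self, event_in, None)
        if handler is None or event_in.startswith("_") or event_in in ("emit", "receive"):
            raise KeyError(f"no eventIn named {event_in!r}")
        self._outputs = []
        handler(value, timestamp)
        return self._outputs


class HighlightScript(ScriptNode):
    """Hypothetical script: highlight an object while the pointer is over it."""

    def __init__(self) -> None:
        super().__init__()
        self.last_change: Optional[float] = None

    def set_over(self, value: bool, timestamp: float) -> None:
        self.last_change = timestamp
        color = (1.0, 1.0, 0.0) if value else (0.8, 0.8, 0.8)
        self.emit("color_changed", color)


script = HighlightScript()
print(script.receive("set_over", True, 12.5))  # [('color_changed', (1.0, 1.0, 0.0))]
```

In a scene, `set_over` would be fed from a touch sensor's `isOver` event, and `color_changed` would be routed to the diffuse color of a Material node.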

Smooth animations in VRML are achieved with special interpolator nodes. Interpolator nodes behave like scripts but are built into every VRML-compliant browser. Interpolators perform simple animation calculations, linearly interpolating between key values, and are usually combined in a scene with a time sensor and other nodes to implement movement in the scene.
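The calculation performed by an interpolator is piecewise linear interpolation over its `key` (fractions) and `keyValue` (values) fields, driven by the `set_fraction` input that is typically routed from a time sensor. The Python function below sketches this for vector values, as in a position interpolator; the `interpolate` name is hypothetical, and fractions outside the key range are clamped to the first or last value. It interpolates component-wise, so it would not be appropriate for interpolating rotations.

```python
from bisect import bisect_right
from typing import Sequence

Vec = tuple[float, ...]


def interpolate(key: Sequence[float], key_value: Sequence[Vec], fraction: float) -> Vec:
    """Piecewise linear interpolation of key_value at the given fraction."""
    if not key or len(key) != len(key_value):
        raise ValueError("key and keyValue must be non-empty and of equal length")
    if fraction <= key[0]:
        return tuple(key_value[0])   # clamp below the first key
    if fraction >= key[-1]:
        return tuple(key_value[-1])  # clamp above the last key
    i = bisect_right(key, fraction)  # now key[i - 1] <= fraction < key[i]
    k0, k1 = key[i - 1], key[i]
    t = (fraction - k0) / (k1 - k0)  # position within the segment, in [0, 1)
    v0, v1 = key_value[i - 1], key_value[i]
    return tuple(a + t * (b - a) for a, b in zip(v0, v1))


# A path that rises along y, then moves along x.
key = [0.0, 0.5, 1.0]
key_value = [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (4.0, 2.0, 0.0)]
for fraction in (0.0, 0.25, 0.75, 1.0):
    print(fraction, interpolate(key, key_value, fraction))
```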

List. 2.1 Example of a simple VRML scene


List. 2.1 presents the code of an example VRML scene, and Fig. 2.2 shows the scene rendered in a VRML browser.

Fig. 2.2 VRML virtual scene from List. 2.1 rendered in a Web browser with BS Contact plug-in

Extensible 3D

X3D (Extensible 3D) is the successor to VRML (Sect. 2.1.1) and is also developed by the Web3D Consortium [60]. X3D was designed to remain backward compatible with VRML97 while providing more advanced functionality, new encoding formats, componentization, and extensibility.

X3D adds functional extensions such as Humanoid Animation (H-Anim), Distributed Interactive Simulation (DIS), CAD geometry, programmable shaders, scene layering, rigid body physics, and particle systems. X3D also provides new formats for encoding virtual scenes: in addition to the Classic VRML encoding, which follows the VRML97 syntax, X3D defines an XML encoding and a compressed binary encoding. The standard also offers enhanced application programming interfaces (APIs), notably the Scene Access Interface (SAI) with ECMAScript and Java language bindings.
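As an illustration of the XML encoding, the following Python sketch uses the standard library's `xml.etree.ElementTree` module to build a small X3D scene. Node fields become XML attributes and child nodes become nested elements; the particular scene, profile, and version values are illustrative choices, the `build_x3d` name is hypothetical, and the XML declaration and DOCTYPE that complete X3D files usually carry are omitted.

```python
import xml.etree.ElementTree as ET


def build_x3d() -> ET.Element:
    """Build a small X3D scene as an XML element tree (XML encoding)."""
    # Fields become XML attributes; multi-component values are space-separated.
    root = ET.Element("X3D", profile="Interchange", version="3.2")
    scene = ET.SubElement(root, "Scene")
    transform = ET.SubElement(scene, "Transform", translation="0 1 0")
    shape = ET.SubElement(transform, "Shape")
    appearance = ET.SubElement(shape, "Appearance")
    ET.SubElement(appearance, "Material", diffuseColor="1 0 0")
    ET.SubElement(shape, "Sphere", radius="1.5")
    return root


root = build_x3d()
ET.indent(root)  # pretty-printing; available since Python 3.9
print(ET.tostring(root, encoding="unicode"))
```

In the Classic VRML encoding, the same scene would name the parent fields explicitly (for example `appearance Appearance { ... }`); the XML encoding instead relies on default `containerField` values to place child elements into the right fields.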

The X3D specification is modular. It defines a set of componentized elements that can be tailored for use in various applications or on various platforms. A modular specification simplifies the creation of X3D browsers and authoring tools.

The modular structure of the X3D standard allows definition of profiles offering different levels of functionality for different purposes. Specification of a profile consists of a list of required X3D components and their support levels. Different X3D systems can conform to different X3D profiles depending on their particular architectural or application requirements.
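Because a profile is essentially a list of components with required support levels, checking whether a system supports a profile reduces to a per-component comparison. The Python sketch below shows this check; the component levels in `EXAMPLE_PROFILE` and the `viewer` capabilities are hypothetical values chosen for illustration, not the levels defined in the X3D specification, and the comparison assumes that support levels are cumulative (a higher level includes the lower ones).

```python
# Hypothetical component levels for illustration only; the actual levels of
# each X3D profile are listed in the X3D specification.
EXAMPLE_PROFILE = {"Core": 1, "Grouping": 1, "Shape": 1, "Geometry3D": 2, "Time": 1}


def missing_components(profile: dict[str, int],
                       supported: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map each under-supported component to (required level, supported level)."""
    return {
        name: (level, supported.get(name, 0))
        for name, level in profile.items()
        if supported.get(name, 0) < level
    }


def conforms(profile: dict[str, int], supported: dict[str, int]) -> bool:
    """A system supports a profile if every component reaches the required level."""
    return not missing_components(profile, supported)


# Hypothetical capabilities of some X3D viewer.
viewer = {"Core": 2, "Grouping": 3, "Shape": 1, "Geometry3D": 1, "Time": 2}
print(conforms(EXAMPLE_PROFILE, viewer))            # False
print(missing_components(EXAMPLE_PROFILE, viewer))  # {'Geometry3D': (2, 1)}
```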

The main X3D profiles are:

•    Core profile—defining the absolute minimal file support required by X3D.

•    Interchange profile—the basic profile for exchanging X3D content—geometry and animations—between applications. It supports geometry, texturing, basic lighting, and animation. There is no runtime model for rendering and interaction, making it easy to use and integrate into any application.

•    Interactive profile—enabling basic interaction with a 3D environment by adding various sensor nodes for user navigation and interaction (e.g., PlaneSensor, TouchSensor), enhanced timing, and additional lighting (SpotLight, PointLight).

•    Immersive profile—enabling implementation of immersive virtual worlds with complete 3D graphics and interaction support, including audio, collision, fog, and scripting. The Immersive profile corresponds to the VRML97 base profile, but is implemented in the X3D architectural framework.

•    Full profile—enabling the use of all nodes defined in X3D, including the NURBS, H-Anim, and Geospatial components.

Fig. 2.3 Relationships between main X3D profiles

Additional profiles defined in the X3D standard are:

•    MPEG-4 Interactive profile—providing basic interoperability with the MPEG-4 standard (Sect. 2.1.3), targeting broadcast, handheld devices, and mobile phones.

•    CADInterchange profile—enabling translation of CAD data for use by downstream applications, while appropriately supporting geometry and appearance capabilities of CAD systems.

Relationships between the main X3D profiles are shown in Fig. 2.3. Table 2.1 below summarizes the ISO standardization status of VRML and X3D [64].

Table 2.1 ISO standardization status of VRML/X3D (as of July 2011)

| ISO/IEC name | Common name | Date |
|---|---|---|
| ISO/IEC PDAM1 19775-1:2008 | X3D Architecture and base components V3 (Change Document) | July 2011 |
| | X3D Architecture and base components Edition 2 | July 2008 |
| | X3D Scene access interface Edition 2 | Jan 2011 |
| | X3D encodings: XML encoding Edition 2 | Oct 2009 |
| | X3D encodings: Classic VRML encoding Edition 2 | Oct 2008 |
| | X3D encodings: Compressed binary encoding Edition 1 | Sep 2007 |
| ISO/IEC FDIS 19776-3.2:2011 | X3D encodings: Compressed binary encoding Edition 2 | Jan 2011 |
| | X3D language bindings: ECMAScript | May 2006 |
| | X3D language bindings: Java | May 2006 |
| | Humanoid Animation | June 2006 |
| | Virtual Reality Modeling Language (VRML97) | Dec 2003 |
| ISO/IEC 14772-1:1997/Amd. 1:2002 | VRML97 Amendment 1 | Dec 2003 |

List. 2.2 shows the X3D XML encoding of the virtual scene from List. 2.1, and Fig. 2.4 shows the X3D scene displayed in the instantreality browser.

MPEG-4

MPEG-4 is an ISO/IEC standard developed by the Moving Picture Experts Group (MPEG) [55], a working group of subcommittee SC29, "Coding of audio, picture, multimedia and hypermedia information," of Joint Technical Committee 1 of the International Organization for Standardization (ISO) [26] and the International Electrotechnical Commission (IEC) [25]. MPEG has also developed the MPEG-1 and MPEG-2 [12] standards, as well as MPEG-7 [36, 42] (Sect. 2.2.3) and MPEG-21 [8, 38].

MPEG-4 offers a comprehensive set of tools for delivering different kinds of multimedia content [47]. The MPEG-4 standard provides standardized methods of [41]:

•    representing units of aural, visual or audiovisual content, called media objects,

•    composing these media objects into audiovisual scenes,

•    multiplexing and synchronizing data associated with media objects,

•    interacting with the audiovisual scene generated at the receiver side.
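Multiplexing and synchronization can be illustrated in a much simplified form: each media object produces a time-ordered elementary stream of timestamped access units, the sender interleaves the streams by timestamp, and the receiver splits them again and presents each unit when its time is reached. The Python sketch below shows only this general idea; the `AccessUnit` type and the helper functions are hypothetical, and the sketch does not model the actual bitstream syntax defined in MPEG-4 Systems (ISO/IEC 14496-1).

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessUnit:
    stream_id: int    # elementary stream the unit belongs to (e.g., audio, video)
    timestamp: float  # presentation time in seconds
    payload: bytes


def multiplex(*streams: list[AccessUnit]) -> list[AccessUnit]:
    """Interleave streams that are each already in time order into one time-ordered sequence."""
    return list(heapq.merge(*streams, key=lambda au: au.timestamp))


def demultiplex(units: list[AccessUnit]) -> dict[int, list[AccessUnit]]:
    """Split a multiplexed sequence back into per-stream lists, keeping time order."""
    streams: dict[int, list[AccessUnit]] = {}
    for au in units:
        streams.setdefault(au.stream_id, []).append(au)
    return streams


def due(units: list[AccessUnit], now: float) -> list[AccessUnit]:
    """Units whose presentation time has been reached at receiver time `now`."""
    return [au for au in units if au.timestamp <= now]


video = [AccessUnit(1, 0.00, b"v0"), AccessUnit(1, 0.04, b"v1"), AccessUnit(1, 0.08, b"v2")]
audio = [AccessUnit(2, 0.02, b"a0"), AccessUnit(2, 0.05, b"a1")]
muxed = multiplex(video, audio)
print([(au.stream_id, au.timestamp) for au in muxed])
# [(1, 0.0), (2, 0.02), (1, 0.04), (2, 0.05), (1, 0.08)]
```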

Examples of MPEG-4 application types include [57, 59]:

•    digital television (broadcast and IP-based),

•    mobile communication, entertainment and portable gaming,

•    packaged media distribution,

•    video conferencing,

•    home networking, video recorders and cameras, surveillance systems,

•    satellite radio.
