From 3D Geometry to Meaningful Interactive Virtual Environments (Issues in Creation, Management, Search and Presentation of Interactive 3D Content)

Creation of interactive 3D multimedia content is a complex task that requires significant expertise and effort. Since practical 3D applications need large amounts of meaningful, high-quality content, the difficulty of content creation has become one of the main problems limiting wider use of 3D applications on an everyday basis. It is important to enable the creation of content by users without advanced programming or 3D design skills. Only the involvement of common users, such as domain specialists and end-users, can guarantee the large amounts of relevant 3D content required for the deployment of 3D applications in various domains. For example, a curator in a museum knows how to create a meaningful exhibition of cultural objects but typically has no experience in 3D design or programming.

Creation of interactive 3D content involves at least three steps: creation of 3D models and animations, assembling the models into virtual scenes, and programming the behavior of the scenes.

Creation of 3D models can be accomplished by scanning or modeling. 3D scanners, based in most cases on lasers or structured light, are becoming increasingly accurate, require less user interaction, and are able to capture more advanced surface properties [46, 51, 83]. This enables creation of high-quality 3D models of existing objects or interiors. There are also a number of advanced 3D modeling packages, such as Autodesk's 3ds Max, Maya and Softimage, and the open source Blender, which can be used by content designers to refine or enhance the scanned models and to create imaginary objects. The same tools can be used to assemble 3D objects into complex objects and scenes, to design animations, and to add complex objects such as models of humans and animals.

Programming the behavior of virtual objects and scenes is usually the most challenging task in the process of creating interactive 3D content. In the X3D/VRML/MPEG-4 standards, a basic level of behavior programming is provided through scripting and dataflow event processing mechanisms. Scripting behavior can be realized by the use of the Script node. A Script node may contain a program (a set of functions) that can process input events, generate output events, and store its state in local variables (fields). The languages used for programming behavior in scripts depend on the particular browser implementation, but typically ECMAScript [77] and Java [43] are supported.
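
As a minimal sketch of this mechanism (node and field names are chosen only for illustration), the following VRML fragment uses a Script node with an embedded ECMAScript function to toggle a door between its closed and open positions each time a TouchSensor is activated; the ROUTE statements form the dataflow connections between the sensor, the script, and the geometry:

    DEF Door Transform {
      children [
        DEF DoorSensor TouchSensor {}
        Shape {
          appearance Appearance { material Material { diffuseColor 0.6 0.4 0.2 } }
          geometry Box { size 1 2 0.05 }
        }
      ]
    }
    DEF DoorScript Script {
      eventIn  SFTime     touched       # input event received from the sensor
      eventOut SFRotation rotation_out  # output event routed to the door transform
      field    SFBool     open FALSE    # local state stored in a field
      url "javascript:
        function touched(value, timestamp) {
          open = !open;                                               // flip the stored state
          rotation_out = new SFRotation(0, 1, 0, open ? 1.57 : 0.0);  // swing the door by about 90 degrees
        }"
    }
    ROUTE DoorSensor.touchTime TO DoorScript.touched
    ROUTE DoorScript.rotation_out TO Door.set_rotation

Even this small behavior requires an explicit script, state fields, and event routing, which is manageable for a single door but quickly becomes unwieldy for a whole interactive scene.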

The basic behavior programming mechanisms provided in VRML/X3D and MPEG-4 may be successfully used to program simple specific tasks in a 3D scene (e.g., opening a door if some condition is satisfied); however, programming complex behaviors at this level is a burdensome task. Therefore, significant research and development effort has been invested in the design of languages, frameworks, and tools for specifying the behavior of virtual objects and scenes at a higher level of abstraction. Examples of high-level scripting languages describing the behavior of human characters include MPML-VR [58], VHML [52], APML [22], BML [84], and PML [44]. General-purpose behavior programming languages include VEML [17] and BDL [18]. An approach based on the concept of aspect-oriented programming has been proposed in [54]. A method of programming interactive Web 3D applications based on a combination of JavaScript and the Scene Access Interface (SAI), called Ajax3D, has been described in [68]. Examples of integrated design frameworks include Contigra [20] and Behavior3D [21]. Another solution, employing the use of distributed components accessible through Web Services, has been proposed in [93]. Examples of graphical behavior design tools include [7, 86] and [69, 70].

Although the common motivation for developing the above-described solutions is to simplify the creation of behavior-rich content, these tools are still complex and require significant technical expertise. Even if behavior programming is performed using high-level languages or high-level diagrams, the programs and diagrams become complicated when they are used to describe complex content. With such tools, content can be created occasionally by experienced programmers and designers, but not by domain experts or end-users on a daily basis.

Real simplification of the content creation process can be achieved in practice either through automation or through configuration. Automation reduces or eliminates the involvement of people in the content creation process. It can be applied to the process of scanning 3D objects or interiors, or to the generation of 3D models from other types of data. However, creation of interactive behavior-rich content cannot, in the general case, be done automatically. Input is needed from domain experts or content users. Yet, this input should concentrate on what is to be created and not on how it should be created.

To simplify the creation of interactive behavior-rich content to an extent that would allow this process to be efficiently performed by domain experts and end-users, we need to permit creation of content from previously prepared building blocks, i.e., components. Componentization also enables content reuse, which further reduces the time and effort required for building a 3D/VR application. Clearly, there is a trade-off between the flexibility of content creation tools and their ease of use. Generally, the more an authoring environment allows a user to do, the more difficult it is to operate. A solution to this problem is to split the tasks performed by different categories of content designers. A content creator may build virtual scenes by assembling components taken from a predefined library. It is relatively easy to compose a scene, but the process is somewhat constrained. However, additional functionality may be achieved at any time by adding new types of components to the library when required. This task can be performed by programmers or 3D designers.

A basic level of componentization is supported by the VRML/X3D/MPEG-4 standards through prototyping, which enables the definition of new types of nodes based on existing types of nodes. However, this mechanism is meant strictly for programmers, and therefore it is not suitable for domain experts or end-users. Also, existing solutions enabling content creation based on purely 3D components are not sufficient for building 3D applications in which behavior plays the primary role. In such applications, some components may contain 3D geometry, while others should implement sounds, animations, interaction elements, sensors, scenarios, schedulers, etc.
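
As a simple illustration of the prototyping mechanism, the following VRML fragment (the interface fields, their default values, and the model file name are invented for this example) defines a new node type that wraps geometry, a timer, and an interpolator into a continuously rotating exhibit, which can then be instantiated like any built-in node:

    PROTO RotatingExhibit [
      field MFNode  exhibit  []        # geometry supplied by the scene author
      field SFTime  period   10.0      # duration of one full rotation, in seconds
      field SFVec3f position 0 0 0     # placement of the exhibit in the scene
    ] {
      DEF Spin Transform {
        translation IS position
        children IS exhibit
      }
      DEF Timer TimeSensor { cycleInterval IS period loop TRUE }
      DEF Rotor OrientationInterpolator {
        key      [ 0, 0.5, 1 ]
        keyValue [ 0 1 0 0, 0 1 0 3.14159, 0 1 0 6.28318 ]
      }
      ROUTE Timer.fraction_changed TO Rotor.set_fraction
      ROUTE Rotor.value_changed TO Spin.set_rotation
    }

    RotatingExhibit {
      position 0 1 -3
      exhibit [ Inline { url "statue.wrl" } ]   # hypothetical model file
    }

Writing such a prototype requires knowledge of node interfaces, field types, and event routing, which is exactly why the mechanism remains a programmer's tool rather than one for domain experts.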

However, to enable composition of behavior-rich 3D content based on a library of behavior-rich components, a shift of paradigm is required. Although content publication standards, such as X3D and MPEG-4, use hierarchical scene graphs to describe scene content, the dataflow graph that describes scene behavior is realized as a separate, generally non-hierarchical structure, orthogonal to the main scene graph describing the scene composition. The interweaving of these two graph structures practically precludes dynamic scene composition based on the scene content hierarchy.
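
The problem can be seen in the following small VRML fragment (names invented for illustration): a TouchSensor located in one branch of the scene graph switches on a light located in a completely different branch while the sensor is active, and the ROUTE that connects them belongs to neither subtree, so neither branch can be extracted and reused on its own without reconstructing the event wiring by hand:

    DEF SwitchPanel Transform {
      translation -2 1 0
      children [
        DEF PanelSensor TouchSensor {}
        Shape { geometry Box { size 0.2 0.2 0.05 } }
      ]
    }
    DEF CeilingLamp Transform {
      translation 3 2.5 -1
      children DEF Lamp PointLight { on FALSE }
    }
    # behavior wiring orthogonal to the containment hierarchy above
    ROUTE PanelSensor.isActive TO Lamp.set_on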

The proposed shift of paradigm is a consequence of three observations. First, a 3D/VR application is much more than just the presented 3D content, in the same way as, for example, a text editor is much more than the set of graphical widgets it is currently displaying. Second, the 3D/VR content (geometry, audio, and possibly other modalities in the near future) is just a way of communicating the state of the application to a user. 3D objects may appear, disappear, and may be replaced by other objects as the application state changes. Third, in the case of an interactive application, it is much more likely that the presented 3D content will change than that the way the application operates will change.

These observations are well known to programmers creating 3D applications based on scene graph APIs (such as Open Inventor [60], OpenSG [63], OpenSceneGraph [62] or Java3D [42]). In such applications, the scene graph represents the current state of the application's 3D interface, while a separate application layer is responsible for manipulating the scene graph and handling events. With VRML/X3D, similar functionality can be achieved using the EAI (External Authoring Interface) and the SAI (Scene Access Interface), respectively. Since in the VRML/X3D standards a program contained in a Script node can be arbitrarily complex, in an extreme case a single Script node can create and manipulate the whole 3D scene. In such a case, the SAI acts as the scene graph API, while the script is equivalent to the application layer. This solution is powerful in that it enables flexible generation and manipulation of the scene, but it does not solve the problem of efficient content creation. In fact, preparing 3D content in such a way requires even higher expertise than the classical approach.
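
This extreme case can be sketched as follows (the generated content is a trivial placeholder): a single Script node uses the browser scripting interface to create content programmatically and attach it to an initially empty group, so the script, rather than the declarative scene description, determines what is displayed:

    DEF Root Group {}
    DEF Builder Script {
      field SFNode root USE Root   # reference to the group manipulated by the script
      directOutput TRUE            # allow the script to send events directly to nodes
      url "javascript:
        function initialize() {
          // generate scene content through the browser API at start-up
          var nodes = Browser.createVrmlFromString(
            'Transform { translation 0 1 0 children Shape { geometry Sphere { radius 0.5 } } }');
          root.addChildren = nodes;   // attach the generated nodes to the scene
        }"
    }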

To enable flexible configuration of complex behavior-rich 3D content, specific organization of scene content is required. This issue is further discussed in Chap. 5, and a new approach, called Flex-VR, is proposed. In Flex-VR, content is organized following a novel structuralization model, called Beh-VR. In the Beh-VR model, content is composed of independent high-level objects, called VR-Beans, which can be freely assembled into 3D scenes. Moreover, a generic high-level Flex-VR content model is proposed which describes a 3D/VR application on a higher level of abstraction than a typical content representation standard (such as VRML/X3D) and enables efficient creation, management, reuse, and exchange of interactive behavior-rich 3D multimedia content.

The problem of preparing interactive 3D content takes a specific form in the case of augmented reality (AR) applications, where, in addition to interactions between objects in the virtual environment and limited interactions between users and the virtual environment, interactions between the real environment and the virtual environment must also be taken into consideration.

In AR applications, both virtual objects and representations of real objects should be taken into account, including their visual and behavioral aspects. Virtual objects can be placed at different locations in relation to real objects present in an AR environment. The position and orientation of the virtual objects should change automatically according to the location of the real objects, which can be freely manipulated by users. Visualization and behavior of virtual objects and representations of real objects can be dynamically altered depending on their internal state, the state and behavior of other objects in the scene, and the progress of time. Both virtual objects and representations of real objects should be interactive, i.e., they should react to events occurring in an AR environment. These reactions may involve changes in the visual and behavioral characteristics of these objects. Events in AR environments can occur as a result of user interaction, the behavior of objects, changes in spatial relationships between objects, and the progress of time.
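
In a typical marker-based or feature-based setup, this automatic alignment can be expressed as a composition of rigid-body transformations (the notation below is generic and not tied to any particular system):

    T_cam_virtual = T_cam_real * T_real_virtual

where T_A_B denotes the transformation from the coordinate system of B to the coordinate system of A: T_cam_real is the pose of the tracked real object in camera coordinates, re-estimated by the tracker in every frame, while T_real_virtual is the (fixed or scenario-controlled) offset of the virtual object relative to the real one. As the user moves the real object, the virtual object registered to it follows automatically.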

Existing solutions in the augmented reality domain do not address all of the requirements presented above. Most existing approaches in the augmented reality domain focus on the design of core aspects of augmented reality applications such as tracking, display, and human-computer interaction, e.g., Studierstube, ARToolKit, and DWARF. The Studierstube [73] framework provides an abstraction layer for low-level AR technologies. Development of new applications requires writing subclasses inheriting from the Studierstube framework classes in C++. Thus, development of applications based on the Studierstube platform can be performed only by programmers with knowledge of low-level 3D graphics programming. The ARToolKit [8, 45] library is aimed merely at optical tracking of special square markers in a real environment. By default, the library enables displaying non-interactive virtual objects registered with the tracked markers. The DWARF [9] framework is aimed at the development of wearable AR applications. DWARF provides services offering core AR functionalities that can be run in a distributed architecture.

Few existing solutions deal with the design and development of AR applications without low-level programming. Examples of such approaches are APRIL [48] and SSIML/AR [87]. These approaches adopt UML [57] for describing presentation flows. Thus, they are mainly aimed at software technologists able to successfully use UML. Based on the UML diagrams, appropriate descriptions of applications are generated. However, they do not permit building reusable adaptive components encapsulating semantics, geometry, and behavior. APRIL provides a central storyboard, where users can specify the basic behavior of presentation components. The components do not have their own behavior, so each presentation must define the whole behavior of all its components. Also, user interaction with the presentation is limited to the detection of only button-pressed or object-touched events. SSIML/AR proposes a task flow model, which specifies the behavior of an AR presentation as a whole, treating its components as passive structures. Also, the system does not evaluate advanced spatial relationships between real and virtual objects.

Other examples of approaches aimed at creating AR presentations are InstantReality, AMIRE, DART, and i4D. The InstantReality [40] framework offers an extension to the X3D scene and execution model. The system allows creating AR applications in the form of dataflow graphs by modeling rather than programming. AR applications created with InstantReality are composed of components and relations between these components. Each component consists of parameters and a processing unit responsible for controlling the component's behavior. However, creating advanced AR presentations containing objects with complex interactive behavior requires the involvement of highly skilled X3D designers. The AMIRE [28] project provides a model for authoring AR applications with a focus on the reusability of low-level services needed for building AR presentations rather than on high-level content. The DART [50] system enables authoring AR applications using a multimedia authoring tool with some extensions. The system enables creating presentations by placing different media objects along a timeline and specifying their behavior. Behavior is not an integral part of the objects. The behavior of the presentations is described in the Lingo scripting language, which requires sophisticated programming skills. The i4D [33] framework enables creation of 3D presentations composed of animated 3D actors. The system is mainly focused on creating animated 3D presentations composed of actors encapsulating their internal behavior, described by actions and reactions. The system enables display of such presentations in an AR environment, but it does not provide users with the capability to interact with the presented content.

A notable example of a system for creating AR applications for mobile devices is Qualcomm AR (QCAR) [71]. QCAR permits the creation of mobile AR applications for Android smartphones [5]. It is a vision-based augmented reality platform that uses complex computer vision algorithms for the alignment of 3D computer graphics, based on natural feature tracking and image detection. Users can interact with the presented objects using virtual buttons displayed in the AR environment and by performing finger-pointing gestures. However, in AR applications created with QCAR, the real environment is regarded primarily as a background for displaying computer-generated 3D content, and the interaction capabilities between the real objects and the 3D content are limited.
