Introduction to Interactive 3D Multimedia Content

Abstract Progress in computing and network performance, accompanied by the development of platform-independent 3D content standards, opens the way for the application of interactive 3D technologies in a variety of domains, such as cultural heritage, education, training, tourism, and e-commerce. However, the potential of 3D/VR/AR technologies in everyday applications can be fully exploited only if complemented by the development of efficient and easy-to-use methods of creation, management, search, and presentation of interactive 3D multimedia content, which could be used by both expert and non-expert users. The whole book is devoted to the above issues. In this introductory chapter, we describe the motivation for conducting research in the domain of interactive 3D multimedia content technologies, which focuses on two broad areas: 3D content creation and management, and 3D search and presentation.

Widespread use of interactive 3D multimedia technologies, including virtual reality (VR) and augmented reality (AR), has recently been enabled by remarkable progress in consumer-level hardware performance: cheap 3D accelerators available in most contemporary graphics cards, the increasing quality and usability of various low-cost immersive and non-immersive displays and interaction devices, and rapid growth in the available network bandwidth, which is now sufficient to deliver the large amounts of data required by network-based interactive 3D multimedia applications.


Users are also well prepared for the shift from 2D to 3D interfaces. The popularity of 3D computer games, on-line communities, and 3D movies based on computer graphics has increased their familiarity with 3D techniques and, at the same time, raised their expectations. Younger generations, for whom such applications have become commonplace, expect to have similar experiences in other areas. Thus, it is time to welcome interactive 3D multimedia technologies in other application domains, such as cultural heritage, education, training, tourism, and e-commerce. The application of 3D technologies can breathe new life into these domains by providing new, enhanced user experiences.

Increasing interest can be observed in the exploitation of the possibilities offered by interactive 3D worlds accessible remotely over the Internet. Remote access to 3D/VR content, enabled by the development of 3D content representation standards, allows users to experience distant virtual worlds in the same way as they can experience local 3D/VR applications. Moreover, on-line virtual worlds may be naturally combined with social media, which largely increases their attractiveness.

However, the potential of 3D/VR technology in everyday applications can be fully exploited only if accompanied by development of efficient and easy-to-use methods of creation, management, search and presentation of interactive 3D multimedia content, which could be used by both expert and non-expert users.

All the above facts make research on interactive 3D multimedia content technology particularly appealing and well motivated. This book is a result of the cooperation of researchers working in the Department of Information Technology at the Poznan University of Economics (Poland) [http://www.kti.ue.poznan.pl]. Two main axes of research can be distinguished in these works: (1) 3D content creation and management, and (2) 3D search and presentation.

Different aspects of the creation and management of high-quality, meaningful, interactive 3D content are covered by Chaps. 4, 5, 6, and 7. Practical 3D multimedia applications require enormous amounts of complex interactive content. Regardless of the employed rendering or interaction techniques, the content is what a user actually perceives. Moreover, in most cases, the content must be created by domain experts (e.g., museum curators, teachers in schools, sales experts, TV technicians) who cannot be expected to have experience in programming or 3D design. Only the involvement of domain experts guarantees a sufficient amount of high-quality content, which may contribute to wider adoption of 3D multimedia applications in everyday use. Thus, it is necessary to develop methods and tools supporting 3D content creation and management.

3D content search and presentation are covered by Chaps. 8, 9, and 10. The development of interactive 3D multimedia is accompanied by a rapidly growing number of multimedia objects available on the Internet. Simultaneously, user-generated 3D content becomes more and more important. Currently, a trend following social networking is to generate 3D content collaboratively. To do that, efficient methods of content search are required. The intention is to have search engines devoted to 3D multimedia objects that are as efficient as those for textual objects. 3D content search is, however, far more difficult, because 3D objects are much more complex: they have spatial and temporal properties, consist of complex structures containing geometry, behavior, and other multimedia objects, and their semantics is, in general, hard to reveal.

A detailed description of the chapters composing this book follows.

A number of standards have been developed to enable platform independent representation of 3D/VR content. These standards differ in their capabilities of describing content features and encoding methods, making them more suitable either for exchange of content between applications or for publishing of content over the network. The most versatile standards enabling publishing of interactive 3D multimedia content are VRML, X3D and MPEG-4, all approved by ISO/IEC. Other standards in this field include U3D, COLLADA, and 3D XML. Existing metadata standards enabling description of multimedia content are described as well.

The challenges result mainly from the continuous progress in the development of 3D/VR technologies and systems. In particular, the shift from the on-line publication of passive 3D content to building active 3D/VR network applications is discussed. Publication of passive 3D content was a significant achievement several years ago, but currently a shift is needed to enable the creation of active 3D/VR network applications. This requires solving problems such as modeling 3D virtual worlds in databases, dynamic composition of virtual scenes, selection of content, parameterization, database access, and persistency.

Further, the issue of efficient content creation by non-expert users is discussed, in both virtual and augmented reality environments. The lack of high-quality relevant content is one of the main obstacles to the popularization of 3D/VR technologies in various application domains, such as cultural heritage and teaching. Only the involvement of non-expert users in the content creation process can provide the large amount of content required by real-life 3D/VR applications.

Next, the switch from single-user to multi-user 3D environments is discussed, in particular in the context of content creation. User-contributed content may, to a large extent, help solve the content creation problem, but to actively participate in content development, users must be assured that the content will not be misused, that interacting with the content created by other users is safe, and that author rights to the content will be preserved. This requires a new paradigm for access control.

Then, the shift from file search to 3D object search is discussed. The popularization of 3D technologies results in the creation of a large amount of 3D resources available on-line. This raises the issue of efficient 3D content search. To enable the development of 3D search engines that are as efficient as existing text engines, two problems must be solved. First, new properties of 3D objects must be described using searchable metadata. Interactivity is an example of a property that is not currently appropriately handled. Second, efficient methods of searching for objects based not only on their semantics but also on their spatial location and temporal extent are needed.

Finally, the problem of building efficient 3D data visualization interfaces is discussed, in particular in the context of Web search results visualization. 3D scenes offer many advantages when used for data visualization, but they also impose important limitations and constraints. Therefore, research is needed on how to build usable and efficient 3D visualization interfaces that can be adapted for the visualization of highly dynamic and complex datasets.

The existing 3D content standards, such as VRML/X3D and MPEG-4, provide methods of describing the contents of fixed virtual scenes. In their current form, they enable building passive 3D/VR systems, i.e., systems where the 3D technology is employed to visualize some pre-designed virtual environments in a three-dimensional way. Applicability of this approach is limited to the presentation of fixed (but possibly interactive) synthetic 3D models of some real or virtual environments. As such, these standards enable the presentation of architectural designs, artistic models, and simple animations, but are not sufficient for the implementation of more advanced 3D/VR network applications.

A real challenge in the area of interactive 3D/VR systems consists in building active applications. The intention is to build, instead of passive virtual scenes, active applications that enable server-side user interaction, dynamic composition of virtual scenes, access to on-line data, continuous visualization, implementation of persistency, etc.

In Chap. 4, a new approach, called X-VR, is presented that enables building active 3D/VR applications. In this approach, two new techniques are used: dynamic content modeling, which provides the prerequisite infrastructure to build active 3D/VR applications, and database modeling of virtual worlds, which enables building high-level database models of 3D/VR environments.

Dynamic content modeling is accomplished by the use of a new language, called X-VRML, which has been designed for this purpose. The X-VRML language provides means of parameterizing all elements of dynamically generated virtual scenes: the contents, the visualization methods, and the structure. X-VRML offers programming concepts known from procedural languages, such as loops, conditions, and variables, which, combined with the declarative VRML/X3D/MPEG-4 approach, form a powerful programming tool. Moreover, X-VRML is object-oriented and provides convenient methods of retrieving data from databases. The retrieved data can affect all aspects of the dynamically created virtual scenes. The X-VRML language is based on XML, which makes X-VRML consistent with new trends in the development of 3D content standards.
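The idea of dynamic content modeling can be illustrated with a minimal sketch. The following Python fragment is not X-VRML (whose actual syntax is XML-based); it merely shows, under hypothetical names and data, how loops, conditions, and database-derived parameters can drive the generation of a declarative X3D-like scene.

```python
# Illustrative sketch of dynamic content modeling (hypothetical data and
# names, not actual X-VRML syntax): procedural constructs generate a
# declarative scene description from retrieved records.

def generate_scene(artifacts, highlight_min_year=1900):
    """Build an X3D-like scene string from a list of database records."""
    nodes = []
    for i, art in enumerate(artifacts):          # loop over retrieved records
        # condition: highlight recent artifacts with a distinct color
        color = "1 0 0" if art["year"] >= highlight_min_year else "0.6 0.6 0.6"
        nodes.append(
            f'<Transform translation="{i * 2} 0 0">'
            f'<Shape><Appearance><Material diffuseColor="{color}"/></Appearance>'
            f'<Box/></Shape></Transform>'
        )
    return "<Scene>" + "".join(nodes) + "</Scene>"

# Simulated result of a database query (hypothetical content).
artifacts = [{"name": "vase", "year": 1850}, {"name": "clock", "year": 1920}]
scene = generate_scene(artifacts)
```

Changing the query result or the parameter values yields a different virtual scene from the same template, which is the essence of the parameterization described above.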

For database modeling of virtual worlds, a method called X-VRDB is proposed. The method consists in dividing a virtual world model into four distinct elements: virtual world data, virtual world structure, virtual scene templates, and virtual scenes. Modeling virtual world data in a database enables efficient management of large amounts of data, provides powerful selection capabilities, and enables continuous updating of the virtual world model and the implementation of persistency in virtual worlds. Database modeling of the virtual world structure enforces consistency between the elements of the virtual world model, enables the creation of a library of shared reusable system components, and improves performance in the case of large virtual world models. Database modeling of virtual scene templates and virtual scenes permits the use of multiple virtual scene templates in one virtual world model, enables the formal definition of virtual scene template parameters, and enables storing these parameters for later reconstruction of previously created virtual scenes.
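The four-way decomposition can be sketched as follows. This Python fragment uses an entirely hypothetical schema (the actual X-VRDB data model is not reproduced here); it only shows how storing a template's formally declared parameters with each scene allows later reconstruction.

```python
# Hypothetical sketch of the X-VRDB decomposition: world data, world
# structure, scene templates, and stored scenes with parameter values.
from dataclasses import dataclass, field

@dataclass
class SceneTemplate:
    name: str
    parameters: list            # formally declared parameter names

@dataclass
class VirtualScene:
    template: SceneTemplate
    values: dict = field(default_factory=dict)   # stored parameter values

world_data = {"room_17": {"exhibits": ["vase", "clock"]}}   # virtual world data
structure = {"museum": ["room_17"]}                          # virtual world structure

gallery = SceneTemplate("gallery", ["room_id", "lighting"])  # scene template
scene = VirtualScene(gallery, {"room_id": "room_17", "lighting": "dim"})

def reconstruct(s):
    """Re-apply the stored parameter values to the template's parameters."""
    return {p: s.values[p] for p in s.template.parameters}
```

Because the parameter values are persisted alongside a reference to the template, a previously generated scene can be rebuilt at any time, and the same template can serve many scenes.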

In Chap. 5, the concept of configurable 3D applications is presented. Despite remarkable progress in software, hardware, and network performance, as well as evident economic and societal prospects, the actual uptake of interactive 3D applications in everyday use is still very low. Apparently, the availability of 3D content delivery and presentation technologies alone is not yet sufficient for the successful deployment of real-life 3D/VR applications. One of the main problems that currently limits the wide use of 3D applications on an everyday basis is the difficulty of creating high-quality, meaningful, interactive 3D content.

Simplification of content creation can be achieved by enabling users to set up the content from predefined building blocks, called components. Given a library of different components, such as 3D geometry, sounds, scenarios, sensors, schedulers, and interaction elements, users could efficiently build 3D content by configuring it from these components. Clearly, there is a trade-off between the flexibility of content creation tools and their ease of use. Generally, the more an authoring environment allows a user to do, the more difficult it is to operate. Creation of content based on configuration does not permit one to achieve all possible results, but the process is significantly easier than creating the content from scratch. Furthermore, limitations of the configuration approach can be mitigated by assigning different roles to non-expert and expert content designers. A non-expert content creator may build virtual scenes by assembling components taken from a ready-to-use library. It is relatively easy to compose a scene, but the process is somewhat constrained. However, additional functionality may be achieved by adding new types of components to the library at any time. This task can be performed by programmers or 3D designers. This approach fits well with the organization of work on 3D content creation in many practical applications, in domains such as cultural heritage and education.

In Chap. 5, a new approach to building interactive 3D/VR applications, called Flex-VR, is described. Flex-VR enables building configurable 3D applications, in which content can be relatively easily created and modified by common users. Flex-VR applications are based on configurable content, i.e., content that may be interactively or automatically configured from separate components. Interactive configuration of content enables the efficient production of content by both expert and non-expert users, without going into the details of 3D design or programming. Automatic content configuration enables the adaptation of content to various requirements, such as the target environment or the target group of users. The Flex-VR approach can be applied to building local as well as network-based 3D Web applications. The Flex-VR approach consists of five interrelated elements: content parameterization, content structuralization (Beh-VR), a content model, design patterns, and a content pipeline.

Flex-VR content parameterization allows for quick and efficient creation and modification of 3D/VR content by a designer, an operator, or an end-user. Moreover, thanks to the real-time content update techniques, it is safe to manipulate content that has already been created and is being used (e.g., in live TV production).

To achieve flexible content composition and real-time updates of complex, interactive, behavior-rich content, structuralization of the content is required. For this purpose, a novel 3D/VR content structuralization model, called Beh-VR, is proposed. Beh-VR enables the creation of high-level behavioral objects that can then be assembled into virtual scenes.

A generic Flex-VR content model describes 3D/VR applications on a higher level of abstraction than a typical content representation standard. Particular virtual scenes, or sequences of virtual scenes, are specific projections of the generic model depending on various factors such as user interactions, preferences, and privileges. Most importantly, the generic high-level model can be used to easily manipulate the 3D/VR content, allowing this task to be performed by application domain experts with simple GUI tools. Also, the use of the Flex-VR content model significantly increases the possibilities of content reuse.

A collection of Flex-VR design patterns provides general reusable solutions to common problems in 3D/VR application design, making the process of creating 3D/VR applications much more efficient. The use of design patterns also imposes known semantic structure on the content, and therefore simplifies creation and modification of complex applications.

Finally, the Flex-VR content pipeline facilitates separation of the different tasks involved in the process of content creation into specific phases, which can be performed in different time-scales, by people with different skills and using different tools.

Several Flex-VR design patterns enabling configuration of complex content structures are also described. These elements form the core of the Flex-VR approach and are sufficient for building configurable 3D Web applications. An example of a practical Flex-VR application in the cultural heritage domain is also presented.

Augmented reality combines virtual reality with video processing and computer vision technologies. The AR technology enables merging three-dimensional virtual objects with real objects, resulting in an augmented reality environment (ARE). In such an environment, users are able to interact with the virtual objects in a direct and natural way by manipulating real objects, without the need for sophisticated input devices. Thus, augmented reality environments give users a unique opportunity to perform hands-on experiments on virtual objects by manipulating physical objects in their real environment.

However, the current prevalence of the AR technology is still very low, due to the fact that users cannot easily create and modify AR content, and even when such modifications are possible, their range is usually very restricted. In existing solutions, creating new content requires the involvement of highly skilled IT professionals, who are experts in the design and implementation of interactive 3D content. As a result, end-users are limited to using ready-made content and cannot easily and quickly create, update, or modify it.

In the approach proposed here, called AREM, augmented reality environments are created based on the Augmented Reality Scenario Model (ARSM). In the ARSM model, the concepts of AR-Class and AR-Object are proposed, according to the object-oriented paradigm. The concept of the class, composed of properties and methods, is extended with elements required for building interactive presentations in AR environments, such as three-dimensional geometry, interactive behavior, media objects, and aggregation relationships between AR-Classes. The geometry and behavior specified in AR-Classes can be customized in AR-Objects by setting different property values. The values of the properties can be changed at runtime; hence, the visual and behavioral features of AR-Objects can change dynamically in time.

The dynamism of the content visualization is fully controlled by the behavior of the objects comprising the content. The behavior of AR-Objects is described by methods and activities. An activity is a new concept in comparison to the traditional object-oriented approach. Each activity describes some distinctive interactive behavior of an AR-Object. Each AR-Object can contain a number of different activities, which can be activated at runtime. Activities describe the behavior of AR-Objects in time; in particular, they describe reactions to events occurring in an AR environment. Each activity is composed of named states and transitions between the states. Each state describes actions that are performed within that state.
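An activity of this kind is essentially a small state machine. The sketch below is a hypothetical illustration in Python (the class, the "door" object, and the event names are invented for this example and are not part of the ARSM specification): named states, event-driven transitions, and per-state actions.

```python
# Hypothetical sketch of an ARSM-style activity: a state machine attached
# to an AR-Object, reacting to runtime events with transitions and actions.

class Activity:
    def __init__(self, initial, transitions, actions=None):
        self.state = initial
        self.transitions = transitions      # {(state, event): next_state}
        self.actions = actions or {}        # {state: action performed on entry}
        self.log = []                       # record of performed actions

    def on_event(self, event):
        key = (self.state, event)
        if key in self.transitions:         # events without a transition are ignored
            self.state = self.transitions[key]
            self.log.append(self.actions.get(self.state, self.state))

# A door AR-Object with an open/close activity driven by "touch" events.
door = Activity(
    initial="closed",
    transitions={("closed", "touch"): "open", ("open", "touch"): "closed"},
    actions={"open": "play opening animation", "closed": "play closing animation"},
)
door.on_event("touch")   # closed -> open
door.on_event("touch")   # open -> closed
```

Several such activities can coexist in one AR-Object and be activated or deactivated at runtime, which is what makes the visual and behavioral features of the object change dynamically in time.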

In the proposed Augmented Reality Scenario Model (ARSM), an augmented reality scene is represented as a graph composed of nodes representing AR-Objects. AR-Objects permit one to describe in a uniform way three categories of entities that can be found in an AR environment, namely, real objects, virtual objects, and scenes comprised of real and virtual objects. The solution enables describing the whole spectrum of augmented reality scenes ranging from real scenes composed of real objects, through mixed scenes composed of both real and virtual objects, to virtual scenes composed merely of virtual objects.

The AREM approach has been implemented in the ARE Presentation System (AREPS), which offers visualization of interactive AREs composed of AR-Objects. The AREM approach has been experimentally verified by the practical application of the AREPS system in a number of scenarios across the cultural heritage and education domains.

Factors that stimulate the development of multiuser virtual environments are fast progress in software, including rendering engines, physics engines, and 3D software development kits, and even faster development of hardware, such as the increasing processing power of graphics processing units and the speed of broadband and wireless connections. A limiting factor is the high cost of creation, maintenance, and development of interactive 3D content for multiuser virtual environments. A commonly recognized solution to this problem is user-generated content. Such content is not only cheaper to develop but also more authentic and closer to users. Therefore, modern virtual environments are not only multiuser and multi-access but also interactive, behavior-rich, highly dynamic, and based on user-contributed content.

Traditional, coarse-grained and geometry-centric access control and privilege modeling methods are not sufficient for such environments. Protection should concern geometrical models, their relationships and structure, as well as inter-object behavioral interactions. New methods of access control are required, yet these cannot impose too many restrictions in the phase of creation of users’ virtual objects because the goal of those environments is to promote user creativity and sociability.

To protect behavioral data, an effective but unobtrusive and flexible access control model is needed, using privileges based on interactions between objects in a persistently running virtual environment. Possible interactions can be thoroughly analyzed by taking into account the call range of object methods. The access control model should be expressive enough to account for inter-object dependencies and their semantics, as objects are created not only from scratch but also as compositions of preexisting objects coming from different sources. The privilege system should automatically encompass newly created objects and follow the evolution of virtual environment data. Privileges should be manageable and understandable by a human operator and, at the same time, applicable at a fine-grained level.

The SSM method consists of two elements. The first is a flexible Virtual Reality Privilege Representation (VR-PR) for virtual environment objects. The second is a semantic extension of VR-PR, the Knowledgebase of Objects Behavior (KBOB), built according to the Ontology of Objects Behavior (OOB).

The SSM method enables modeling privileges for virtual environment behavioral resources with respect to their semantics. The SSM method is based on the concept of semantic operations. Semantic operations are generated at run-time from the virtual environment data model and are applicable to the access control model as a part of a privilege. Semantic consistency of the privilege set is enforced by a two-phase regeneration and validation mechanism, so that user privileges can still be expressed in a precise, semantically accurate, and flexible way.
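The interplay between regenerated semantic operations and explicitly granted privileges can be sketched as follows. Everything in this Python fragment (the privilege tuples, the data model, the two-phase check) is a simplified, hypothetical illustration of the idea, not the actual SSM representation.

```python
# Hypothetical sketch of interaction-based privilege checking: privileges
# name a subject, a semantic operation, and a target object; a two-phase
# check validates an attempted operation against the current data model.

privileges = [
    ("alice", "open", "door_1"),     # alice may trigger "open" on door_1
    ("bob",   "move", "statue_2"),
]

def semantic_operations(data_model):
    """Regenerate the set of valid (object, operation) pairs from the model."""
    return {(obj, op) for obj, ops in data_model.items() for op in ops}

def is_allowed(subject, operation, target, data_model):
    # Phase 1: the operation must exist in the regenerated semantic set.
    if (target, operation) not in semantic_operations(data_model):
        return False
    # Phase 2: an explicit privilege must grant it to this subject.
    return (subject, operation, target) in privileges

model = {"door_1": ["open", "close"], "statue_2": ["move"]}
```

Because the semantic set is regenerated from the data model at run-time, privileges automatically track the evolution of the environment: an operation that disappears from the model is refused even if a stale privilege still mentions it.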

The problem of searching for interactive 3D objects originates from the growing demand for interactive 3D applications. The time required to prepare such applications depends heavily on the availability of reusable 3D objects with embedded behavior. Effective search for interactive 3D objects needed for new applications requires object metadata that cover not only object geometry and semantics but also object behavior.

Existing metadata standards define different schemes for describing objects. General metadata standards, such as Dublin Core or XMP, enable storing basic information about an object, e.g., its title, creator, or creation date. Specific standards, such as MPEG-7 or MXF DMS-1, enable storing technical and semantic information about an object. The missing metadata fragment, not included in the existing metadata standards, is metadata describing object behavior. Such information should be stored in a format that permits efficient search for objects with specific interaction properties.

The proposed approach is based on two main elements: a model of 3D object interactions and the concept of an interaction interface. Based on these two elements, a special query sub-language has been implemented, which enables efficient usage of interaction metadata.

A model of 3D object interactions, called the Multimedia Interaction Model (MIM), enables the decomposition of object interaction capabilities into components. These components are described by distinct metadata structures. The structure of interaction metadata makes it possible to use both semantic textual descriptions and formal mathematical descriptions. Semantic descriptions can be used at different levels of the metadata structure, providing both general and specific descriptions. General descriptions concern the interaction as a whole, while specific descriptions concern particular details of the interaction. Moreover, semantic descriptions can be linked with an ontology specific to a given application domain. This possibility increases the informational value of interaction metadata and the quality of search results. Mathematical descriptions provide metadata that enable calculating the new state of an object after an interaction. The new state can be used as a starting point for subsequent interactions. Therefore, mathematical descriptions enable search engines to run searches for interactive objects based not only on the current object state but also on object states resulting from an interaction.
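The combination of semantic and mathematical descriptions can be sketched as follows. The metadata structure, field names, and the "rotate" interaction below are hypothetical illustrations (the actual MIM metadata format is not reproduced here); the point is that a mathematical transition function lets a search engine reason about post-interaction states.

```python
# Hypothetical sketch of MIM-style interaction metadata: a semantic textual
# description plus a mathematical description computing the resulting state.

interaction = {
    "name": "rotate",
    "semantic": {                       # textual descriptions, general and specific
        "general": "rotates the object around its vertical axis",
        "parameters": {"angle": "rotation angle in degrees"},
    },
    # Mathematical description: new state as a function of old state + parameter.
    "transition": lambda state, angle: {**state, "heading": (state["heading"] + angle) % 360},
}

state = {"heading": 350}
after = interaction["transition"](state, 20)    # state resulting from the interaction

def matches(query_heading, obj_state, inter, angle):
    """Search criterion on the state an object would reach after interacting."""
    return inter["transition"](obj_state, angle)["heading"] == query_heading
```

A query can thus ask not only "which objects currently face heading 10" but also "which objects would face heading 10 after a 20-degree rotation", using the mathematical part of the metadata.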

The MIM model is the basis for a new approach to metadata design. In the proposed approach, interactive objects are treated as elements of a computer program, instead of static items. The foundation of such an approach is the concept of the Interaction Interface (II). The structure of the interaction interface is fixed, and there are specific rules for defining new interaction interfaces. Interaction interfaces, as a part of interaction metadata, provide detailed semantics of parameters related to an interaction. The extended semantic information contained in the interaction metadata facilitates searches for objects with specific interaction properties.

An example area where imprecise spatial and temporal information often appears is cultural heritage. In the cultural heritage domain, concepts related to cultural objects depend on geographical location and time, can evolve over time, and have different meanings in different locations and at different periods. As a result, search methods developed so far, both keyword-based and semantically oriented, fail when applied to cultural heritage because of the imprecision of the data contained in museum knowledgebases, as well as of the data specified in user queries.

So far, none of the keyword search and semantic search methods has been able to retrieve results with sufficient precision and recall. Keyword search returns inaccurate results because terms, especially those used for expressing time, are ambiguous in the cultural heritage context. A purely semantic search method would also return inaccurate results, because some concepts are not explicitly connected to each other, but are related only through their temporal and geographical coincidence.

The TSTSM method is based on a new similarity measure, called TST, which allows assessing the distance between different concepts in a semantic, spatiotemporal dataset. During the evaluation of this measure, for each concept in the knowledgebase, a fuzzy set of points in the time-space continuum is constructed. In such a set, the more closely a point is related to the analyzed node in the underlying semantic graph, the higher the degree of membership it receives. In the TSTSM method, a user query consists of keywords connected with Boolean operators such as AND, OR, and NOT, similarly to traditional information retrieval methods. The keywords included in the query are disambiguated through an interactive process and mapped onto the concepts stored in the knowledgebase. In the next step, the fuzzy sets corresponding to the concepts from the user query are combined according to the operators in the query: intersection for AND, union for OR, and complement for NOT. The combination is performed in a 3-dimensional space representing points in space and time. As a result, another fuzzy set is obtained, which represents the region in space and time that the user is interested in. Finally, for each node that is a possible answer to the query, the TST similarity between the considered node and the user query is computed. On the basis of the TST measure, the results are ranked; the resource for which the TST similarity is the highest is the best match.
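The fuzzy combination step can be sketched concretely. The membership values and concepts below are invented for illustration, and the operators use the standard min/max/complement fuzzy-set definitions; the actual TST measure and membership construction in TSTSM are more elaborate.

```python
# Hypothetical sketch of the query combination step: each concept is a fuzzy
# set mapping (x, y, t) points to membership degrees, and Boolean operators
# are realized as standard fuzzy-set operations.

def f_and(a, b):   # intersection: pointwise minimum
    return {p: min(a.get(p, 0.0), b.get(p, 0.0)) for p in set(a) | set(b)}

def f_or(a, b):    # union: pointwise maximum
    return {p: max(a.get(p, 0.0), b.get(p, 0.0)) for p in set(a) | set(b)}

def f_not(a):      # complement
    return {p: 1.0 - m for p, m in a.items()}

# Invented fuzzy sets for two concepts over (x, y, t) points.
gothic = {(1, 1, 1300): 0.9, (1, 1, 1500): 0.4}
cracow = {(1, 1, 1300): 0.8, (2, 2, 1300): 0.7}

query = f_and(gothic, cracow)     # region for "gothic AND cracow"
```

The resulting fuzzy set concentrates membership where both concepts overlap in space and time, which is exactly the region against which candidate answer nodes are then ranked.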

This problem arises when, in response to a query, thousands of documents fulfill the relevancy criteria. Currently, almost all widely used indexing search engines employ textual interfaces based on the same presentational metaphor: a list of links to documents conforming to a user query, each link associated with a short document description. The list of links is presented in the form of one or more HTML pages. If the number of results exceeds one page, more pages can be generated, usually up to some limit specific to the search engine. All links are ordered according to a search engine-specific relevance algorithm, which ranks documents reflecting the accuracy of keyword matching, the popularity of the Web page, the number of links, etc. This method of search result presentation is suitable for Web users whose search goal is well defined and who can specify a query using several keywords. However, it is not sufficient if a user is interested in a general overview of a particular topic or if his/her research interests are broad. In such a case, the user is interested in a holistic presentation of the whole data set satisfying the query, instead of the details of the first few documents. To achieve this goal using a classical textual interface, a user has to collect information scattered across many pages.

To really meet the user needs in such a case, an interface is needed that would be able to show all the results holistically. Such an interface should be able to present a multidimensional set of search results, in a synthetic, yet still comprehensible way. An interactive 3D graphical representation of data may be efficiently used for this purpose.

By presenting search results in 3D, one gains three clear advantages that are of critical importance for search engine visualization systems: large information capacity, enhanced user cognition, and interactivity. Information capacity is of primary importance in the case of large sets of multidimensional data. Three-dimensional objects can provide information in the form of shapes, colors, textures, positions, sizes, orientations, and even behavior. 3D space is not limited in size, so the only limiting factor of visualization is user perception.

The second advantage is enhanced user cognition. A spatial metaphor for representing data is closer to the manner in which humans perceive the surrounding world. A 3D environment permits a user to change the viewpoint to improve perception and understanding of the observed data. A user trying to discover the meaning of objects in 3D space may rotate or translate them if their meaning is not clear at first sight. Information presented in this way is learned faster and more efficiently.

The third advantage is interactivity, which means not only navigation in the space but also interaction with the content, e.g., moving and rotating objects and selecting objects that are of interest to the user. To support user interaction with 3D content, a 3D scene may be enriched with 2D interface elements well known to users accustomed to window-based interfaces.

However, employing 3D worlds for search result visualization, besides all its advantages, also brings some difficulties. The most important ones are occlusions, complex navigation and the lack of easy-to-use 3D pointing devices, limitations of user perception, as well as difficulties with the presentation of textual information and with submitting user inputs and queries.

The AVE approach permits visualization of the entire search result through the most appropriate interface, selected from a library of available interfaces based on the quantitative properties of the search result. Within the interface, a user may freely assign search result properties to the visual elements of the interface, which makes it possible to emphasize particular features of the search result and thus perceive it in the best way. By applying the most appropriate 3D interface and using different visualization metaphors at each step of the search process, a user can browse from the broadest categorized set of documents down to a small set of documents of interest.
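The selection step can be sketched as a simple rule over quantitative properties of the result set. The interface library, capacity limits, and property-to-visual-element mapping below are entirely hypothetical; AVE's actual selection criteria are not reproduced here.

```python
# Hypothetical sketch of interface selection from a library, driven by the
# size and dimensionality of the search result, plus a user-assigned mapping
# of result properties onto visual elements.

interfaces = [
    {"name": "ring",   "max_items": 50,     "dims": 2},
    {"name": "galaxy", "max_items": 5000,   "dims": 3},
    {"name": "city",   "max_items": 100000, "dims": 4},
]

def select_interface(result_count, dimensions):
    """Pick the smallest interface that can hold the whole result set."""
    for itf in interfaces:                       # ordered by capacity
        if result_count <= itf["max_items"] and dimensions <= itf["dims"]:
            return itf["name"]
    return "city"                                # fall back to the largest

# User-assigned mapping of result properties to visual elements.
mapping = {"relevance": "size", "date": "color"}
```

At each step of drill-down browsing, the result set shrinks, so a different (smaller, more detailed) interface may be selected, matching the progression from a broad categorized overview to a small set of documents of interest.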
