From File Search to 3D Object Search (Issues in Creation, Management, Search and Presentation of Interactive 3D Content)

The undeniable success of the World Wide Web is to a large extent a result of the availability of efficient search systems, which enable users to find relevant information and resources within seconds. Search systems are also critical for facilitating the reuse of content and software on a large scale. However, due to the different nature of 3D content, 3D virtual environments impose new requirements on the search criteria. The two main differences are related to (1) what is being searched for and (2) where it is being searched.

Finding 3D objects requires specialized search engines. Some tools already exist; for example, a prototype search engine that uses geometric characteristics of 3D objects has been made available on the Internet [32]. However, this method of searching is limited to static geometry only. A practical search engine needs to address all components of an interactive 3D object: its geometry [13], semantics [35], and behavior [88]. While geometry information can to some extent be extracted from the object itself, its semantics and especially its behavior are difficult or impossible to gather automatically. Therefore, there is a need for a metadata solution and a corresponding query language that would enable describing additional properties of 3D objects and building search engines capable of using such information.

Many metadata standards tailored for different domains and applications have been developed [75]. The most basic use of metadata is for general cataloguing and indexing. However, each domain has its own set of metadata solutions. A wide spectrum of metadata solutions has been designed for multimedia objects. The most important are Dublin Core [41, 91], XMP [90], EXIF [30], DIG35 [27], NISO Z39.87 [92], MusicBrainz [55, 78], ID3 [39], AAF [34], MXF DMS-1 [76], P_META [65], and MPEG-7 [53]. Of these standards, however, all but one deal with general object semantics or domain-specific information.


More advanced is the MPEG-7 standard (cf. Sect. 2.2.3). It is intended for creating metadata of multimedia content. It standardizes tools such as Descriptors, Description Schemes and the Description Definition Language (DDL). A Descriptor is the basic element of MPEG-7 metadata. It is used to represent specific features of the content. Generally, Descriptors represent low-level features such as visual properties (e.g., texture, camera motion) or audio properties (e.g., melody). A Descriptor specification defines the syntax and semantics of the feature representation. A Description Scheme is a structural component of MPEG-7. It defines the structure and semantics of the relationships between its components, which may be both Descriptors and Description Schemes. With Description Schemes it is possible to describe the structure and semantics of multimedia content. Both description tools presented above are expressed using the Description Definition Language (DDL). The DDL enables the creation of new Description Schemes and Descriptors, and it permits extending and modifying existing description tools.
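
The containment relationship between Descriptors and Description Schemes can be pictured as a small recursive data structure. The following is only a minimal sketch and not the MPEG-7 schema itself: the class names `Descriptor` and `DescriptionScheme`, the `descriptors` helper, and the sample feature values are hypothetical and chosen merely to mirror the terminology used above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Descriptor:
    """A single feature representation (hypothetical, not the MPEG-7 schema)."""
    feature: str
    value: object


@dataclass
class DescriptionScheme:
    """A structural component whose parts are Descriptors or other schemes."""
    name: str
    components: List[Union[Descriptor, "DescriptionScheme"]] = field(default_factory=list)

    def descriptors(self) -> List[Descriptor]:
        """Collect every Descriptor in this scheme, depth first."""
        found: List[Descriptor] = []
        for component in self.components:
            if isinstance(component, Descriptor):
                found.append(component)
            else:
                found.extend(component.descriptors())
        return found


# Made-up sample content: two nested schemes plus one top-level descriptor.
visual = DescriptionScheme("Visual", [Descriptor("texture", "rough")])
audio = DescriptionScheme("Audio", [Descriptor("melody", "C-E-G")])
content = DescriptionScheme("Content", [visual, audio, Descriptor("title", "demo clip")])

features = [d.feature for d in content.descriptors()]
print(features)
```

Because a scheme may contain other schemes, any tool that consumes such a description has to walk it recursively, which is what the `descriptors` helper does here.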

The MPEG-7 standard has been designed for all kinds of multimedia content and, with its vast base of Descriptors and Description Schemes, it can be used as a solution for characterization of audio, image or video objects. However, it is not sufficient for 3D multimedia objects.

To cope with this problem, an MPEG-7 extension called 3D SEmantics Annotation Model (3DSEAM) has been proposed [14]. The main assumption in this project is that MPEG-7 description tools are sufficient and it is only the Media Locator and Region Locator descriptors that need to be extended. Consequently, the 3DSEAM defines the Structural Locator and the 3D Region Locator for this purpose.

Another approach to 3D metadata design is used by the AIM@SHAPE project [2], which aims at fostering the development of new methodologies for modeling and processing knowledge related to digital shapes. This includes geometry (the spatial extent of the object), structure (object features and part-whole decomposition), attributes (colors, textures), semantics (meaning, purpose), and time-dependent features (morphing, animation). The main focus of AIM@SHAPE is an ontology for describing the virtual human body and linking semantics to shapes or shape parts. Metadata solutions proposed by AIM@SHAPE have been implemented as extensions to the COLLADA file format (cf. Chap. 2).

A similar idea has been researched by the SALERO (Semantic AudiovisuaL Entertainment Reusable Objects) project [36], which investigates the production of digital content for cross-platform reusable media and the provision of ‘intelligent content’ for games, web animations, movies and broadcast. However, the final results of the project [79] are focused more on the automation of digital content production than on universal metadata of 3D objects.

Because other approaches and projects focus on specific application areas of 3D objects, the best available solution for generic metadata of 3D multimedia objects appears to be MPEG-7 complemented with the 3DSEAM extensions. Nevertheless, MPEG-7 and its extensions have been designed to address the general, semantic and structural properties of a non-interactive object, and are not capable of describing interactive 3D objects.

To enable implementation of search engines capable of dealing with interactive 3D objects, in addition to the metadata description schemes, a query language is needed. Currently, there are no metadata query languages designed with interaction metadata in mind, but there are more general query languages. Typically, object interaction properties are a subset of all object properties and interaction metadata are a part of a larger description. Therefore, an interaction metadata query language is needed that would merge with existing metadata query solutions.

The problem of describing interactive properties of 3D multimedia objects is further discussed and a proposed solution is described in Chap. 8.

The second important issue, related to the implementation of search mechanisms in the context of 3D applications, is search based not only on the semantics and properties of objects but also on their location in space and time. This issue becomes more challenging if we consider applications in which objects or regions related to search terms change their positions or appear in different timeslots in different places. Additional complexity arises from the semantic ambiguity and the space- and time-dependency of the meaning of the search terms.

Semantic relationships, spatial data processing, and temporal data processing are, generally, distinct fields of research, each of which has been widely studied. Research in the area of semantics and ontology of resources on the Web has led to the formulation of the notion of the Semantic Web [12]. The Semantic Web can be described as an extension of the WWW in which information resources have a well-defined meaning, and therefore can be unambiguously identified and processed by software. Semantically described resources enable building different types of semantic search engines and analytic tools. In this domain, research is performed on topics such as semantic search, semantic query languages, correlation and similarity finding, classification, and clustering.

Research on the analysis and processing of spatial data has been conducted for more than fifty years in the area of Geographic Information Systems (GIS) [23]. Processing of spatial queries is a well-studied subject. It includes spatial selection queries, which return sets of spatial objects that satisfy spatial predicates, and spatial join queries, which join sets of spatial objects based on a spatial predicate [6].
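
The difference between the two query types can be illustrated with a short sketch. It assumes 2D axis-aligned bounding boxes as the spatial objects and box intersection as the spatial predicate; the names `Box`, `intersects`, `spatial_select` and `spatial_join`, as well as the sample data, are hypothetical and not taken from any particular GIS.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned 2D bounding box standing in for a spatial object."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Box, b: Box) -> bool:
    """Spatial predicate: true when the two boxes overlap or touch."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


Predicate = Callable[[Box, Box], bool]


def spatial_select(objects: List[Box], window: Box,
                   predicate: Predicate = intersects) -> List[Box]:
    """Selection: keep the objects that satisfy the predicate against one window."""
    return [obj for obj in objects if predicate(obj, window)]


def spatial_join(left: List[Box], right: List[Box],
                 predicate: Predicate = intersects) -> List[Tuple[Box, Box]]:
    """Join: pair objects from two sets whenever the predicate holds (naive nested loop)."""
    return [(l, r) for l in left for r in right if predicate(l, r)]


# Made-up sample data for illustration only.
buildings = [Box("museum", 0, 0, 2, 2), Box("library", 5, 5, 6, 6)]
districts = [Box("old_town", 1, 1, 4, 4), Box("harbour", 5, 0, 8, 3)]
window = Box("window", 0, 0, 3, 3)

selected = [b.name for b in spatial_select(buildings, window)]
joined = [(b.name, d.name) for b, d in spatial_join(buildings, districts)]
print(selected)
print(joined)
```

The nested loop compares every pair of objects and is meant only to show what the join computes; systems handling large data sets typically rely on a spatial index such as an R-tree to avoid this.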

Methods of storing and querying time-dependent data have been extensively studied in the field of temporal databases. Temporal databases were initially studied as an extension to relational databases; however, as new data models started to emerge, their temporal extensions, such as temporal semantic networks or temporal query languages, have also been proposed.

However, practical search systems require a solution that takes all three aspects into consideration together, i.e., semantics, space, and time. An example of an application domain in which this type of query is prevalent is cultural heritage. Concepts related to cultural objects depend on geographical location and time, can evolve over time, and may have different meanings in different locations and at different periods. Consequently, the search methods developed so far, both keyword-based and semantically oriented, fail because of the imprecision of data: on the one hand, the data contained in the searched databases, and on the other hand, the data specified in user queries.
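
To make the combination of the three aspects concrete, the sketch below filters records by a concept, a point in space and a year at the same time. It is purely illustrative: the `Record` fields, the `search` function and the three records are invented for this example, and the placeholder meanings carry no historical claim.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Record:
    """One meaning of a concept, valid in a region and during a period (hypothetical)."""
    concept: str
    meaning: str
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float
    year_from: int
    year_to: int


def search(records: List[Record], concept: str,
           lon: float, lat: float, year: int) -> List[Record]:
    """Return records matching the concept whose region and period contain the query."""
    return [
        r for r in records
        if r.concept == concept
        and r.lon_min <= lon <= r.lon_max
        and r.lat_min <= lat <= r.lat_max
        and r.year_from <= year <= r.year_to
    ]


# Invented data: the same term carries different meanings by place and period.
records = [
    Record("term", "meaning A", 0, 10, 0, 10, 1000, 1500),
    Record("term", "meaning B", 0, 10, 0, 10, 1501, 1900),
    Record("term", "meaning C", 20, 30, 0, 10, 1000, 1900),
]

early = [r.meaning for r in search(records, "term", 5, 5, 1400)]
late = [r.meaning for r in search(records, "term", 5, 5, 1600)]
elsewhere = [r.meaning for r in search(records, "term", 25, 5, 1400)]
print(early, late, elsewhere)
```

The exact containment tests used here are the kind of matching that breaks down when regions, dates or terms are imprecise, so the sketch shows only the shape of a combined query and not a solution to the imprecision problem.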
