Indexing Structure for Graph-Structured Data - Mining Complex Data

Information Technology Reference

In-Depth Information

retrieved from the index the actual existence of their instances in the knowledge

base is checked.

The second indexing structure [15] that has been introduced for RDF graphs

and can be used to process the queries concerning the complex relationships

among vertices in a graph is based on path expressions and su x arrays [14].

The base idea lies in extracting all possible path expressions from the indexed

graph and consequently create all su x arrays on string representations of the

path expressions. The main drawback of this approach lays in its limitation of

application to DAGs. Therefore, in this chapter we introduce our own indexing

structure for ecient query processing of path oriented queries.

Nonetheless, the search for complex relationships can be reduced to a reach-

ability query answering. Simply, instead of returning all paths lying between

two inspected vertices a single boolean value is returned answering a question

whether the start vertex can reach the end one. There are numerous algorithmic

approaches to solve this problem varying mostly by the structures they use to

compute the transitive closure of the relation. They are either a matrix based

like [2] that are based on Warshall's algorithm [22] or the graph based algo-

rithms [16, 17] or combining both approaches which results in a algorithm [3].

The indexing structures for ecient reachability query processing are labeling

schemes that stem from the XML and tree structure labeling schemes. The most

popular labeling schemes are based on either interval labeling scheme [1] or on

a structural approach like [10] or again combining both in a hierarchical label-

ing presented in [23]. Yet, these approaches can be used only to distinguish the

existence of a complex relationship between two vertices, further inspection of

the complex relationship itself is not possible.

10.3

Design of the Index

The graph theory proved that a very handy representation of a directed graph

is its adjacency matrix because using matrix algebra we can comfortably study

the graph's properties. For instance, if the adjacency matrix is powered by two,

each field in the resulting matrix contains a number of paths of length two lying

between each two vertices in the original graph. If the computation continued,

the result would contain amounts of all paths of an arbitrary length. Moreover,

with a slight modification of the matrix that is introduced later in this section

we would get not just the amounts of paths but the paths themselves.

Main diculty of matrix representation of a graph is that its use is limited

to fairly small graphs since the matrix grows in the quadratic space and the

multiplication operation on matrices has even cubic time complexity. Therefore,

we introduce graph transformation to enable the use of the matrix approach to

graphs of arbitrary size.

10.3.1 Graph Segmentation

The graph transformation designed to simplify the graph we used is called graph

segmentation . It takes the indexed graph and divides it into segments in a way

Mining Complex Data

Search WWH ::

Custom Search

Home