XML Databases - Nontraditional Database Systems

Table 1: A comparison of methods of storing XML documents in relational databases

• path-based indexing.

These two approaches are not mutually exclusive. In fact, there are hybrid indices
that have the characteristics of both.

In position-based indices, the position of each occurrence of a text object (such
as a tag, an attribute, or a character) is represented by its offset from the first
character of the document. An occurrence of an element is represented by the pair of
its start and end positions 4, 3) , which is called a region. Another method of
representing regions, called Relative Region Coordinate, has been proposed to handle
updates efficiently 10) . In processing queries on XML documents, it is often necessary
to retrieve the occurrence positions of a given element containing a given word. Such
operations are easily processed by using inverted indices for elements and inverted
indices for words.
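The containment query described above can be sketched as follows. This is a minimal illustration, not the method of the cited work: the names `element_index`, `word_index`, and `elements_containing` are hypothetical, the offsets are made up, and a word is treated as contained in a region when its start offset falls inside the region.

```python
from bisect import bisect_left

# Hypothetical toy indices. Offsets are character positions counted from
# the first character of the document.
# element_index: tag name -> list of regions (start, end)
# word_index:    word     -> sorted list of start offsets
element_index = {"book": [(0, 100)], "title": [(6, 30), (50, 80)]}
word_index = {"XML": [13, 90], "index": [60]}


def elements_containing(element_index, word_index, tag, word):
    """Return the regions of `tag` that contain an occurrence of `word`."""
    positions = word_index.get(word, [])
    hits = []
    for start, end in element_index.get(tag, []):
        # Find the first occurrence of the word at or after the region start.
        i = bisect_left(positions, start)
        # Simplification: only the word's start offset is checked.
        if i < len(positions) and positions[i] <= end:
            hits.append((start, end))
    return hits


title_hits = elements_containing(element_index, word_index, "title", "XML")
```

Here only the first `title` region qualifies, because the second occurrence of "XML" lies outside both `title` regions. A real index would also have to deal with words that straddle a region boundary.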

In path-based indices, the occurrence positions of elements and attributes in a
document are represented by path expressions. Usually, the paths from the root to
the element (or attribute) nodes of interest are used. In many queries, however,
conditions on paths are specified ambiguously. We have proposed a new index
suited to the efficient processing of ambiguous path expressions 29) . In our
index, for each element node n which has mixed content, a string concatenating

1. start tags from the root to n,
2. the content of n, and
3. end tags from n to the root

is created. We call such a concatenated string an ENRP (Element with Normal and
Reverse Path). For each attribute node, a character string called an ANRP (Attribute
with Normal and Reverse Path) is created analogously. The new index is basically a
suffix array 8) of ENRPs and ANRPs. Let us consider the document tree in Figure 2.
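Before working through that example, the construction can also be sketched in code. This is a rough illustration under several assumptions, not the authors' implementation: the exact serialization of an ENRP is not given here, so tags are written as plain `<tag>` and `</tag>`; the "content" of a node is taken to be its leading text; ANRPs are omitted; and the suffix array is built naively by sorting. The function names `build_enrps`, `build_suffix_array`, and `search` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_enrps(xml_text):
    """Build one ENRP-like string per element that has leading text."""
    enrps = []

    def walk(node, path):
        path = path + [node.tag]
        text = (node.text or "").strip()
        if text:
            start_tags = "".join(f"<{t}>" for t in path)             # root to n
            end_tags = "".join(f"</{t}>" for t in reversed(path))    # n to root
            enrps.append(start_tags + text + end_tags)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), [])
    return enrps


def build_suffix_array(strings):
    """Naive suffix array: (string id, offset) pairs sorted by suffix."""
    suffixes = [(sid, off) for sid, s in enumerate(strings) for off in range(len(s))]
    suffixes.sort(key=lambda p: strings[p[0]][p[1]:])
    return suffixes


def search(strings, suffix_array, pattern):
    """Return ids of the strings that contain `pattern` as a substring."""
    m = len(pattern)

    def prefix(i):
        sid, off = suffix_array[i]
        return strings[sid][off:off + m]

    # Binary search for the first suffix whose prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    # All matching suffixes are adjacent in the sorted order.
    ids = set()
    while lo < len(suffix_array) and prefix(lo) == pattern:
        ids.add(suffix_array[lo][0])
        lo += 1
    return sorted(ids)


doc = "<book><title>XML basics</title><author><name>Ito</name></author></book>"
enrps = build_enrps(doc)
sa = build_suffix_array(enrps)
```

Because each string carries both the forward and the reverse path, a condition that fixes only the tail of a path plus a content prefix (such as `<name>Ito`), or a content suffix plus part of the reverse path (such as `Ito</name></author>`), becomes a plain substring lookup. How ambiguous path expressions are actually translated into patterns is not specified in this section, so that mapping is an assumption of the sketch.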
