Table 1: A comparison of methods of storing XML documents in relational databases
path-based indexing.
These two approaches are not mutually exclusive. In fact, there are hybrid indices
that combine the characteristics of both.
In position-based indices, the positions of occurrences of text objects (such
as tags, attributes, and characters) are represented by their offsets from the first
character of the document. An occurrence of an element is represented by the pair of
its start and end positions 4, 3) , which is called a region. Another method of
representing regions, called Relative Region Coordinate, has been proposed to handle
updates efficiently 10) . In processing queries on XML documents, it is often necessary
to retrieve the occurrence positions of a given element containing a given word. Such
operations are easily processed by using inverted indices for elements and inverted
indices for words.
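The region idea above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the indexing scheme of the cited works: the toy document, the function names `build_indices` and `elements_containing`, and the restriction to attribute-free, well-formed tags are all hypothetical choices made for the example.

```python
import re
from bisect import bisect_left
from collections import defaultdict


def build_indices(doc):
    """Build two toy inverted indices over a well-formed, attribute-free document.

    element_index maps a tag name to a sorted list of (start, end) regions,
    word_index maps a word to a sorted list of start offsets.
    All offsets are counted from the first character of the document.
    """
    element_index = defaultdict(list)
    word_index = defaultdict(list)
    stack = []  # open elements as (tag name, offset of the start tag)
    pos = 0     # offset just past the previous tag
    for m in re.finditer(r"<(/?)(\w+)>", doc):
        # Index the words of the text lying between two consecutive tags.
        for w in re.finditer(r"\w+", doc[pos:m.start()]):
            word_index[w.group()].append(pos + w.start())
        if m.group(1):  # end tag: close the region of the innermost open element
            name, start = stack.pop()
            element_index[name].append((start, m.end() - 1))
        else:           # start tag: remember where the region begins
            stack.append((m.group(2), m.start()))
        pos = m.end()
    for regions in element_index.values():
        regions.sort()
    return element_index, word_index


def elements_containing(element_index, word_index, tag, word):
    """Return the regions of `tag` that contain at least one occurrence of `word`."""
    offsets = word_index.get(word, [])
    hits = []
    for start, end in element_index.get(tag, []):
        # First occurrence of the word at or after the region start.
        i = bisect_left(offsets, start)
        if i < len(offsets) and offsets[i] + len(word) - 1 <= end:
            hits.append((start, end))
    return hits


doc = "<a><b>xml index</b><b>query</b></a>"
element_index, word_index = build_indices(doc)
print(elements_containing(element_index, word_index, "b", "query"))
```

The containment test is a pure comparison of offsets, which is why a pair of inverted indices is enough to answer "element containing word" queries without touching the document text again.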
In path-based indices, occurrence positions of elements and attributes in a
document are represented by path expressions. Usually, paths from the root to
element (or attribute) nodes of interest are used. In many queries, however,
conditions on paths are specified ambiguously. We have proposed a new index
which is suitable for efficient processing of ambiguous path expressions 29) . In our
index, for each element node n which has mixed content, a string is created by
concatenating
1. the start tags from the root to n,
2. the content of n, and
3. the end tags from n to the root.
We call such a concatenated string an ENRP (Element with Normal and
Reverse Path). For each attribute node, a character string called an ANRP (Attribute
with Normal and Reverse Path) is created analogously. The new index is basically a
suffix array 8) of ENRPs and ANRPs. Let us consider the document tree in Figure 2.
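Independently of the Figure 2 example, the construction of ENRP strings and a suffix-array lookup over them might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it takes the direct text of an element as "the content of n", skips attributes (ANRPs) entirely, builds the suffix array naively by sorting, and uses the hypothetical names `enrp_strings`, `build_suffix_array`, and `find`.

```python
import xml.etree.ElementTree as ET


def enrp_strings(xml_text):
    """Create one ENRP-like string per element that has non-empty direct text.

    Assumption: the direct text of the element stands in for "the content of n".
    """
    out = []

    def walk(node, path):
        path = path + [node.tag]
        text = (node.text or "").strip()
        if text:
            start_tags = "".join(f"<{t}>" for t in path)            # root to n
            end_tags = "".join(f"</{t}>" for t in reversed(path))   # n to root
            out.append(start_tags + text + end_tags)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), [])
    return out


def build_suffix_array(strings):
    """Naive suffix array: (string id, offset) pairs sorted by the suffix they denote."""
    entries = [(sid, i) for sid, s in enumerate(strings) for i in range(len(s))]
    entries.sort(key=lambda e: strings[e[0]][e[1]:])
    return entries


def find(strings, sa, pattern):
    """Return the ids of the strings in which `pattern` occurs as a substring."""
    lo, hi = 0, len(sa)
    while lo < hi:  # binary search for the first suffix >= pattern
        mid = (lo + hi) // 2
        sid, off = sa[mid]
        if strings[sid][off:] < pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = set()
    # Suffixes that start with the pattern are contiguous in sorted order.
    while lo < len(sa):
        sid, off = sa[lo]
        if not strings[sid].startswith(pattern, off):
            break
        hits.add(sid)
        lo += 1
    return sorted(hits)


xml_text = "<book><title>XML index</title><author>Sato</author></book>"
enrps = enrp_strings(xml_text)
sa = build_suffix_array(enrps)
print(enrps)
print(find(enrps, sa, "<title>XML"))
```

Because each string carries both the forward path and the reverse path, a pattern can start anywhere along the path and still match as the prefix of some suffix, for example a trailing start tag followed by the beginning of the content, or the end of the content followed by closing tags.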