elements, so every piece of text is indexed multiple times. Depending on how deeply
the text is nested in the document, this may be slow and create a huge number of
index files.
So, the best strategy for full-text indexes is to define them as narrowly as you can.
And be careful using wildcards, because they can quickly get out of hand!
Handling Mixed Content
You can decide how to handle mixed content by using the inline and ignore elements.
These elements can appear globally (as children of the lucene element) or per
index (as children of the text element). The inline element also affects how Lucene
treats whitespace. Both elements have the following format:
<inline qname = string />
<ignore qname = string />
qname holds the qualified name (with an optional namespace prefix) of the element
to treat as inline or to ignore.
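As an illustration of a prefixed qname, here is a minimal sketch. It assumes the documents use the TEI namespace and that hi and note are the elements you want to mark; those names are examples only and are not taken from the text above. The prefix has to be bound by a namespace declaration that is in scope in the configuration document:

```xml
<!-- Sketch only: the tei prefix, and the p, hi, and note element names, are assumed for illustration -->
<lucene xmlns:tei="http://www.tei-c.org/ns/1.0">
    <text qname="tei:p">
        <!-- per-index settings: they apply only to the tei:p index -->
        <inline qname="tei:hi"/>
        <ignore qname="tei:note"/>
    </text>
</lucene>
```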
Inline content and whitespace
By default, Lucene treats inline elements as token separators, which may or may not
be what you want. For instance, assume we have an XML fragment like:
<p>This is <b>un</b>clear.</p>
Because of the b inline element, Lucene will see this as "This is un clear." (notice
the space between un and clear), which is probably not what you intended! To address
this, declare b as an inline element in the index definition:
<lucene>
    <text qname="p">
        <inline qname="b"/>
    </text>
</lucene>
Or, if the b element is always an inline element, whatever element of the collection's
documents it appears in:
<lucene>
    <text qname="p"/>
    <!-- other text indexes -->
    <inline qname="b"/>
</lucene>
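For context, the lucene element is not a standalone document. The following sketch shows where the global variant would typically sit, assuming the usual eXist-db collection.xconf layout (a collection root element in the collection-config namespace with an index child). Treat the surrounding structure as an assumption to check against your own configuration:

```xml
<!-- Sketch only: the wrapper elements assume the standard collection.xconf layout -->
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <lucene>
            <text qname="p"/>
            <!-- global setting: b is treated as inline in every text index -->
            <inline qname="b"/>
        </lucene>
    </index>
</collection>
```

With b declared inline, the text of the earlier example fragment should be tokenized as "This is unclear." rather than being split at the b element. An index configuration change normally takes effect only after the affected collection is reindexed.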