Text Indexing and Lookup - eXist: A NoSQL Database and Application Development Platform


elements, so every piece of text is indexed multiple times. Depending on how deeply the text is nested in the document, this may be slow and create a huge number of index files.

So, the best strategy for full-text indexes is to define them as narrowly as you can. And be careful using wildcards, because they can quickly get out of hand!
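As a rough illustration of "narrow" versus "wildcard" definitions, the following collection.xconf sketch indexes two specific elements and leaves a broad wildcard pattern commented out. The element names title, para, and section are hypothetical and chosen only for this example, and the exact pattern syntax accepted by the match attribute should be checked against the documentation for your eXist version.

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <lucene>
            <!-- Narrow: index only the elements you actually query.
                 "title" and "para" are hypothetical element names. -->
            <text qname="title"/>
            <text qname="para"/>

            <!-- Broad (assumed pattern, shown for contrast only): every
                 element below "section" gets its own index entry, so
                 nested text is indexed once per ancestor level. -->
            <!-- <text match="//section//*"/> -->
        </lucene>
    </index>
</collection>
```

With the narrow form, each piece of text is indexed once per matching element rather than once for every enclosing element.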

Handling Mixed Content

You can decide how to handle mixed content by using the inline and ignore elements. These elements can appear globally (as children of the lucene element) or per index (as children of the text element). The inline element also affects how Lucene treats whitespace. Both elements have the following format:

<inline qname="qname"/>
<ignore qname="qname"/>

qname holds the qualified name (with an optional namespace prefix) of the element to be treated as inline or ignored.
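To make the two placements and the namespace prefix concrete, here is a minimal collection.xconf sketch. It assumes, purely for illustration, a TEI-like vocabulary bound to the prefix tei; the element names tei:p, tei:hi, and tei:note are hypothetical stand-ins for whatever your documents use.

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <!-- The prefix used in the qname attributes must be declared;
         the TEI namespace here is only an assumed example. -->
    <index xmlns:tei="http://www.tei-c.org/ns/1.0">
        <lucene>
            <text qname="tei:p">
                <!-- Per index: applies only to this tei:p index -->
                <ignore qname="tei:note"/>
            </text>

            <!-- Global: applies to every text index in this lucene element -->
            <inline qname="tei:hi"/>
        </lucene>
    </index>
</collection>
```

A per-index setting is the safer choice when an element is inline in some contexts but not in others; a global setting saves repetition when it behaves the same everywhere.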

Inline content and whitespace

By default, Lucene treats inline elements as token separators, which may or may not be what you want. For instance, assume we have an XML fragment like:

<p>This is <b>un</b>clear.</p>

Because of the b inline element, Lucene will see this as "This is un clear." (notice the space between un and clear), which is probably not what you intended! To address this, use an index definition like:

<lucene>
    <text qname="p">
        <inline qname="b"/>
    </text>
</lucene>

Or, if the b element is always an inline element, whatever element it appears in, throughout the collection's documents:

<lucene>
    <text qname="p"/>
    <inline qname="b"/>
</lucene>
