Text Indexing and Lookup - eXist: A NoSQL Database and Application Development Platform

... text before the match and after the match...

Setting table to "yes" causes the output to be returned in an HTML table row format:

<tr>
    <td class="previous">... text before the </td>
    <td class="hi">match</td>
    <td class="following"> and after the match...</td>
</tr>

• If you specify link, the match will be enclosed in an a element with the value of this attribute as its target. For example, specifying link="otherpage" will change the output for the match to:

<a href="otherpage">match</a>
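
As a rough sketch, the two options above could be combined on one configuration element. This assumes the options are attributes of the config element passed to eXist's KWIC functions (that element is described before this excerpt). The width value of 40 is an illustrative assumption, not a recommendation:

```xml
<!-- Hypothetical KWIC configuration: table-row output with the match rendered as a link. -->
<!-- width="40" is an assumed value for the amount of context shown around the match. -->
<config width="40" table="yes" link="otherpage"/>
```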

Defining and Configuring the Lucene Analyzer

Lucene allows its users to specify how text is analyzed. Analyzers are Java classes, with each one defining a different way of tokenizing and/or filtering text. There are several prebaked analyzers available. If you're indexing a language other than English, it might be worthwhile to change the analyzer to one especially tailored for your language. Another reason might be to change the list of stopwords (words ignored by the analyzer).

A list of available analyzers can be found in the Lucene JavaDocs for the Analyzer class: the list of direct subclasses there tells you which analyzers are available.

By default, eXist uses the standard analyzer org.apache.lucene.analysis.standard.StandardAnalyzer. Although called “standard,” it is actually an English analyzer (and contains a list of the most-often-used English stopwords).

You can define and configure a different Lucene analyzer in the Lucene definition of the collection.xconf document, as explained fully in “Defining and Configuring the Lucene Analyzer” on page 298. The analyzer element defines the Lucene analyzer to use:

<analyzer class = string
          id? = NCName >
    param*
</analyzer>
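
As a minimal sketch of how this might look in practice, the following collection.xconf fragment declares a default analyzer (no id) and a second analyzer with an id, then refers to that id from a text index definition. The element names para and title, the id value de, and the choice of GermanAnalyzer are assumptions made for illustration. The class you name must be available in the Lucene version shipped with your eXist installation.

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <lucene>
            <!-- Analyzer without an id: used by default for text indexes below -->
            <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <!-- Additional analyzer, referenced by its id (hypothetical choice) -->
            <analyzer id="de" class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
            <!-- Hypothetical element names: para uses the default analyzer -->
            <text qname="para"/>
            <!-- title is indexed with the analyzer whose id is "de" -->
            <text qname="title" analyzer="de"/>
        </lucene>
    </index>
</collection>
```

Changing the analyzer affects only how text is indexed from then on, so a collection that already holds data would normally need to be reindexed after the configuration changes.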
