type= "java.io.Set" >
<value> the </value>
<value> a </value>
<value> an </value>
</param>
</analyzer>
<text qname= "p" />
<text qname= "h1" analyzer= "a2" />
</lucene>
Now the h1 element is indexed with only the stopwords the, a, and an.
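For orientation, here is a sketch of how the complete collection.xconf could look. Only the tail of the configuration appears above, so the surrounding collection and index elements, the default analyzer declaration, and the param's name attribute are assumptions based on eXist's usual Lucene index configuration:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <lucene>
            <!-- Default analyzer, used when no analyzer attribute is given -->
            <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <!-- Analyzer a2: a standard analyzer with a custom stopword set -->
            <analyzer id="a2" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
                <param name="stopwords" type="java.util.Set">
                    <value>the</value>
                    <value>a</value>
                    <value>an</value>
                </param>
            </analyzer>
            <!-- p uses the default analyzer, h1 uses a2 -->
            <text qname="p"/>
            <text qname="h1" analyzer="a2"/>
        </lucene>
    </index>
</collection>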
Manual Full-Text Indexing
There is yet another way to use the Lucene full-text indexer inside eXist. You can
manually (through your own XQuery code) create an index associated with a
resource in the database. You can then use this index to query the contents of this
resource. Interestingly enough, the resource does not have to be an XML document,
so, in conjunction with the contentextraction extension module (see "contentextraction"),
you can create indexes to search binary content!
Here is how it works:
1. For some resource in your database (XML or otherwise), extract (or create) the
   text fragments you want to index. For instance, assume we have an XHTML
   document for which we want to index all the p and h3 elements. We also want to
   be able to search the p and h3 elements separately.
2. Create an XML fragment with root element doc in which you list all these text
   fragments and add them to so-called fields. A field can be seen as a subindex on a
   document, so in our case we create two fields: one for the h3 elements, called
   headers, and one for the p elements, called paras. Here is the code that does this:
declare namespace xhtml = "http://www.w3.org/1999/xhtml";

let $resource := '/db/path/to/your/xhtml/document'
let $index-def :=
    <doc>
    {
        for $header in doc($resource)//xhtml:h3
        return
            <field name="headers" store="yes">{string($header)}</field>
    }
    {
        for $para in doc($resource)//xhtml:p
        return
            <field name="paras" store="yes">{string($para)}</field>
    }
    </doc>
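The code above only constructs the $index-def fragment; the step that actually creates and queries the index falls outside the text shown here. As a hedged sketch of how it could continue, eXist's Lucene extension module (prefix ft, namespace http://exist-db.org/xquery/lucene) provides ft:index and ft:search; the exact call shapes and the sample query term chapter are assumptions, not taken verbatim from this section:

(: ...continuing the FLWOR expression above :)
(: Attach the manual index, built from $index-def, to the resource :)
let $create := ft:index($resource, $index-def)
return
    (: Query only the headers field; 'chapter' is just an example term :)
    ft:search($resource, 'headers:chapter')

Storing the fields (store="yes") keeps the original text in the index, which is what later allows that text to be retrieved again without reparsing the resource.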