Finding information with NoSQL search - Making Sense of NoSQL


search rankings. This means that if you change the boost values, the documents must be re-indexed. Although this example is somewhat simplified, it shows that accurate markup of topic elements is critical to the search ranking process.

Once you've determined the elements and boost values, you'll create a configuration file that identifies the fields you're interested in indexing. From there you can run a process that takes each document and creates a reverse full-text index using the element and boost values from your configuration file. Apache Lucene is an example of a framework that creates and maintains these types of indexes. All the keywords found in an indexed element can then be associated with that element using its node identifier. By storing the element node as well as the document, you know exactly in which element of the document the keyword was found.
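
The indexing step described above can be sketched in a few lines of Python. This is a minimal illustration, not how Lucene or any XML database implements it: the `BOOSTS` mapping stands in for the configuration file, the element names and boost values are hypothetical, and the node identifier is assumed to be the element's position in document order.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical configuration: element name -> boost value.
BOOSTS = {"title": 3.0, "para": 1.0}

def tokenize(text):
    """Split text into lowercase alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs, boosts):
    """Map each keyword to a list of (doc_id, node_id, boost) postings."""
    index = defaultdict(list)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        # Assumption: the node identifier is the element's position
        # in document order.
        for node_id, elem in enumerate(root.iter()):
            boost = boosts.get(elem.tag)
            if boost is None:
                continue  # element is not listed in the configuration
            words = set(tokenize("".join(elem.itertext())))
            for word in sorted(words):
                index[word].append((doc_id, node_id, boost))
    return index

docs = {
    "doc1": "<section><title>Search ranking</title>"
            "<para>Boost values affect ranking.</para></section>",
    "doc2": "<section><title>Indexes</title>"
            "<para>A search uses an index.</para></section>",
}
index = build_index(docs, BOOSTS)
print(index["ranking"])
```

Because the boost is stored in each posting at indexing time, changing a boost value in the configuration means rebuilding the index, which matches the re-indexing requirement noted earlier.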

After indexing, you're ready to create search functions that can work with both range and full-text indexes. The most common way to integrate text searches is by using an XQuery full-text library that returns the ranked results of a keyword query. The query is similar to a WHERE clause in SQL, but it also returns a score used to order all search results. Your XQuery can return any type of node within DocBook, such as a book, article, chapter, section, figure, or bibliographic entry.
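
To make the idea of a scored query concrete, here is a small Python stand-in for what a full-text library does; it is not XQuery and does not reproduce any real library's scoring. The `INDEX` postings are hypothetical sample data, and the score is assumed to be the simple sum of the boost values of the matching keywords.

```python
from collections import defaultdict

# Hypothetical postings: keyword -> [(doc_id, node_id, boost), ...]
INDEX = {
    "search": [("doc1", 1, 3.0), ("doc2", 2, 1.0)],
    "ranking": [("doc1", 1, 3.0), ("doc1", 2, 1.0)],
}

def ranked_search(index, query):
    """Return ((doc_id, node_id), score) pairs, highest score first."""
    scores = defaultdict(float)
    for word in query.lower().split():
        for doc_id, node_id, boost in index.get(word, []):
            # Assumption: a node's score is the sum of the boosts
            # of every query keyword found in it.
            scores[(doc_id, node_id)] += boost
    # Sort by descending score; break ties by identifier so the
    # order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

results = ranked_search(INDEX, "search ranking")
for (doc_id, node_id), score in results:
    print(doc_id, node_id, score)
```

Like a WHERE clause, the function selects only the nodes that match; unlike one, every match carries a score that determines its position in the result list.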

The final step is to return a fragment of HTML for each hit in the search. At the top of the page, you'll see the hits with the highest score. Most search tools return a block of text that shows the keyword highlighted within the text. This is known as a keyword-in-context (KWIC) function.
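
A KWIC function can be sketched as follows. This is an illustrative Python version under simple assumptions: it highlights only the first match, uses a fixed character window (the hypothetical `width` parameter) rather than word or sentence boundaries, and marks the hit with a `<b>` element.

```python
import html
import re

def kwic(text, keyword, width=30):
    """Return an HTML fragment showing the first hit with surrounding text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    match = re.search(pattern, text, re.IGNORECASE)
    if match is None:
        return None
    # Take up to `width` characters of context on each side of the hit.
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    # Escape each piece so the source text cannot inject markup.
    before = html.escape(text[start:match.start()])
    hit = html.escape(match.group(0))
    after = html.escape(text[match.end():end])
    # Mark truncated context with an ellipsis.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return f"{prefix}{before}<b>{hit}</b>{after}{suffix}"

print(kwic("Boost values affect search ranking.", "search", width=10))
```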

7.9 Case study: searching domain-specific languages - findability and reuse

Although we frequently think of search quality as a characteristic associated with a large number of text documents, there are also benefits to finding items such as software subroutines or specific types of programs created with domain-specific languages (DSLs). This case study shows how a search tool saved an organization time and money by allowing employees to find and reuse financial chart objects.

A large financial institution had thousands of charts used to create graphical financial dashboards. Most charts were generated by an XML specification file that described the features of each chart, such as the chart type (line chart, bar chart, scatter plot), title, axes, scaling, and labels. One of the challenges that the dashboard authors faced was how to lower the cost of creating a new chart by using an existing chart as a starting template.
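
To picture what such a specification might contain, the sketch below parses an invented chart file and pulls out the features an author would want to search on. The XML vocabulary (`chart`, `title`, `axis`, and their attributes) is entirely hypothetical; the institution's actual format is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical chart specification; the real format is not known.
CHART_SPEC = """
<chart type="line">
  <title>Quarterly revenue</title>
  <axis name="x" label="Quarter"/>
  <axis name="y" label="Revenue (USD)" scale="linear"/>
</chart>
"""

def chart_features(xml_text):
    """Extract the searchable characteristics of one chart specification."""
    root = ET.fromstring(xml_text.strip())
    return {
        "type": root.get("type"),
        "title": root.findtext("title"),
        "axis_labels": [axis.get("label") for axis in root.findall("axis")],
    }

features = chart_features(CHART_SPEC)
print(features)
```

Once characteristics like these are extracted, a chart can be looked up by what it is rather than by where its file happens to be stored.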

All charts were stored on a standard filesystem. Each organization that requested charts had a folder that contained its charts. Because of this folder structure, there was no way to find charts by their characteristics. Experienced chart authors knew where to look in the filesystem for an example of a template, but new chart authors often spent hours digging through old charts to find a template that matched the new requirement.
