search rankings. This means that if you change the boost values, the documents must
be re-indexed. Although this example is somewhat simplified, it shows that accurate
markup of topic elements is critical to the search ranking process.
Once you've determined the elements and boost values, you'll create a configura-
tion file that identifies the fields you're interested in indexing. From there you can
run a process that takes each document and creates a reverse (inverted) full-text index using the
element and boost values from your configuration file. Apache Lucene is an example
of a framework that creates and maintains these types of indexes. All the keywords
found in that element can then be associated with that element using a node identi-
fier for that element. By storing the element node as well as the document, you know
exactly in what element of the document the keyword was found.
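The indexing step above can be sketched in a few lines of Python. This is a minimal illustration, not how Lucene or any particular database does it. The `BOOSTS` table stands in for the configuration file, the element names `title` and `para` are assumed for the example, and the node identifier is simply the element's position in document order.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical configuration: element name -> boost value.
BOOSTS = {"title": 3.0, "para": 1.0}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def index_document(doc_id, xml_text, index):
    """Add the configured elements of one document to the inverted index."""
    root = ET.fromstring(xml_text)
    # Number every element in document order to serve as a node identifier.
    for node_id, elem in enumerate(root.iter()):
        boost = BOOSTS.get(elem.tag)
        if boost is None:
            continue  # element is not listed in the configuration
        text = "".join(elem.itertext())
        for word in set(tokenize(text)):
            # Each posting records the document, the node, and the boost.
            index[word].append((doc_id, node_id, elem.tag, boost))

index = defaultdict(list)
index_document(
    "doc1",
    "<section><title>Full-text search</title>"
    "<para>Search uses an index.</para></section>",
    index,
)
print(index["search"])
```

Because the boost is stored in each posting at indexing time, changing a boost value in this sketch also means rebuilding the index.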
After indexing, you're now ready to create search functions that can work with
both range and full-text indexes. The most common way to integrate text searches is
by using an XQuery full-text library that returns the ranked results of a keyword query.
The query is similar to a WHERE clause in SQL, but it also returns a score used to order
all search results. Your XQuery can return any type of node within DocBook, such as a
book, article, chapter, section, figure, or bibliographic entry.
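To make the idea of a scored query concrete, here is a small Python sketch rather than XQuery. It assumes a hypothetical `INDEX` of postings (keyword to document, node identifier, and boost) and scores each node by summing the boosts of the matching keywords. Real full-text libraries use more elaborate scoring, so treat this only as an illustration of filtering and ordering by score.

```python
from collections import defaultdict

# Hypothetical postings: keyword -> list of (doc_id, node_id, boost).
INDEX = {
    "search": [("doc1", 1, 3.0), ("doc1", 2, 1.0), ("doc2", 4, 1.0)],
    "index": [("doc1", 2, 1.0), ("doc3", 1, 3.0)],
}

def ranked_search(query, index):
    """Return ((doc_id, node_id), score) pairs, highest score first."""
    scores = defaultdict(float)
    for word in query.lower().split():
        for doc_id, node_id, boost in index.get(word, []):
            scores[(doc_id, node_id)] += boost
    # Sort by descending score, then by node key so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

results = ranked_search("search index", INDEX)
for node, score in results:
    print(node, score)
```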
The final step is to return a fragment of HTML for each hit in the search. At the
top of the page, you'll see the hits with the highest score. Most search tools return a
block of text that shows the keyword highlighted within the text. This is known as a
keyword-in-context (KWIC) function.
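A KWIC function can be quite small. The following Python sketch is one possible approach, not the function supplied by any specific search library. The name `kwic`, the `width` parameter, and the choice of a `<b>` tag for highlighting are all assumptions made for the example.

```python
import re
from html import escape

def kwic(text, keyword, width=30):
    """Return an HTML fragment showing the first hit of keyword in context."""
    match = re.search(re.escape(keyword), text, flags=re.IGNORECASE)
    if match is None:
        return None  # keyword does not occur in the text
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    # Escape each piece so the source text cannot inject markup.
    before = escape(text[start:match.start()])
    hit = escape(match.group(0))
    after = escape(text[match.end():end])
    return f"<p>...{before}<b>{hit}</b>{after}...</p>"

fragment = kwic(
    "Accurate markup of topic elements is critical to search ranking.",
    "critical",
    width=10,
)
print(fragment)
```

This sketch highlights only the first occurrence. A fuller version might return one fragment per hit or merge hits that fall close together.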
7.9 Case study: searching domain-specific languages - findability and reuse
Although we frequently think of search quality as a characteristic associated with a
large number of text documents, there are also benefits to finding items such as soft-
ware subroutines or specific types of programs created with domain-specific languages
(DSLs). This case study shows how a search tool saved an organization time and money
by allowing employees to find and reuse financial chart objects.
A large financial institution had thousands of charts used to create graphical finan-
cial dashboards. Most charts were generated by an XML specification file that
described the features of each chart such as the chart type (line chart, bar chart,
scatter-plot), title, axis, scaling, and labels. One of the challenges that the dashboard
authors faced was how to lower the cost of creating a new chart by using an existing
chart as a starting template.
All charts were stored on a standard filesystem. Each organization that requested
charts had a folder that contained their charts. Because the folders were organized by
requesting organization rather than by chart features, there was no way to find charts
by their characteristics. Experienced chart authors knew
where to look in the filesystem for an example of a template, but new chart authors
often spent hours digging through old charts to find an old template that matched up
with the new requirement.
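The underlying problem is that the folder structure says nothing about what a chart contains, even though the XML specification does. The Python sketch below illustrates the gap under stated assumptions. The text does not show the real specification format, so the `chart` element, its `type` attribute, the `title` child, and the file paths are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical chart specifications keyed by path; the format is assumed.
SPECS = {
    "sales/q1.xml": "<chart type='bar'><title>Q1 Sales</title></chart>",
    "risk/var.xml": "<chart type='line'><title>Value at Risk</title></chart>",
    "sales/trend.xml": "<chart type='line'><title>Sales Trend</title></chart>",
}

def chart_features(xml_text):
    """Extract the searchable characteristics from one chart specification."""
    root = ET.fromstring(xml_text)
    return {
        "type": root.get("type"),
        "title": root.findtext("title", default=""),
    }

def find_charts(specs, chart_type):
    """Return paths of charts of the given type, regardless of folder."""
    return sorted(
        path
        for path, xml_text in specs.items()
        if chart_features(xml_text)["type"] == chart_type
    )

line_charts = find_charts(SPECS, "line")
print(line_charts)
```

Here a query by chart type returns matches from both the `sales` and `risk` folders, which browsing the filesystem alone cannot do.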