Databases Reference
In-Depth Information
Full-text search: Full-text search is the process of finding documents that
contain natural-language text such as English. It's appropriate when your data
has free-form text like you'd see in an article or a book. Full-text search
techniques include removing unimportant short stop words (and, or, the) and
removing suffixes from words (stemming).
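The two techniques named above can be sketched in a few lines. This is a minimal illustration, not a production analyzer: the stop-word list, the suffix list, and the function names `naive_stem` and `normalize` are assumptions made up for this example, and real search engines use more careful stemmers.

```python
import re

# Hypothetical stop-word and suffix lists, chosen only for illustration.
STOP_WORDS = {"and", "or", "the", "a", "of", "in"}
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, split on non-letters, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]


print(normalize("The searching of indexed documents"))
# ['search', 'index', 'document']
```

Applying the same normalization to both documents and queries is what lets a query for "search" match a document containing "searching".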
Semi-structured search: Semi-structured searches are searches of data that has
both the rigid structure of an RDBMS and full-text sentences like you'd see in a
Microsoft Word document. For example, an invoice for hours worked on a
consulting project might have long sentences describing the tasks that were
performed on the project. A sales order might contain a full-text description of
the products in the order. A business requirements document might have
structured fields for who requested a feature and what release it will be in,
plus a full-text description of what the feature will do.
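As a rough sketch of the business requirements example, a semi-structured query can combine an exact match on a structured field with a word match in the free text. The `Requirement` class, its field names, and the sample records are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Requirement:
    requested_by: str   # structured field
    release: str        # structured field
    description: str    # free-form, full-text field


def search(docs, release, term):
    """Exact match on the structured field, word match in the free text."""
    term = term.lower()
    return [
        d for d in docs
        if d.release == release
        and term in re.findall(r"[a-z]+", d.description.lower())
    ]


docs = [
    Requirement("Ann", "2.0", "Users can export the report as PDF."),
    Requirement("Bob", "2.0", "Admins can reset a password."),
    Requirement("Ann", "3.0", "Export audit logs nightly."),
]

for hit in search(docs, release="2.0", term="export"):
    print(hit.requested_by, "-", hit.description)
```

Only the first record is returned: the third also mentions "export" but belongs to a different release.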
Geographic search: Geographic search is the process of changing search result
ranking based on geographic distance calculations. For example, you might
want to search for all sushi restaurants within a five-minute drive of your current
location. Search frameworks such as Apache Lucene now include tools for
integrating location information into search ranking.
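One way to sketch a geographic distance calculation is the haversine formula, which gives the great-circle distance between two latitude/longitude points. This example assumes a straight-line distance in kilometers as a stand-in for drive time, and it doesn't use Lucene's spatial API. The function names and coordinates are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(places, lat, lon, max_km):
    """Return (distance, name) pairs within max_km, nearest first."""
    hits = [(haversine_km(lat, lon, p_lat, p_lon), name)
            for name, p_lat, p_lon in places]
    return sorted((d, n) for d, n in hits if d <= max_km)


# Hypothetical restaurants as (name, latitude, longitude).
places = [("Near Sushi", 0.0, 0.01), ("Far Sushi", 0.0, 1.0)]
for distance, name in nearby(places, 0.0, 0.0, max_km=5):
    print(f"{name}: {distance:.2f} km")
```

A real system would typically use a spatial index to avoid computing the distance to every document.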
Network search: Network search is the process of changing search result
rankings based on information you find in graphs such as social networks. You
might want your search to include only restaurants that your friends gave a
four- or five-star rating. Integrating network search results can require the use of
social network APIs to include factors such as “average rating by my Facebook
friends.”
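The "average rating by my friends" factor can be sketched with an in-memory friend graph. This assumes the friend list and ratings have already been fetched; no real social network API is called. The data, names, and the `friend_recommended` function are hypothetical.

```python
# Hypothetical friend graph and ratings (restaurant -> {user: stars}).
friends = {"me": {"ana", "raj"}}
ratings = {
    "Sushi Go": {"ana": 5, "raj": 4, "zed": 1},
    "Roll House": {"ana": 2, "zed": 5},
    "Maki Bar": {"zed": 5},
}


def friend_recommended(user, friends, ratings, min_avg=4.0):
    """Keep restaurants whose average rating among friends is >= min_avg."""
    circle = friends.get(user, set())
    results = []
    for restaurant, by_user in ratings.items():
        stars = [s for u, s in by_user.items() if u in circle]
        if stars:  # ignore restaurants no friend has rated
            average = sum(stars) / len(stars)
            if average >= min_avg:
                results.append((restaurant, average))
    # Highest friend average first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


print(friend_recommended("me", friends, ratings))
# [('Sushi Go', 4.5)]
```

Ratings from users outside the friend circle (here, "zed") don't affect the result.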
Faceted search: Faceted search is the process of including other document
properties within your search criteria, such as “all documents written by a specific
author before a specific date.” You can think of facets as subject categories that
narrow your search space, but facets can also be used to change search ranking.
You can set up faceted search on an ordinary collection of Microsoft Word
documents by manually adding multiple subject keywords to each document, but
the cost of adding keywords can be greater than the benefits gained.
Faceted search works best when there's high-quality metadata (information about
the document) associated with each document. For example, most libraries
purchase book metadata from centralized databases to allow you to narrow
searches based on subject, author, publication date, and other standardized
fields. These fields are sometimes referred to as the Dublin Core properties of a
document.
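The "specific author before a specific date" query can be sketched as a filter over document metadata, with a count per facet value to show how a search space might be narrowed. The field names follow Dublin Core element names (`title`, `creator`, `date`, `subject`), but the records and function names are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical metadata records keyed by Dublin Core element names.
docs = [
    {"title": "Indexing Basics", "creator": "Lee",
     "date": date(2010, 5, 1), "subject": "search"},
    {"title": "Graph Stores", "creator": "Lee",
     "date": date(2013, 2, 9), "subject": "graphs"},
    {"title": "Ranking Results", "creator": "Kim",
     "date": date(2011, 8, 20), "subject": "search"},
]


def facet_filter(docs, creator=None, before=None):
    """Keep documents matching every facet that was supplied."""
    return [
        d for d in docs
        if (creator is None or d["creator"] == creator)
        and (before is None or d["date"] < before)
    ]


def facet_counts(docs, field):
    """Count documents per value of one metadata field."""
    return Counter(d[field] for d in docs)


hits = facet_filter(docs, creator="Lee", before=date(2012, 1, 1))
print([d["title"] for d in hits])    # ['Indexing Basics']
print(facet_counts(docs, "subject"))
```

Counts like these are what a search interface typically shows next to each category so users can see how many results a facet would leave.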
Vector search: Vector search is the process of ranking document results based on
how close they are to the search keywords using multidimensional vector distance
models. Each keyword can be thought of as its own dimension in space, and the
distance between a query and each document can be calculated in much the same
way as a geographic distance. This is illustrated in figure 7.1.
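A small sketch of the idea treats each distinct keyword as one dimension and uses the keyword's count in the text as the coordinate. The text above doesn't name a specific distance measure; cosine similarity is assumed here as one common choice, with a higher score meaning a closer match. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Map text to a sparse vector: one dimension per keyword."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse keyword vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return documents ordered from closest to farthest from the query."""
    q = to_vector(query)
    scored = [(cosine_similarity(q, to_vector(d)), d) for d in docs]
    return [d for _, d in sorted(scored, key=lambda p: p[0], reverse=True)]


docs = ["sushi restaurant downtown", "pizza restaurant", "car repair"]
print(rank("sushi restaurant", docs))
# ['sushi restaurant downtown', 'pizza restaurant', 'car repair']
```

Production systems usually weight each dimension (for example, by how rare the keyword is) rather than using raw counts.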