If Data Is the New Oil—You Need Data Exploration and Discovery - Harness the Power of Big Data

indexes, because Data Explorer uses just one efficient structure rather than

two less efficient document-based structures. In a positional space index (see

Figure 7-3), a document is represented as a set of tokens, each of which has a

start and end position. A token can be a single word or a content range (for

example, a title, or an author's name). When a user submits a query, the

search terms match a passage of tokens, instead of a whole document. Data

Explorer doesn't compute a vector representation, but instead keeps all positioning

information directly in its index. This representation enables a complete

rebuilding of the source documents, as well as the manipulation of any

subparts.
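
The idea of tokens with start and end positions can be illustrated with a small Python sketch. It is not Data Explorer's implementation; the names Token, tokenize, build_index, find_passage, and rebuild, the sample document, and the word-level position scheme are all assumptions made for illustration. It shows words and content ranges stored as positioned tokens, a query matched to a passage rather than a whole document, and text rebuilt from the index alone.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Token:
    kind: str   # "word", or "range" for a content range such as a title
    text: str   # the word itself, or the name of the range
    start: int  # position of the first word covered
    end: int    # position of the last word covered (inclusive)


def tokenize(fields):
    """Turn ordered (field_name, text) pairs into word and range tokens."""
    tokens, pos = [], 0
    for name, text in fields:
        words = text.split()
        if not words:
            continue
        # The field as a whole is one token spanning a range of positions.
        tokens.append(Token("range", name, pos, pos + len(words) - 1))
        for word in words:
            tokens.append(Token("word", word, pos, pos))
            pos += 1
    return tokens


def build_index(tokens):
    """Map (kind, text) to the list of (start, end) positions where it occurs."""
    index = {}
    for tok in tokens:
        index.setdefault((tok.kind, tok.text), []).append((tok.start, tok.end))
    return index


def find_passage(index, terms):
    """Return the shortest (start, end) span containing every term, or None."""
    postings = [index.get(("word", t), []) for t in terms]
    if not terms or not all(postings):
        return None
    # Brute force over one occurrence per term; fine for a toy example.
    spans = (
        (min(s for s, _ in combo), max(e for _, e in combo))
        for combo in product(*postings)
    )
    return min(spans, key=lambda span: span[1] - span[0])


def rebuild(index, start=0, end=None):
    """Reconstruct the words in positions start..end from the index alone."""
    words = {}
    for (kind, text), spans in index.items():
        if kind == "word":
            for s, _ in spans:
                words[s] = text
    last = max(words) if end is None else end
    return " ".join(words[i] for i in range(start, last + 1) if i in words)


# Hypothetical sample document with three content ranges.
doc = [
    ("title", "A short note on big data"),
    ("author", "Jane Doe"),
    ("body", "big data needs a compact index"),
]
index = build_index(tokenize(doc))

passage = find_passage(index, ["compact", "data"])   # (9, 12)
print(rebuild(index, *passage))                      # data needs a compact
print(rebuild(index, *index[("range", "author")][0]))  # Jane Doe
```

Because every word keeps its position, the sketch can return a passage or a single content range, or rebuild the whole text, without consulting the original document.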

In Big Data deployments, index size can be a major concern because of the

volume of the data being indexed. Many search platforms, especially those

with vector space indexing schemes, produce indexes that can be 1.5 times

the original data size. Data Explorer's efficient positional index structure

produces a compact index, which is compressed, resulting in index sizes that

are among the smallest in the industry. In addition, unlike vector space

indexes, the positional space indexes don't grow when data changes; they

only increase in size when new data is added.

Another benefit of positional space indexes is field-level updating, in

which modifications to a single field or record in a document cause only the

modified text to be re-indexed. With vector space indexes, the entire document

needs to be re-indexed. Field-level updating removes excessive indexing loads in systems with

frequent updates, and makes small, but often important, changes available to

users and applications in near-real time.
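
Field-level updating can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the product's mechanism: the FieldIndex class and its methods are hypothetical, and offsets are kept per field rather than in one document-wide positional space. The point is only that changing one field rewrites that field's postings and leaves the rest of the document untouched.

```python
class FieldIndex:
    """Toy index with postings keyed per (doc_id, field), so that a single
    field can be replaced without touching the rest of the document."""

    def __init__(self):
        self.postings = {}  # term -> set of (doc_id, field, offset)
        self.fields = {}    # (doc_id, field) -> words currently indexed

    def _remove(self, doc_id, field):
        """Drop the postings previously written for one field, if any."""
        for offset, word in enumerate(self.fields.pop((doc_id, field), [])):
            entries = self.postings[word]
            entries.discard((doc_id, field, offset))
            if not entries:
                del self.postings[word]

    def index_field(self, doc_id, field, text):
        """(Re)index one field and return the number of tokens written."""
        self._remove(doc_id, field)
        words = text.lower().split()
        self.fields[(doc_id, field)] = words
        for offset, word in enumerate(words):
            self.postings.setdefault(word, set()).add((doc_id, field, offset))
        return len(words)

    def index_document(self, doc_id, doc):
        """Index every field of a document; return total tokens written."""
        return sum(self.index_field(doc_id, f, t) for f, t in doc.items())

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


# Hypothetical document: only the short "status" field changes later.
idx = FieldIndex()
doc = {
    "title": "quarterly report",
    "status": "draft",
    "body": "revenue grew in every region this quarter",
}
full_cost = idx.index_document("d1", doc)                # 10 tokens written
update_cost = idx.index_field("d1", "status", "approved")  # 1 token written

print(full_cost, update_cost)   # 10 1
print(idx.search("draft"))      # []
print(idx.search("approved"))   # [('d1', 'status', 0)]
```

In this toy case the update writes one token instead of ten, and the new value is searchable immediately, which is the effect the paragraph above describes for systems with frequent small changes.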

The concept of field-level security, which is related to field-level updates,

is particularly useful for intelligence applications, because it enables a single

classified document to contain different levels of classification. Data Explorer

[Figure panels: 1. Content; 2. Tokens; 3. Positional Inverted Index]

Figure 7-3 A positional space index
