indexes, because Data Explorer uses just one efficient structure rather than
two less efficient document-based structures. In a positional space index (see
Figure 7-3), a document is represented as a set of tokens, each of which has a
start and end position. A token can be a single word or a content range (for
example, a title, or an author's name). When a user submits a query, the
search terms match a passage of tokens, instead of a whole document. Data
Explorer doesn't compute a vector representation, but instead keeps all positioning
information directly in its index. This representation enables a complete
rebuilding of the source documents, as well as the manipulation of any
subparts.
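The idea of a positional index can be illustrated with a minimal Python sketch. It is not Data Explorer's implementation, whose internals are not described here; the names build_positional_index, find_phrase, and rebuild are hypothetical, tokens are simple whitespace-separated words, and each token's position is a single word offset rather than a start and end pair. The sketch shows how storing positions lets a query match a passage and lets the original token sequence be rebuilt from the index alone.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each token to a list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, token in enumerate(text.lower().split()):
            index[token].append((doc_id, position))
    return index


def find_phrase(index, phrase):
    """Return (doc_id, start_position) for every passage matching the phrase."""
    terms = phrase.lower().split()
    if not terms:
        return []
    # Postings of the second and later terms, as sets for fast lookup.
    later = [set(index.get(term, ())) for term in terms[1:]]
    hits = []
    for doc_id, pos in index.get(terms[0], ()):
        # Each later term must appear at the next consecutive position.
        if all((doc_id, pos + i + 1) in later[i] for i in range(len(later))):
            hits.append((doc_id, pos))
    return hits


def rebuild(index, doc_id):
    """Reconstruct a document's token sequence from the index alone."""
    placed = [
        (pos, token)
        for token, postings in index.items()
        for d, pos in postings
        if d == doc_id
    ]
    return " ".join(token for _, token in sorted(placed))


# Hypothetical sample documents, for illustration only.
docs = {
    1: "big data search needs a compact index",
    2: "a compact index keeps token positions",
}
index = build_positional_index(docs)
print(find_phrase(index, "compact index"))
print(rebuild(index, 1))
```

Because every posting carries a position, the phrase query returns the matching passage within each document instead of only a document identifier, and rebuild recovers the (lowercased) text without consulting the source.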
In Big Data deployments, index size can be a major concern because of the
volume of the data being indexed. Many search platforms, especially those
with vector space indexing schemes, produce indexes that can be 1.5 times
the original data size. Data Explorer's efficient positional index structure
produces a compact index, which is compressed, resulting in index sizes that
are among the smallest in the industry. In addition, unlike vector space
indexes, the positional space indexes don't grow when data changes; they
only increase in size when new data is added.
Another benefit of positional space indexes is field-level updating, in
which modifications to a single field or record in a document cause only the
modified text to be re-indexed. With vector space indexes, the entire document
needs to be re-indexed. Field-level updating removes excessive indexing loads
in systems with frequent updates, and makes small, but often important,
changes available to users and applications in near-real time.
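Field-level updating can be sketched in Python as follows. This is an illustrative assumption, not Data Explorer's actual mechanism: the FieldIndex class, its method names, and the tokens_indexed counter are all hypothetical. The sketch keys each posting by document, field, and position, so changing one field removes and re-adds only that field's postings.

```python
from collections import defaultdict


class FieldIndex:
    """Toy positional index whose postings are scoped to a document field."""

    def __init__(self):
        # token -> set of (doc_id, field, position)
        self.postings = defaultdict(set)
        # (doc_id, field) -> tokens currently indexed for that field
        self.fields = {}
        # Running count of tokens indexed, used to measure indexing work.
        self.tokens_indexed = 0

    def _remove_field(self, doc_id, field):
        """Drop the postings of one field, leaving all other fields untouched."""
        old_tokens = self.fields.pop((doc_id, field), [])
        for position, token in enumerate(old_tokens):
            self.postings[token].discard((doc_id, field, position))
            if not self.postings[token]:
                del self.postings[token]

    def update_field(self, doc_id, field, text):
        """Index or re-index a single field of a document."""
        self._remove_field(doc_id, field)
        tokens = text.lower().split()
        self.fields[(doc_id, field)] = tokens
        for position, token in enumerate(tokens):
            self.postings[token].add((doc_id, field, position))
        self.tokens_indexed += len(tokens)

    def search(self, token):
        """Return sorted (doc_id, field, position) postings for a token."""
        return sorted(self.postings.get(token.lower(), ()))


# Hypothetical sample data, for illustration only.
idx = FieldIndex()
idx.update_field(1, "title", "Quarterly Report")
idx.update_field(1, "body", "revenue grew in every region this quarter")
work_before = idx.tokens_indexed

# Changing the title re-indexes two tokens; the body is not touched.
idx.update_field(1, "title", "Annual Report")
print(idx.tokens_indexed - work_before)
print(idx.search("annual"))
print(idx.search("quarterly"))
```

In this sketch the cost of an update is proportional to the size of the changed field, not the whole document, which is the property the paragraph above describes.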
The concept of field-level security, which is related to field-level updates,
is particularly useful for intelligence applications, because it enables a single
classified document to contain different levels of classification. Data Explorer
Figure 7-3
A positional space index (labels in the figure: 1. Content, 2. Tokens, 3. Positional Inverted Index)