start offset of the subsequent pattern occurrence). The same set of occurrences will not
be used for detecting multiple instances of the same pattern.
SYN: This option is (currently) specified along with a single-word pattern,
denoted by (P[SYN]), and succinctly indicates multiple single-word patterns that
have the same meaning. Specifying a single-word pattern with the SYN option is
equivalent to specifying N simple patterns that carry the same meaning (synonyms)
as the original pattern. For example, specifying “bomb”[SYN] is equivalent to
specifying “bomb” OR “explosive device” OR “weaponry” OR “arms” OR
“implements of war” OR “weapons system” OR “munition”. If any of these words
or phrases appears in the text, the pattern “bomb”[SYN] is detected. This option adds
simplicity and flexibility to the specification of single-word patterns. The same is true
for complex patterns with an embedded synonym specification, e.g. “Bomb”[SYN] NEAR
“Ground Zero”.
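As a rough illustration of the SYN option, the expansion into an OR-list can be sketched in Python. The synonym table, the function names expand_syn and detect_syn, and the whole-word, case-insensitive matching are assumptions made for this sketch; the text does not say where synonyms come from or how matching is performed.

```python
import re

# Hypothetical synonym table; the entries for "bomb" are the ones listed in the text.
SYNONYMS = {
    "bomb": [
        "explosive device", "weaponry", "arms", "implements of war",
        "weapons system", "munition",
    ],
}

def expand_syn(word):
    """Return the word followed by its synonyms: the OR-list that word[SYN] stands for."""
    return [word] + SYNONYMS.get(word.lower(), [])

def detect_syn(word, text):
    """Return the alternatives of word[SYN] found in text.

    Matching is whole-word and case-insensitive (an assumption of this sketch).
    """
    found = []
    for alt in expand_syn(word):
        if re.search(r"\b" + re.escape(alt) + r"\b", text, re.IGNORECASE):
            found.append(alt)
    return found
```

Under these assumptions, the pattern “bomb”[SYN] counts as detected whenever detect_syn returns a non-empty list.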
Sample Query: Using the above operators, users can specify complex and meaningful
patterns. A complex pattern (“bomb” occurring prior to “ground zero”, with this
combination occurring twice, together with a single occurrence of “automotive” or
one of its synonyms) can be specified as:
Pattern P1 = “bomb” FOLLOWED BY “ground zero”
Pattern P2 = FREQUENCY/2 (P1)
Pattern P3 = P2 NEAR “automotive”[SYN]
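One way to picture how such a query is composed from smaller patterns is as a small expression tree. The following Python sketch is only illustrative: the class names (Word, FollowedBy, Near, Frequency) and the render helper are hypothetical and are not taken from the system described here.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Word:
    text: str
    syn: bool = False  # True when the [SYN] option is attached

@dataclass(frozen=True)
class FollowedBy:
    left: "Pattern"
    right: "Pattern"

@dataclass(frozen=True)
class Near:
    left: "Pattern"
    right: "Pattern"

@dataclass(frozen=True)
class Frequency:
    count: int
    operand: "Pattern"

Pattern = Union[Word, FollowedBy, Near, Frequency]

def render(p):
    """Turn a pattern tree back into a textual specification."""
    if isinstance(p, Word):
        return f'"{p.text}"' + ("[SYN]" if p.syn else "")
    if isinstance(p, FollowedBy):
        return f"({render(p.left)} FOLLOWED BY {render(p.right)})"
    if isinstance(p, Near):
        return f"({render(p.left)} NEAR {render(p.right)})"
    if isinstance(p, Frequency):
        return f"FREQUENCY/{p.count} {render(p.operand)}"
    raise TypeError(f"unknown pattern node: {p!r}")

# The sample query, built bottom-up as in the text.
p1 = FollowedBy(Word("bomb"), Word("ground zero"))
p2 = Frequency(2, p1)
p3 = Near(p2, Word("automotive", syn=True))
```

Building the query bottom-up mirrors the way P2 and P3 refer to previously defined patterns.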
2.2 Pattern Detection
Pattern detection semantics are needed for detecting meaningful patterns. Under
unrestricted semantics (where no pattern occurrence is discarded after participating
in pattern detection), not all of the detected patterns are meaningful for an
application. Detection semantics essentially delimit the patterns detected and
accommodate a wide range of application requirements.
We want to emphasize that we have chosen to define proximal-unique semantics in
this paper based on the intuition of proximity and disjoint pattern detection. It is
certainly possible to define other meaningful constraints leading to other useful
semantics. The framework remains the same in that case; only the algorithms change,
depending upon the semantics used. It is also possible to include the detection
semantics as an additional parameter when several of them are defined and supported.
Consider a document containing occurrences of words as shown in Figure 1. Suppose
we want to find occurrences of “cell” FOLLOWED BY “nucleic” within this document.
As shown in the figure, there are two occurrences of “cell”, one at position 10,
say cell1, and the other at position 15, say cell2. The occurrences of “nucleic” are
at positions 28 and 41, say nucleic1 and nucleic2 respectively. We could combine either
cell (10)   cell (15)   protein (20)   nucleic (28)   clustering (34)   nucleic (41)
Fig. 1. Pattern Occurrences (Example)
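The candidate combinations in this example can be enumerated with a short Python sketch. It covers only the unrestricted case, in which no occurrence is discarded after being used, and it treats each occurrence as a single position rather than a start and end offset; both simplifications, and the function name followed_by_unrestricted, are assumptions of the sketch.

```python
from itertools import product

# Positions of the occurrences, as given in the text for Figure 1.
cell = [10, 15]
nucleic = [28, 41]

def followed_by_unrestricted(first, second):
    """All (a, b) pairs where a occurs before b; no occurrence is discarded."""
    return [(a, b) for a, b in product(first, second) if a < b]

pairs = followed_by_unrestricted(cell, nucleic)
print(pairs)  # [(10, 28), (10, 41), (15, 28), (15, 41)]
```

Every “cell” occurrence pairs with every later “nucleic” occurrence here, giving four candidates; a detection semantics such as the proximal-unique one would retain only a subset of them.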