2.1 Pattern Specification
An occurrence of a pattern P is the presence of P in a given document. Each occurrence
has an offset at which it appears in the document (several offsets if the pattern occurs
more than once). O_s is the start offset and O_e is the end offset of the occurrence,
where an offset is a position counted in words from the beginning of the document.
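The offset definition above can be illustrated with a minimal Python sketch. It assumes zero-based word offsets, whitespace tokenization, and case-insensitive matching, none of which the text specifies; the name find_occurrences is hypothetical.

```python
from typing import List, Tuple

# An occurrence is a pair (O_s, O_e) of word offsets.
Occurrence = Tuple[int, int]

def find_occurrences(document: str, phrase: str) -> List[Occurrence]:
    """Return (O_s, O_e) for every occurrence of a keyword or phrase.

    Assumptions (not from the source): offsets are zero-based, words are
    split on whitespace, and matching ignores case.
    """
    words = document.lower().split()
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return []
    return [(i, i + n - 1)
            for i in range(len(words) - n + 1)
            if words[i:i + n] == target]

doc = "information filtering differs from information retrieval"
print(find_occurrences(doc, "information"))            # [(0, 0), (4, 4)]
print(find_occurrences(doc, "information filtering"))  # [(0, 1)]
```

A single keyword has equal start and end offsets, while a phrase spans from the offset of its first word to that of its last.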
Simple patterns are the basic building blocks and can be either system-defined (i.e.,
pre-defined in the system) or user-defined. BeginPara and BeginDocument are examples
of system-defined patterns. Keywords and phrases are examples of simple user-defined
patterns.
Complex patterns are composed of simple patterns, other complex patterns, and pattern
operators. Arbitrarily complex patterns can be composed using these operators. The
currently supported operators are summarized below:
OR: Disjunction of two simple or complex patterns P_1 and P_2, denoted by (P_1 OR
P_2), occurs when either P_1 or P_2 occurs. For example, “information” OR “filtering”
is detected when either one of the keywords occurs.
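As a rough sketch of the OR semantics, assume each pattern has already been reduced to a list of (start, end) word-offset pairs. The function name or_occurrences and the sample offsets are illustrative, not part of the system described here.

```python
from typing import List, Tuple

Occurrence = Tuple[int, int]  # (start offset, end offset), in words

def or_occurrences(p1: List[Occurrence], p2: List[Occurrence]) -> List[Occurrence]:
    """Every occurrence of either operand is an occurrence of (P_1 OR P_2)."""
    return sorted(set(p1) | set(p2))

# Hypothetical offsets for "information" and "filtering" in one document.
information = [(0, 0), (4, 4)]
filtering = [(1, 1)]
print(or_occurrences(information, filtering))  # [(0, 0), (1, 1), (4, 4)]
```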
NEAR: Proximity of two simple or complex patterns P_1 and P_2, denoted by (P_1
NEAR [/D] P_2), occurs when both P_1 and P_2 occur, irrespective of their order of
occurrence. “D” is the maximum distance allowed between the patterns P_1 and P_2.
The default value of “D” is the scope of the operator (which can be the entire document).
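One possible reading of NEAR is sketched below. The text does not say how distance is measured or whether overlapping occurrences qualify. This sketch assumes the distance is the number of words from the end of the earlier occurrence to the start of the later one, and that overlapping pairs are skipped. Passing d=None stands in for the default (the whole scope). The name near is hypothetical.

```python
from typing import List, Optional, Tuple

Occurrence = Tuple[int, int]  # (start offset, end offset), in words

def near(p1: List[Occurrence], p2: List[Occurrence],
         d: Optional[int] = None) -> List[Occurrence]:
    """Pair occurrences of P_1 and P_2 in either order, at most d words apart."""
    result = set()
    for s1, e1 in p1:
        for s2, e2 in p2:
            # Positive when the two occurrences do not overlap: distance from
            # the end of the earlier one to the start of the later one.
            gap = max(s1 - e2, s2 - e1)
            if gap >= 1 and (d is None or gap <= d):
                result.add((min(s1, s2), max(e1, e2)))
    return sorted(result)

information = [(0, 0), (4, 4)]  # hypothetical offsets
filtering = [(1, 1)]
print(near(information, filtering, 1))  # [(0, 1)]
print(near(filtering, information))     # [(0, 1), (1, 4)]
```

Swapping the operands gives the same result, which reflects the order-independence stated in the definition.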
FOLLOWED BY: Sequence of two simple or complex patterns P_1 and P_2, denoted
by (P_1 FOLLOWED BY [/D] P_2), occurs when an occurrence of P_1 is followed by
an occurrence of P_2 in a non-overlapping manner, that is, the end offset of P_1 is less
than the start offset of P_2. “D” is the maximum distance allowed between the two
patterns P_1 and P_2. If the value of “D” is 1 (the minimum value), the patterns P_1 and
P_2 form a phrase.
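The sequence condition can be sketched in Python as follows. The distance is taken to be the start offset of P_2 minus the end offset of P_1, which agrees with the statement that D = 1 yields a phrase, though it remains an assumption. The name followed_by is hypothetical, and d=None stands in for an unbounded default.

```python
from typing import List, Optional, Tuple

Occurrence = Tuple[int, int]  # (start offset, end offset), in words

def followed_by(p1: List[Occurrence], p2: List[Occurrence],
                d: Optional[int] = None) -> List[Occurrence]:
    """Pair each occurrence of P_1 with later, non-overlapping occurrences of P_2."""
    result = []
    for s1, e1 in p1:
        for s2, e2 in p2:
            # Non-overlap: P_1 must end before P_2 starts.
            if e1 < s2 and (d is None or s2 - e1 <= d):
                result.append((s1, e2))
    return sorted(result)

information = [(0, 0), (4, 4)]  # hypothetical offsets
filtering = [(1, 1)]
print(followed_by(information, filtering, 1))  # [(0, 1)], i.e. a phrase
print(followed_by(filtering, information))     # [(1, 4)]
```

Unlike NEAR, operand order matters here: with d=1, “filtering” FOLLOWED BY “information” finds nothing in this example.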
WITHIN: Occurrence of a simple or complex pattern P in the range formed by the
start offset of the pattern P_S and the end offset of the pattern P_E, denoted by
(P WITHIN (P_S, P_E)). The pattern is detected each time P occurs in the range defined
by P_S and P_E. For example, “information filtering” WITHIN (BeginPara, EndPara) is
detected whenever the phrase “information filtering” occurs within a paragraph.
When an expression is specified without a system-defined pattern, a default structure
(e.g., the document) is used as the range. User-defined patterns can also be used as P_S and P_E.
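A minimal sketch of WITHIN follows. It assumes each occurrence of P_S is paired with the nearest occurrence of P_E that ends at or after it, which the text does not spell out. The function name within and the paragraph offsets are made up for illustration.

```python
from typing import List, Tuple

Occurrence = Tuple[int, int]  # (start offset, end offset), in words

def within(p: List[Occurrence], starts: List[Occurrence],
           ends: List[Occurrence]) -> List[Occurrence]:
    """Keep occurrences of P lying fully inside a range opened by P_S and closed by P_E."""
    result = set()
    for range_start, _ in starts:
        # Assumption: the range closes at the nearest end marker at or after its start.
        closing = [e_end for _, e_end in ends if e_end >= range_start]
        if not closing:
            continue
        range_end = min(closing)
        for s, e in p:
            if range_start <= s and e <= range_end:
                result.add((s, e))
    return sorted(result)

# Hypothetical two-paragraph document: words 0..3 and words 4..5.
begin_para = [(0, 0), (4, 4)]
end_para = [(3, 3), (5, 5)]
phrase = [(0, 1), (5, 6)]  # the second occurrence runs past the last paragraph
print(within(phrase, begin_para, end_para))  # [(0, 1)]
```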
NOT: Non-occurrence of a simple or complex pattern P in the range formed by
the start offset of P_S and the end offset of P_E. The general specification is (NOT
[/F] (P) (P_S, P_E)), where P, P_S, and P_E can be arbitrary patterns. “F” indicates the
minimum number of occurrences, and its default value is 1. For example,
NOT (“filtering”) (“information”, “retrieval”) is detected whenever “information” is
followed by “retrieval” and the word “filtering” does not occur at least once between them.
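The NOT operator might be sketched as below. This is one interpretation: the operator fires for a (P_S, P_E) pair when fewer than F occurrences of P lie strictly between the end of P_S and the start of P_E. The name not_between and the sample offsets are hypothetical.

```python
from typing import List, Tuple

Occurrence = Tuple[int, int]  # (start offset, end offset), in words

def not_between(p: List[Occurrence], starts: List[Occurrence],
                ends: List[Occurrence], f: int = 1) -> List[Occurrence]:
    """Detect P_S followed by P_E with fewer than f occurrences of P in between."""
    result = []
    for s_start, s_end in starts:
        for e_start, e_end in ends:
            if s_end >= e_start:
                continue  # P_E does not follow P_S
            # Count occurrences of P strictly between the two delimiters.
            count = sum(1 for s, e in p if s_end < s and e < e_start)
            if count < f:
                result.append((s_start, e_end))
    return sorted(result)

# Hypothetical offsets in "information filtering differs from information retrieval".
information = [(0, 0), (4, 4)]
retrieval = [(5, 5)]
filtering = [(1, 1)]
print(not_between(filtering, information, retrieval))  # [(4, 5)]
```

With the default f=1, only the second “information” qualifies, because “filtering” sits between the first one and “retrieval”.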
FREQUENCY: Multiple occurrences of a simple or complex pattern P, numbering at
least F, denoted by (FREQUENCY [/F] (P)). The pattern is detected each time P
occurs at least F times, where “F” is the minimum number of occurrences specified by
the user. The default value of F is 1. All the occurrences that are used for detection
should be disjoint (i.e., the end offset of each pattern occurrence should precede the