2.2.1 Record the Keyword Information Appearing in the Document
For a given document, the following steps are applied. Suppose we have
already chosen a set of words as keywords, say K = {K_1, K_2, ..., K_m}.
Record every keyword and its position in the document. We will use the
following notation:
k_i represents one of the keywords in the keyword set K.
i represents the order in which the keyword appears in the document. (It is
possible that k_i and k_j are the same element of K.)
m represents the total number of keywords appearing in the document.
p_i is an integer that represents the total number of words from the
beginning of the document to the word k_i.
In addition, we record the frequency of each keyword at the same time. This
gives the Keyword-Position information table (Table 1).
Table 1 Keyword-Position table

Keyword appears in the document    Position in the document
k_1                                p_1
k_2                                p_2
...                                ...
k_m                                p_m
The details of this process are illustrated as follows (with Figure 1 as an
example). For this example, we use the keyword set {bank, fund, account,
transfer}. Its Keyword-Position information is listed in Table 2. Frequency
information of each keyword for the given example (Figure 1) is listed in
Table 3.
Table 2 Keyword-Position information of the email in Figure 1

Keyword appears in the document    Position in the document
bank                               91
fund                               103
account                            109
transfer                           124
fund                               153
transfer                           155
account                            158
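
As a worked example, the keyword frequencies can be derived directly from the rows of Table 2. The sketch below uses only the seven (keyword, position) pairs listed there; the variable names are illustrative, and the per-keyword grouping of positions is an added convenience view rather than something the text defines. The counts hold under the assumption that Table 2 lists every keyword occurrence in the email.

```python
from collections import Counter, defaultdict

# Rows of Table 2 as (keyword, position) pairs, in order of appearance.
table_2 = [
    ("bank", 91),
    ("fund", 103),
    ("account", 109),
    ("transfer", 124),
    ("fund", 153),
    ("transfer", 155),
    ("account", 158),
]

# Frequency of each keyword: the number of rows of Table 2 that mention it.
frequency = Counter(keyword for keyword, _ in table_2)

# Positions grouped by keyword (illustrative view, not defined in the text).
positions = defaultdict(list)
for keyword, position in table_2:
    positions[keyword].append(position)

# Report in the order of the keyword set {bank, fund, account, transfer}.
for keyword in ["bank", "fund", "account", "transfer"]:
    print(keyword, frequency[keyword], positions[keyword])
```

From these rows, bank occurs once, and fund, account, and transfer occur twice each, for a total of m = 7 recorded keyword occurrences.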
 