2.2.1 Record the Keyword Information Appearing in the Document
For a given document, the following steps are applied to it. Suppose we
have already chosen a set of words as keywords, say
$K = \{K_1, K_2, \cdots, K_m\}$.
Record every keyword and its position in the document. We will use the
following notation:
• $k_i$ represents one of the keywords in the keyword set $K$.
• $i$ represents the order of the keyword appearing in the document. (It is
possible that $k_i, k_j$ are the same element in $K$.)
• $m$ represents the total number of keywords appearing in the document.
• $p_i$ is an integer, which represents the total number of words from the
beginning of the document to the word $k_i$.
In addition, we record the frequency of each keyword at the same time. Thus
we have the Keyword-Position information table (Table 1).
Table 1
Keyword-Position table

Keyword appearing in the document    Position in the document
$k_1$                                $p_1$
$k_2$                                $p_2$
...                                  ...
$k_m$                                $p_m$
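The recording step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name record_keywords, the tokenization rule (a word is a run of letters, digits, or apostrophes), the case-insensitive matching, and the choice to count the keyword itself in its position $p_i$ are all assumptions made for illustration.

```python
import re
from collections import Counter


def record_keywords(text, keywords):
    """Build a Keyword-Position table and keyword frequencies for one document.

    Returns (table, freq), where table is a list of (keyword, position)
    pairs in order of appearance and freq maps each keyword to its count.
    """
    # Assumption: matching is case-insensitive.
    keyword_set = {k.lower() for k in keywords}
    table = []
    # Assumption: a "word" is a run of letters, digits, or apostrophes,
    # and positions are 1-based (the keyword itself is counted).
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9']+", text), start=1):
        word = match.group().lower()
        if word in keyword_set:
            table.append((word, position))
    # Frequencies are recorded in the same pass over the table entries.
    freq = Counter(word for word, _ in table)
    return table, freq


# Hypothetical one-sentence document, used only to exercise the function.
sample = "Please transfer the fund to my bank account and transfer again"
table, freq = record_keywords(sample, ["bank", "fund", "account", "transfer"])
for keyword, position in table:
    print(keyword, position)
```

A keyword that occurs several times yields several rows in the table, which matches the remark that $k_i$ and $k_j$ may be the same element of $K$.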
The details of this process are illustrated as follows (with Figure 1 as an
example). For this example, we use the keyword set {bank, fund, account,
transfer}. Its Keyword-Position information is listed in Table 2. Frequency
information of each keyword for the given example (Figure 1) is listed in
Table 3.
Table 2
Keyword-Position information of the email in Figure 1

Keyword appearing in the document    Position in the document
bank                                 91
fund                                 103
account                              109
transfer                             124
fund                                 153
transfer                             155
account                              158
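As an illustration, the keyword frequencies can be tallied directly from the rows of Table 2. The short Python sketch below assumes nothing beyond those rows. The variable names table_2, frequency, and positions are hypothetical, and the counts it produces follow from Table 2 alone rather than being copied from Table 3.

```python
from collections import Counter, defaultdict

# Rows of Table 2: (keyword, position in the document).
table_2 = [
    ("bank", 91),
    ("fund", 103),
    ("account", 109),
    ("transfer", 124),
    ("fund", 153),
    ("transfer", 155),
    ("account", 158),
]

# Frequency of each keyword: the number of rows in which it appears.
frequency = Counter(keyword for keyword, _ in table_2)

# Positions grouped by keyword, kept in order of appearance.
positions = defaultdict(list)
for keyword, position in table_2:
    positions[keyword].append(position)

for keyword in frequency:
    print(keyword, frequency[keyword], positions[keyword])
```

Grouping the positions by keyword keeps the order information of Table 2 while making the per-keyword counts explicit.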