Fig. 2. The flowchart of the CFF process
The Processes for Generating TFIDF Type Features
Given all sentences as input, the bag-of-words process calculates the TFIDF value
of each word that appears in any sentence of the document and then generates the
TFIDF vectors for the training and testing documents. Given the strong sentences as
input, the bag-of-words process extracts the sentences whose count of related
important sets is greater than or equal to one. The TFIDF value of each word that
appears in the extracted sentences of the document is then calculated, and the
STFIDF vectors for the training and testing documents are generated.
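The two feature types described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy documents, and the important word sets are all hypothetical, and the sketch assumes that a sentence "relates to" an important set when it shares at least one word with it. It also computes document frequencies over the given documents only, whereas a real pipeline would typically reuse the training statistics for the testing documents.

```python
import math
from collections import Counter

def related_set_count(sentence, important_sets):
    # Assumption: a sentence relates to a set if it shares at least one word with it.
    words = set(sentence.lower().split())
    return sum(1 for word_set in important_sets if words & word_set)

def strong_sentences(sentences, important_sets, min_count=1):
    # Keep sentences whose related-important-set count is >= min_count.
    return [s for s in sentences
            if related_set_count(s, important_sets) >= min_count]

def tfidf_vectors(docs):
    # docs: list of documents, each given as a list of sentences.
    tokenized = [[w for s in doc for w in s.lower().split()] for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(w for tokens in tokenized for w in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({w: (c / total) * math.log(n_docs / df[w])
                        for w, c in tf.items()})
    return vectors

# Hypothetical toy data for illustration only.
docs = [
    ["the valve reduces weight", "it was painted blue"],
    ["the spring increases strength"],
]
important_sets = [{"weight", "mass"}, {"strength", "force"}]

tfidf = tfidf_vectors(docs)  # all sentences
stfidf = tfidf_vectors([strong_sentences(d, important_sets) for d in docs])  # strong sentences only
```

In this sketch the STFIDF vector differs from the TFIDF vector only in which sentences contribute words, so a word that occurs solely in a non-strong sentence (here, "blue") is absent from the STFIDF vector.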
The Processes for Generating Termset Type Features
We propose an algorithm named Verb Including Split and Associate Termsets
(VISAT). It is a key step of the CFF process, and we describe it in detail later.
We found that the more important word sets a sentence relates to, the more
important the sentence is. Therefore, for training documents, only those strong
sentences whose count of related important sets is greater than one need POS
tagging to obtain their part-of-speech information. For testing documents, all
sentences need POS tagging because the Engineering Parameter labels of the
document are unknown.
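The selection rule for POS tagging can be sketched as below. This is only an illustration under stated assumptions: the function names and sample data are hypothetical, a sentence is assumed to relate to an important set when it shares at least one word with it, and the POS tagger itself is left out because the text does not say which one is used.

```python
def related_set_count(sentence, important_sets):
    # Assumption: a sentence relates to a set if it shares at least one word with it.
    words = set(sentence.lower().split())
    return sum(1 for word_set in important_sets if words & word_set)

def sentences_to_tag(sentences, important_sets, is_training):
    # Testing documents: labels are unknown, so every sentence is tagged.
    if not is_training:
        return list(sentences)
    # Training documents: only sentences related to more than one important set.
    return [s for s in sentences
            if related_set_count(s, important_sets) > 1]

# Hypothetical toy data for illustration only.
important_sets = [{"weight", "mass"}, {"strength", "force"}]
sentences = [
    "the spring reduces weight and increases strength",
    "the valve reduces weight",
    "it was painted blue",
]

train_selection = sentences_to_tag(sentences, important_sets, is_training=True)
test_selection = sentences_to_tag(sentences, important_sets, is_training=False)
```

Restricting tagging on the training side keeps the tagger's workload to the sentences most likely to yield useful termsets, while the testing side cannot be filtered the same way.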
We use the VISAT algorithm to extract the candidate termsets in the CFF Process.
According to our observation, the termsets containing two words are strong enough to