represent the concept of the vast majority of Engineering Parameters, thus the VISAT
algorithm only extracts the candidate termsets containing two words.
The Steps of the VISAT Algorithm
The VISAT algorithm includes four steps: (1) split each sentence into several
SplitPointSsegments (SPSs) and StringSegments (SSs), (2) reorganize some SPSs and SSs,
(3) generate termsets within each segment, and (4) generate termsets that combine
two words belonging to two different segments.
In the first step, the VISAT algorithm takes verbs, the punctuation mark (,) and
the conjunctions (and, but) as split points, so that each sentence is split into
several SplitPointSsegments (SPSs) and StringSegments (SSs). In Fig. 3, each segment
of words marked by an underline is a StringSegment (SS), and each segment of words
pointed to by an arrow is a SplitPointSsegment (SPS). The number below each underline
is the index of the SS, and the number above each arrow is the index of the SPS.
There must be one SS after each SPS, even if that SS is empty.
Fig. 3. An example of the first step of the VISAT algorithm to split sentence
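The splitting rule of the first step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the sentence is already tokenized and tagged with Penn Treebank style part-of-speech tags (verb tags starting with "VB"), and it assumes that consecutive split-point tokens are grouped into a single SPS. The names is_split_point and split_sentence are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag); Penn Treebank style tags are an assumption

CONJUNCTIONS = {"and", "but"}  # the conjunctions named in the text


def is_split_point(token: Token) -> bool:
    """A token is a split point if it is a verb, a comma, or 'and'/'but'."""
    word, tag = token
    return tag.startswith("VB") or word == "," or word.lower() in CONJUNCTIONS


def split_sentence(tokens: List[Token]) -> Tuple[List[List[str]], List[List[str]]]:
    """Split a tagged sentence into StringSegments (SSs) and SPSs.

    Returns (ss_list, sps_list). ss_list[0] is the segment before the first
    SPS, and ss_list[i + 1] is the SS that follows sps_list[i]. Every SPS is
    followed by exactly one SS, which may be empty.
    """
    ss_list: List[List[str]] = [[]]  # leading SS, possibly empty
    sps_list: List[List[str]] = []
    in_sps = False
    for token in tokens:
        word = token[0]
        if is_split_point(token):
            if not in_sps:
                # Assumption: adjacent split-point tokens share one SPS.
                sps_list.append([])
                in_sps = True
            sps_list[-1].append(word)
        else:
            if in_sps:
                ss_list.append([])  # open the SS that follows the SPS
                in_sps = False
            ss_list[-1].append(word)
    if in_sps:
        ss_list.append([])  # sentence ended on an SPS: add an empty SS
    return ss_list, sps_list
```

For a hypothetical tagged sentence such as "the valve controls the flow rate , and reduces", the sketch yields three SSs (the last one empty) and two SPSs, which keeps the invariant that one SS follows each SPS.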
In the second step, the VISAT algorithm reorganizes some SPSs and SSs. It checks
each SPS that contains a conjunction, together with the segments near that SPS, to
judge whether these segments should be reorganized. Fig. 4 shows an example.
Fig. 4. An example of the second step of the VISAT algorithm to reorganize segments
In the third step, the VISAT algorithm generates termsets within each segment.
From the viewpoint of grammar and sentence structure, a complete sentence must have
at least one verb, and there are five types of basic sentence structure. If the verb
element is excluded, the remaining elements are the subject, the object and the
complement. We treat the subject and the object as sets of words that are combined
to represent a noun concept. The set of words forming the complement should likewise
be combined. Termsets should also be generated in each SPS that contains more than
one word.
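As a rough illustration of the third step, the sketch below enumerates the two-word candidate termsets inside a single segment. It assumes that every pair of words in the segment is a candidate and that word order within a pair follows the order of appearance; the source does not spell out either detail. The function name termsets_within_segment is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def termsets_within_segment(segment: List[str]) -> List[Tuple[str, str]]:
    """Return all two-word candidate termsets from one segment.

    Pairs keep the order in which the words appear in the segment.
    A segment with fewer than two words yields no termsets.
    """
    return list(combinations(segment, 2))
```

The same function can be applied to an SS or to an SPS that contains more than one word, since both are treated here as plain lists of words.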
In the fourth step, the VISAT algorithm generates termsets that combine two
words belonging to two different segments. It goes through every SS to check whether
the words in that SS should be combined with words in other segments. Four cases are
used to generate termsets with other segments, as shown in Fig. 5.
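The pairing operation of the fourth step can be sketched as below. The sketch only shows how two-word termsets are formed once two segments have been selected for combination. The four cases of Fig. 5, which decide which segments are paired, are not reproduced here. The function name termsets_across_segments is hypothetical.

```python
from itertools import product
from typing import List, Tuple


def termsets_across_segments(first: List[str], second: List[str]) -> List[Tuple[str, str]]:
    """Pair every word of one segment with every word of another segment.

    Which segments qualify for pairing is decided elsewhere (the four
    cases of Fig. 5); this helper only builds the cross-segment pairs.
    """
    return list(product(first, second))
```

If either segment is empty, for example the empty SS that may follow an SPS, no termsets are produced.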