Figure 12-9. A view inside the sub-process of our Process Documents operator.
10) Note that the blue up arrow in the process toolbar is now illuminated, where previously it
had been grayed out. This will allow us to return to our main process once we have
constructed our sub-process. Within the sub-process, though, there are a few things we
need to do, and a couple we can choose to do, in order to mine our text. Use the search
field in the Operators tab to locate an operator called Tokenize. It is under the Text
Processing menu in the Tokenization folder. When mining text, the text must first be split
into individual words (tokens) so that those words can be grouped together and counted.
Without some numeric structure, the computer cannot assess the meaning of the words. The
Tokenize operator performs the splitting step for us.
Drag it into the sub-process window (labeled 'Vector Creation' in the upper left-hand
corner). The doc ports from the left-hand side of the screen to the operator, and from the
operator to the right-hand side of the screen, should all be connected by splines, as
illustrated in Figure 12-10.
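
To make the idea of tokenizing and counting more concrete, here is a minimal Python sketch of what happens conceptually to each document. It is not RapidMiner's own code. The function names tokenize and count_tokens and the two sample sentences are hypothetical, and we assume that tokens are split wherever a non-letter character appears.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into tokens at every run of non-letter characters.

    Assumption: splitting on non-letters; other splitting rules are possible.
    """
    return [tok for tok in re.split(r"[^A-Za-z]+", text) if tok]


def count_tokens(documents):
    """Return one Counter of token frequencies per document."""
    return [Counter(tokenize(doc)) for doc in documents]


# Hypothetical sample documents, for illustration only.
docs = ["The cat sat.", "The cat saw the other cat!"]
vectors = count_tokens(docs)

for doc, vec in zip(docs, vectors):
    print(doc, dict(vec))
```

Notice that 'The' and 'the' are counted as separate tokens in this sketch, because nothing here changes the case of the words.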