evokes different and even stronger emotions than just the token 'death'. What if we
increased the n-gram size to 3? We might find a token 'death_penalty_execution'. Again,
a more specific meaning and perhaps a stronger emotion are attached. Understand that these
example gram tokens would only be created by RapidMiner if the two or three words in
each of them were found together, in close proximity to one another, in the input text.
Generating grams can be an excellent way to bring a more granular analysis to your text
mining activities.
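To make the idea of gram tokens concrete, here is a minimal Python sketch of how underscore-joined n-grams could be built from a list of tokens. It is an illustration only, not RapidMiner's implementation: the function name `generate_ngrams` and the sample token list are assumptions made for this example, and the sketch simply joins tokens that sit next to one another in the list.

```python
def generate_ngrams(tokens, max_n):
    """Return all grams of length 1..max_n, joined with underscores.

    Hypothetical helper for illustration; it only combines tokens that
    are adjacent in the input list.
    """
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of size n across the token list.
        for i in range(len(tokens) - n + 1):
            grams.append("_".join(tokens[i:i + n]))
    return grams


# Sample tokens, assumed to be what remains after tokenizing and filtering.
tokens = ["death", "penalty", "execution"]

bigrams_and_below = generate_ngrams(tokens, 2)
trigrams_and_below = generate_ngrams(tokens, 3)

print(bigrams_and_below)
print(trigrams_and_below)
```

With a maximum size of 2, the sketch produces 'death_penalty' alongside the single-word tokens; raising the maximum to 3 adds 'death_penalty_execution'. If the words never appear next to one another, no such gram is produced.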
Replace Tokens: This is similar to replacing missing or inconsistent values in more
structured data. This operator can come in handy once you've tokenized your text input.
Suppose, for example, that you had the tokens 'nation', 'country', and 'homeland' in your
data set but wanted to treat all of them as one token. You could use this operator to
change both 'country' and 'homeland' to 'nation', and all instances of any of the three
terms (or their stems, if you also use stemming) would subsequently be combined into a
single token.
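The effect of replacing tokens can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the RapidMiner operator itself: the helper `replace_tokens`, the replacement dictionary, and the sample token list are all hypothetical.

```python
from collections import Counter


def replace_tokens(tokens, replacements):
    """Map each token to its replacement, leaving other tokens unchanged.

    Hypothetical helper for illustration only.
    """
    return [replacements.get(token, token) for token in tokens]


# Sample tokens, assumed to come from an earlier tokenizing step.
tokens = ["nation", "country", "homeland", "country", "flag"]

# Treat 'country' and 'homeland' as the single token 'nation'.
replacements = {"country": "nation", "homeland": "nation"}

merged = replace_tokens(tokens, replacements)
counts = Counter(merged)

print(merged)
print(counts["nation"])
```

After the replacement, the three separate terms are counted together under 'nation', while unrelated tokens such as 'flag' are left alone.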
These are just a few of the other operators in the Text Processing area that can be nice additions
to a text mining model. There are many others, and you may experiment with them at your leisure.
For now, though, we will proceed to…
MODELING
Click the blue up arrow to move from your sub-process back to your main process window.
Figure 12-15. The 'Return to Parent Operator' arrow (indicated by the black arrow).
 