Figure 12-14. Additional text mining operators of interest.
Stemming: In text mining, stemming means finding terms that share a common root and
treating them as a single term. For example, 'America', 'American', and 'Americans' are
like terms that effectively refer to the same thing. With a stemming operator (there are
several, each using a different algorithm), RapidMiner can reduce all instances of these
word variations to a common form, such as 'Americ' or perhaps 'America', and represent
them all in a single attribute.
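To make the idea concrete outside of RapidMiner, here is a minimal Python sketch. It is not the algorithm used by any RapidMiner stemming operator; the suffix list and the names toy_stem and stem_counts are hypothetical, chosen only so that the three example words collapse to 'americ'. Real stemmers such as Porter or Snowball apply far richer rules.

```python
from collections import Counter

# Hypothetical suffix list for illustration only, checked longest first.
# This is an assumption, not the rule set of any real stemmer.
SUFFIXES = ("ans", "an", "a")


def toy_stem(token: str) -> str:
    """Lowercase the token and strip the first matching suffix,
    keeping at least four characters of stem."""
    word = token.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def stem_counts(tokens):
    """Count tokens after stemming, so word variants share one entry,
    much like variants sharing a single attribute."""
    return Counter(toy_stem(t) for t in tokens)


if __name__ == "__main__":
    tokens = ["America", "American", "Americans", "vote"]
    print(stem_counts(tokens))  # Counter({'americ': 3, 'vote': 1})
```

The three variants end up under one key with a count of 3, which mirrors how stemmed terms are represented in a single attribute rather than three separate ones.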
Generate n-Grams: In text mining, an n-gram is a phrase or combination of words that
may take on a meaning different from, or greater than, the meaning of each word
individually. When creating n-grams, n is simply the maximum number of terms you
want RapidMiner to consider grouping together. Take for example the token 'death'. This
word by itself is strong and evokes strong emotion. But now consider the meaning, strength,
and emotion if you were to add a Generate n-Grams operator to your model with a size of
2 (this is set in the parameters area of the n-gram operator). Depending on your input text,
you might find the token 'death_penalty'. This certainly has a more specific meaning and