        if (stopWords.contains(tokens.get(i))) {
            tokens.remove(i);
        }
    }
    return (String[]) tokens.toArray(new String[tokens.size()]);
}
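Only the tail of removeStopWords appears here, so the loop header is not visible. If the loop walks forward by index, each call to tokens.remove(i) shifts the next element into position i, and the following increment steps over it, so the second of two adjacent stop words can survive. The following is a minimal sketch of one way to avoid that, using the standard Collection.removeIf method. The class name SafeStopWordRemoval and its small stop word set are assumptions made for illustration and are not part of the StopWords class described in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not the StopWords class from the text.
class SafeStopWordRemoval {

    // Small, assumed stop word list used only for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "is", "to", "and"));

    public static String[] removeStopWords(String[] words) {
        // Copy into a mutable list; Arrays.asList alone is fixed-size.
        List<String> tokens = new ArrayList<>(Arrays.asList(words));
        // removeIf manages the traversal itself, so no element is skipped
        // when two stop words sit next to each other.
        tokens.removeIf(STOP_WORDS::contains);
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] input = {"approach", "is", "to", "create"};
        // Prints [approach, create]
        System.out.println(Arrays.toString(removeStopWords(input)));
    }
}
```

Iterating backward by index, or using an explicit Iterator and its remove method, would work equally well; removeIf is simply the shortest form.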
The following sequence illustrates how StopWords can be used. First, we create an
instance of the StopWords class using the default constructor. We then obtain the
OpenNLP SimpleTokenizer instance and define the sample text, as shown here:
StopWords stopWords = new StopWords();
SimpleTokenizer simpleTokenizer = SimpleTokenizer.INSTANCE;
String paragraph = "A simple approach is to create a class "
    + "to hold and remove stopwords.";
The sample text is tokenized and then passed to the removeStopWords method. The
resulting array is then displayed:
String[] tokens = simpleTokenizer.tokenize(paragraph);
String[] list = stopWords.removeStopWords(tokens);
for (String word : list) {
    System.out.println(word);
}
When executed, we get the following output. The "A" is not removed because it is
uppercase and the class does not perform case conversion:
A
simple
approach
create
class
hold
remove
stopwords
.
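The uppercase "A" survives because the lookup compares each token exactly as written. One possible remedy, sketched here under stated assumptions, is to lowercase each token only for the lookup while keeping its original spelling in the result. The class CaseInsensitiveStopWords and its word list are hypothetical, and a plain whitespace split stands in for the OpenNLP tokenizer so that the example has no external dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical variant for illustration; not the StopWords class from the text.
class CaseInsensitiveStopWords {

    // Assumed stop word list, stored in lowercase.
    private final Set<String> stopWords =
            new HashSet<>(Arrays.asList("a", "is", "to", "and"));

    public String[] removeStopWords(String[] words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            // Lowercase only for the comparison; the kept token is unchanged.
            if (!stopWords.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(word);
            }
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "A simple approach is to create a class "
                + "to hold and remove stopwords";
        // Whitespace split used in place of SimpleTokenizer (assumption).
        String[] tokens = text.split("\\s+");
        for (String word : new CaseInsensitiveStopWords().removeStopWords(tokens)) {
            System.out.println(word);
        }
    }
}
```

With this list, the leading "A" is dropped along with the lowercase stop words. Building a new list instead of removing in place also sidesteps any index bookkeeping.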