Classifying Texts and Documents - Natural Language Processing with Java


classifier = DynamicLMClassifier.createNGramProcess(
    categories, nGramSize);
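The createNGramProcess factory builds one character n-gram language model per category, and classification asks which category's model makes a text most probable. The following is a hypothetical, simplified plain-Java sketch of that idea, not LingPipe's implementation. The class name ToyNGramClassifier, its methods, the category names neg and pos, and the add-one smoothing over an assumed 256-character alphabet are all illustrative choices. LingPipe's own models use more sophisticated smoothing, and this sketch ignores category priors.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy version of the idea behind createNGramProcess:
// one character n-gram model per category. Not LingPipe's code.
public class ToyNGramClassifier {
    private static final int ALPHABET_SIZE = 256; // assumed alphabet size for add-one smoothing
    private final int nGramSize;
    // category -> (n-gram -> count)
    private final Map<String, Map<String, Integer>> nGramCounts = new HashMap<>();
    // category -> (leading (n-1)-character context -> count)
    private final Map<String, Map<String, Integer>> contextCounts = new HashMap<>();

    public ToyNGramClassifier(String[] categories, int nGramSize) {
        this.nGramSize = nGramSize;
        for (String category : categories) {
            nGramCounts.put(category, new HashMap<>());
            contextCounts.put(category, new HashMap<>());
        }
    }

    // Counts every n-gram of the text, and its context, under the given category.
    public void train(String category, String text) {
        Map<String, Integer> grams = nGramCounts.computeIfAbsent(category, k -> new HashMap<>());
        Map<String, Integer> contexts = contextCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (int i = 0; i + nGramSize <= text.length(); i++) {
            String gram = text.substring(i, i + nGramSize);
            grams.merge(gram, 1, Integer::sum);
            contexts.merge(gram.substring(0, nGramSize - 1), 1, Integer::sum);
        }
    }

    // Natural-log probability of the text under one category's model.
    public double logProbability(String category, String text) {
        Map<String, Integer> grams = nGramCounts.get(category);
        Map<String, Integer> contexts = contextCounts.get(category);
        double logProb = 0.0;
        for (int i = 0; i + nGramSize <= text.length(); i++) {
            String gram = text.substring(i, i + nGramSize);
            int gramCount = grams.getOrDefault(gram, 0);
            int contextCount = contexts.getOrDefault(gram.substring(0, nGramSize - 1), 0);
            // Add-one smoothing: unseen n-grams still get a small, nonzero probability
            logProb += Math.log((gramCount + 1.0) / (contextCount + ALPHABET_SIZE));
        }
        return logProb;
    }

    // Returns the category whose model assigns the text the highest log probability.
    public String bestCategory(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : nGramCounts.keySet()) {
            double score = logProbability(category, text);
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ToyNGramClassifier clf = new ToyNGramClassifier(new String[] {"neg", "pos"}, 3);
        clf.train("pos", "a wonderful, charming and moving film");
        clf.train("neg", "a dull, tedious and boring film");
        System.out.println(clf.bestCategory("charming and wonderful")); // pos
    }
}
```

Scores are summed in log space because multiplying hundreds of small per-character probabilities for a full-length review would underflow to zero.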

As we did earlier, we will create a series of instances based on the contents of the training files. We will not detail the following code, as it is very similar to the code in the Training text using the Classified class section. The main difference is that there are only two categories to process:

String directory = "...";
File trainingDirectory = new File(directory, "txt_sentoken");
for (int i = 0; i < categories.length; ++i) {
    // Each subdirectory of txt_sentoken holds the reviews for one category
    Classification classification =
        new Classification(categories[i]);
    File file = new File(trainingDirectory, categories[i]);
    File[] trainingFiles = file.listFiles();
    for (int j = 0; j < trainingFiles.length; ++j) {
        try {
            // Files is LingPipe's com.aliasi.util.Files, not java.nio.file.Files
            String review = Files.readFromFile(
                trainingFiles[j], "ISO-8859-1");
            // Pair the review text with its category and train on it
            Classified<CharSequence> classified =
                new Classified<>(review, classification);
            classifier.handle(classified);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
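File.listFiles returns null when its path is not a directory, so a mistyped directory value in the listing above surfaces as a NullPointerException inside the loop. The following is a hypothetical sketch of the same directory walk using the standard java.nio.file API instead. The class TrainingFileLoader, its method loadCategories, and the BiConsumer handler (which stands in for the classifier.handle(...) call) are illustrative names, not part of LingPipe. It assumes the same layout as the listing: one subdirectory of txt_sentoken per category, each file one review decoded as ISO-8859-1.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BiConsumer;

// Hypothetical helper: walks trainingDirectory/<category>/* and hands each
// review's text plus its category name to a handler. Not LingPipe's API.
public class TrainingFileLoader {
    public static int loadCategories(Path trainingDirectory, String[] categories,
                                     BiConsumer<String, String> handler) throws IOException {
        int count = 0;
        for (String category : categories) {
            Path categoryDir = trainingDirectory.resolve(category);
            // Throws NoSuchFileException or NotDirectoryException instead of returning null
            try (DirectoryStream<Path> files = Files.newDirectoryStream(categoryDir)) {
                for (Path file : files) {
                    if (!Files.isRegularFile(file)) {
                        continue; // skip nested directories
                    }
                    // Decode with the same ISO-8859-1 encoding the listing uses
                    String review = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
                    handler.accept(review, category);
                    count++;
                }
            }
        }
        return count;
    }
}
```

Unlike the listing, this version lets an IOException propagate to the caller rather than printing it and continuing, so a missing or unreadable training directory stops training immediately.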

The model is now ready to be used. We will use the review for the movie Forrest Gump:

String review = "An overly sentimental film with a somewhat "
    + "problematic message, but its sweetness and charm "
    + "are occasionally enough to approximate true depth "
    + "and grace. ";
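Deciding which category fits a review like this one comes down to comparing per-category scores. The hypothetical sketch below assumes each category has already been given a joint log2 probability, log2 P(category, review), as a joint classifier produces. The class JointScores, its methods, and the numbers in main are made up for illustration and are not LingPipe's API. The sketch picks the highest-scoring category and, by Bayes' rule, normalizes the joint scores into conditional probabilities P(category | review).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for turning joint log2 scores into a decision.
public class JointScores {
    // The best category is the one with the highest joint log2 probability.
    public static String bestCategory(Map<String, Double> jointLog2) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : jointLog2.entrySet()) {
            if (e.getValue() > bestScore) {
                bestScore = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    // P(c | text) = P(c, text) / sum over c' of P(c', text), computed in log2 space.
    public static Map<String, Double> conditionalProbabilities(Map<String, Double> jointLog2) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : jointLog2.values()) {
            max = Math.max(max, v);
        }
        // Subtracting the maximum keeps the largest term at 2^0 = 1, avoiding underflow
        Map<String, Double> result = new LinkedHashMap<>();
        double total = 0.0;
        for (Map.Entry<String, Double> e : jointLog2.entrySet()) {
            double p = Math.pow(2.0, e.getValue() - max);
            result.put(e.getKey(), p);
            total += p;
        }
        for (Map.Entry<String, Double> e : result.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up scores, for illustration only
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("neg", -1002.0);
        scores.put("pos", -1000.0);
        System.out.println(bestCategory(scores));             // pos
        System.out.println(conditionalProbabilities(scores)); // {neg=0.2, pos=0.8}
    }
}
```

Subtracting the maximum before exponentiating matters in practice: once a long review's scores drop below roughly -1075 bits, Math.pow(2.0, score) underflows to 0.0 and a naive normalization would divide zero by zero.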

We use the classify method to perform the actual work. It returns a Classification instance whose bestCategory method returns the best category, as shown here:
