String directory = ".../demos";
File trainingDirectory = new File(directory
+ "/data/fourNewsGroups/4news-train");
In the training directory, there are four subdirectories whose names are listed in the categories array. Each subdirectory holds a series of files with numeric names. These files contain newsgroup data (http://qwone.com/~jason/20Newsgroups/) on the topic given by the name of their directory.
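Before training, it can help to confirm that the directory matches this layout. The following is a minimal sketch that uses only java.io.File. The class TrainingLayout and its method countNumericFiles are hypothetical helpers and are not part of LingPipe or the book's code. For each name in a categories array, the method counts the numerically named files in the matching subdirectory. It fails if that subdirectory is missing.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of LingPipe): sanity-checks the
// trainingDirectory/<category>/<numeric file> layout before training.
final class TrainingLayout {

    // Returns category -> number of files whose names consist only of digits.
    // Throws IllegalStateException if a category subdirectory is missing.
    static Map<String, Integer> countNumericFiles(File trainingDirectory,
                                                  String[] categories) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String category : categories) {
            File classDir = new File(trainingDirectory, category);
            String[] names = classDir.list(); // null if classDir is not a directory
            if (names == null) {
                throw new IllegalStateException("Missing directory: " + classDir);
            }
            int numeric = 0;
            for (String name : names) {
                if (name.matches("\\d+")) { // numeric names such as "101"
                    numeric++;
                }
            }
            counts.put(category, numeric);
        }
        return counts;
    }
}
```

Non-numeric entries, such as a stray README, are ignored rather than treated as errors. Whether that is the right policy depends on how clean the corpus is expected to be.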
Training the model involves pairing each file with its category and passing the result to the DynamicLMClassifier class's handle method. The file's text and its category form a training instance, and the handle method augments the model with that instance. The process uses nested for-loops.
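The following stand-in shows what it means to augment a model one instance at a time. It is deliberately simplified. It is not LingPipe's DynamicLMClassifier, and it does not reproduce that class's smoothing or n-gram order. The class ToyBigramClassifier, its handle(String, CharSequence) signature, and the add-one smoothed character-bigram model are all assumptions made for illustration. Each handle call only updates counts, so training is incremental. The classify method picks the category whose model assigns the text the highest log-probability.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for an incrementally trained language-model classifier.
// NOT LingPipe's DynamicLMClassifier: it keeps add-one smoothed
// character-bigram counts per category, only to show how each call to
// handle(...) augments the model with one more training instance.
final class ToyBigramClassifier {
    private static final int VOCAB = 65536; // number of possible UTF-16 chars
    private final Map<String, Map<String, Integer>> bigramCounts = new HashMap<>();
    private final Map<String, Map<Character, Integer>> contextCounts = new HashMap<>();

    ToyBigramClassifier(String[] categories) {
        for (String c : categories) {
            bigramCounts.put(c, new HashMap<>());
            contextCounts.put(c, new HashMap<>());
        }
    }

    // Adds one training text for the given category by updating its counts.
    void handle(String category, CharSequence text) {
        Map<String, Integer> bigrams = bigramCounts.get(category);
        Map<Character, Integer> contexts = contextCounts.get(category);
        if (bigrams == null) {
            throw new IllegalArgumentException("Unknown category: " + category);
        }
        for (int k = 1; k < text.length(); k++) {
            char prev = text.charAt(k - 1);
            bigrams.merge("" + prev + text.charAt(k), 1, Integer::sum);
            contexts.merge(prev, 1, Integer::sum);
        }
    }

    // Returns the category whose model gives the text the highest log-probability.
    String classify(CharSequence text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : bigramCounts.keySet()) {
            double score = logProb(category, text);
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    // Sum of log P(next | prev) with add-one smoothing over VOCAB characters.
    private double logProb(String category, CharSequence text) {
        Map<String, Integer> bigrams = bigramCounts.get(category);
        Map<Character, Integer> contexts = contextCounts.get(category);
        double sum = 0.0;
        for (int k = 1; k < text.length(); k++) {
            char prev = text.charAt(k - 1);
            int joint = bigrams.getOrDefault("" + prev + text.charAt(k), 0);
            int context = contexts.getOrDefault(prev, 0);
            sum += Math.log((joint + 1.0) / (context + VOCAB));
        }
        return sum;
    }
}
```

In this sketch, handle only increments counts. New training texts can therefore be added at any point without rebuilding the model from scratch.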
The outer for-loop creates a File object for each category's subdirectory and then calls its list method. The list method returns an array holding the names of the files in that directory. These names are stored in the trainingFiles array, which is used by the inner loop:
for (int i = 0; i < categories.length; ++i) {
File classDir =
new File(trainingDirectory, categories[i]);
String[] trainingFiles = classDir.list();
// Inner for-loop
}
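The two loops can also be sketched with the standard library alone. In this hedged sketch, java.nio.file.Files.readAllBytes decoded as ISO-8859-1 stands in for LingPipe's Files.readFromFile. A BiConsumer callback stands in for building the Classification and Classified objects and calling handle. The class CorpusWalker and the method forEachTrainingText are hypothetical. File.list returns null when the path is not a readable directory, so the sketch checks for that before it enters the inner loop.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.function.BiConsumer;

// Hypothetical helper: walks trainingDirectory/<category>/<file> and hands
// each (category, text) pair to a callback, mirroring the nested for-loops.
final class CorpusWalker {

    // Returns the number of files passed to the handler.
    static int forEachTrainingText(File trainingDirectory, String[] categories,
                                   BiConsumer<String, String> handler) {
        int seen = 0;
        for (int i = 0; i < categories.length; ++i) {          // outer loop: one category
            File classDir = new File(trainingDirectory, categories[i]);
            String[] trainingFiles = classDir.list();         // file names, or null
            if (trainingFiles == null) {
                continue; // missing or unreadable directory: skip this category
            }
            for (int j = 0; j < trainingFiles.length; ++j) {   // inner loop: one file
                File file = new File(classDir, trainingFiles[j]);
                try {
                    byte[] bytes = Files.readAllBytes(file.toPath());
                    String text = new String(bytes, StandardCharsets.ISO_8859_1);
                    handler.accept(categories[i], text);
                    seen++;
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            }
        }
        return seen;
    }
}
```

Skipping a missing category keeps the sketch short. A stricter version might throw instead, because silently training on fewer categories can hide a misconfigured path.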
The inner for-loop, shown next, opens each file and reads its text. The Classification class represents a classification with a specified category. It is combined with the text to create a Classified instance. The DynamicLMClassifier class's handle method then updates the model with this new information:
for (int j = 0; j < trainingFiles.length; ++j) {
try {
File file = new File(classDir, trainingFiles[j]);
String text = Files.readFromFile(file,
"ISO-8859-1");
Classification classification =
new Classification(categories[i]);
Classified<CharSequence> classified =