Parameter | Usage
int       | Cutoff: the minimum number of times a feature must occur to be used in the model
int       | The number of iterations used to train the maxent model
In the example that follows, we start by defining a BufferedOutputStream object
that will be used to store the new model. Several of the methods used in the example can
throw checked exceptions, which are handled in the catch blocks:
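import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;

BufferedOutputStream modelOutputStream = null;
try {
    // Create the training streams, train the model,
    // and write it to modelOutputStream here
} catch (UnsupportedEncodingException ex) {
    // Handle the exception
} catch (IOException ex) {
    // Handle the exception
}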
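The order of the catch blocks in the skeleton above matters: UnsupportedEncodingException is a subclass of IOException, so it has to be caught first, otherwise its handler would be unreachable and the code would not compile. The following standalone sketch uses only the JDK (not OpenNLP) to show the more specific handler firing when a character encoding name is not recognized. The class name EncodingCatchDemo and the charset name NO-SUCH-CHARSET are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class EncodingCatchDemo {
    // Reads the first line of the given bytes using the named charset.
    // The InputStreamReader constructor throws UnsupportedEncodingException
    // when the charset name is not supported.
    static String firstLine(byte[] data, String charsetName) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charsetName))) {
            return reader.readLine();
        }
    }

    // Returns a status string so it is visible which handler ran.
    static String tryRead(byte[] data, String charsetName) {
        try {
            return "read: " + firstLine(data, charsetName);
        } catch (UnsupportedEncodingException ex) {
            // Must come before IOException, its superclass
            return "bad encoding: " + charsetName;
        } catch (IOException ex) {
            return "I/O error: " + ex.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] data = "Hello world\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(tryRead(data, "UTF-8"));           // read: Hello world
        System.out.println(tryRead(data, "NO-SUCH-CHARSET")); // bad encoding: NO-SUCH-CHARSET
    }
}
```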
An ObjectStream instance is created using the PlainTextByLineStream class. Its
constructor takes an input stream for the training file and the name of the character
encoding. This line stream is then used to create a second ObjectStream, this time of
TokenSample objects, each of which holds a line of text together with its token span information:
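// Read the training file one line at a time as UTF-8 text
ObjectStream<String> lineStream = new PlainTextByLineStream(
    new FileInputStream("training-data.train"), "UTF-8");
// Convert each line into a TokenSample (text plus token spans)
ObjectStream<TokenSample> sampleStream =
    new TokenSampleStream(lineStream);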
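Training data for the OpenNLP tokenizer is written one sentence per line, with whitespace separating tokens and a <SPLIT> tag marking a token boundary where no whitespace occurs. The following is a minimal sketch, not OpenNLP's own TokenSample parser, of how such a line can be turned into plain text plus token spans. The class SplitTagParser and its Span and Sample records are hypothetical, and the records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitTagParser {
    // A token span: [start, end) character offsets into the plain text.
    record Span(int start, int end) {}

    // The text with <SPLIT> tags removed, plus the token spans found in it.
    record Sample(String text, List<Span> spans) {}

    static final String SPLIT = "<SPLIT>";

    // Whitespace separates tokens; <SPLIT> marks a boundary between
    // characters that are not separated by whitespace.
    static Sample parse(String line) {
        StringBuilder text = new StringBuilder();
        List<Span> spans = new ArrayList<>();
        int tokenStart = -1; // -1 means "not currently inside a token"
        int i = 0;
        while (i < line.length()) {
            if (line.startsWith(SPLIT, i)) {
                // Close the current token; the tag itself adds no text.
                if (tokenStart >= 0) {
                    spans.add(new Span(tokenStart, text.length()));
                    tokenStart = -1;
                }
                i += SPLIT.length();
                continue;
            }
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) {
                if (tokenStart >= 0) {
                    spans.add(new Span(tokenStart, text.length()));
                    tokenStart = -1;
                }
            } else if (tokenStart < 0) {
                tokenStart = text.length(); // a new token begins here
            }
            text.append(c);
            i++;
        }
        if (tokenStart >= 0) {
            spans.add(new Span(tokenStart, text.length()));
        }
        return new Sample(text.toString(), spans);
    }

    public static void main(String[] args) {
        Sample s = parse("Mr. Smith left<SPLIT>, quickly<SPLIT>.");
        for (Span span : s.spans()) {
            System.out.println(s.text().substring(span.start(), span.end()));
        }
    }
}
```

Running main prints the six tokens Mr., Smith, left, the comma, quickly, and the final period, one per line, while the reconstructed text is "Mr. Smith left, quickly.".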
The train method can now be used as shown in the following code. English ("en") is specified
as the language, and the alphanumeric optimization is enabled (true), so purely alphanumeric
character sequences are not examined for split points. The cutoff and iteration values are
set to 5 and 100, respectively:
TokenizerModel model = TokenizerME.train(
"en", sampleStream, true, 5, 100);
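The cutoff argument (5 here) prunes rare features before the maxent model is estimated. The sketch below is a simplified, JDK-only illustration of a count cutoff, not OpenNLP's indexing code: it counts how often each feature string appears across training events and keeps only those seen at least cutoff times. The class FeatureCutoff and the feature strings are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class FeatureCutoff {
    // Counts each feature across all events and keeps the ones
    // that occur at least 'cutoff' times.
    static Set<String> keepFrequentFeatures(List<List<String>> events, int cutoff) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> event : events) {
            for (String feature : event) {
                counts.merge(feature, 1, Integer::sum);
            }
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical features describing candidate split points.
        List<List<String>> events = List.of(
            List.of("prev=t", "next=,"),
            List.of("prev=t", "next=."),
            List.of("prev=y", "next=."),
            List.of("prev=t", "next=!"));
        System.out.println(keepFrequentFeatures(events, 2)); // [next=., prev=t]
    }
}
```

With a cutoff of 2, only prev=t (3 occurrences) and next=. (2 occurrences) survive; a cutoff of 5, as in the train call above, would discard every one of these toy features.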
The parameters of the train method are given in detail in the following table: