Java Reference
In-Depth Information
dog A pet dog taking part in Christmas traditions ...
dog The majority of contemporary people with dogs describe
their ...
dog Another study of dogs' roles in families showed many
dogs have ...
dog According to statistics published by the American Pet
Products ...
dog The latest study using Magnetic resonance imaging (MRI)
...
cat Cats are common pets in Europe and North America, and
their ...
cat Although cat ownership has commonly been associated ...
cat The concept of a cat breed appeared in Britain during
...
cat Cats come in a variety of colors and patterns. These
are physical ...
cat A natural behavior in cats is to hook their front claws
periodically ...
cat Although scratching can serve cats to keep their claws
from growing ...
When creating training data, it is important to use a large enough sample size. The data we
used is not sufficient for some analysis. However, as we will see, it does a pretty good job
of identifying the categories correctly.
The DoccatModel class supports categorization and classification of text. A model is
trained using the train method based on annotated text. The train method uses a
string denoting the language and an ObjectStream<DocumentSample> instance
holding the training data. The DocumentSample instance holds the annotated text and
its category.
In the following example, the en-animal.train file is used to train the model. Its in-
put stream is used to create a PlainTextByLineStream instance, which is then con-
verted to an ObjectStream<DocumentSample> instance. The train method is
then applied. The code is enclosed in a try-with-resources block to handle exceptions. We
also created an output stream that we will use to persist the model:
DoccatModel model = null;
try (InputStream dataIn =
Search WWH ::




Custom Search