Text classifying techniques
Classification is concerned with taking a specific document and determining whether it fits
into one of several document groups. There are two basic techniques for classifying text:
• Rule-based
• Supervised Machine Learning
Rule-based classification uses a combination of words and other attributes organized
around expert-crafted rules. These rules can be very effective, but creating them is a
time-consuming process.
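As a rough illustration of the rule-based idea, the following sketch assigns a document to the category whose hand-picked keywords occur most often. The class name RuleBasedClassifier, its methods, and the keyword lists are hypothetical and chosen only for this example; real rule sets usually combine many more attributes than simple keyword matches.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical rule-based classifier: each category is tied to hand-picked keywords.
class RuleBasedClassifier {
    // Category name -> lowercase keywords that signal the category.
    private final Map<String, List<String>> rules = new LinkedHashMap<>();

    void addRule(String category, List<String> keywords) {
        rules.put(category, keywords);
    }

    // Returns the category whose keywords match most often, or "unknown" if none match.
    String classify(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\W+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, List<String>> rule : rules.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (rule.getValue().contains(token)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = rule.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        RuleBasedClassifier classifier = new RuleBasedClassifier();
        classifier.addRule("sports", List.of("match", "goal", "team"));
        classifier.addRule("finance", List.of("stock", "market", "shares"));
        System.out.println(classifier.classify("The team scored a late goal"));
        System.out.println(classifier.classify("Shares fell as the market closed"));
    }
}
```

Every new category or keyword has to be added by hand, which is where the time cost of this approach comes from.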
Supervised Machine Learning (SML) takes a collection of annotated training documents
to create a model. The model is normally called the classifier. There are many different
machine learning techniques, including Naive Bayes, Support-Vector Machine (SVM),
and k-nearest neighbor.
We are not concerned with how these approaches work, but the interested reader will find
innumerable sources that expand upon these and other techniques.
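Although the inner workings are outside the scope of this discussion, a small sketch can give a feel for how annotated training documents become a classifier. The code below is a minimal multinomial Naive Bayes with add-one smoothing; it is not taken from any particular library, and the class name NaiveBayesSketch, its methods, and the training sentences are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Minimal multinomial Naive Bayes with add-one (Laplace) smoothing.
// Illustrative only; names and structure are hypothetical.
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }

    // Training: count documents per label and word occurrences per label.
    void train(String label, String text) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
            wordsPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Classification: pick the label with the highest log posterior.
    // Returns null if the model has not been trained.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            // Log prior: fraction of training documents carrying this label.
            double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = wordsPerLabel.getOrDefault(label, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty()) {
                    continue;
                }
                // Smoothed log likelihood of the token given the label.
                int count = counts.getOrDefault(token, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch model = new NaiveBayesSketch();
        model.train("sports", "the team won the match");
        model.train("sports", "a great goal by the striker");
        model.train("finance", "the stock market fell");
        model.train("finance", "investors sold shares of stock");
        System.out.println(model.classify("the striker scored a goal"));
        System.out.println(model.classify("stock shares fell"));
    }
}
```

The annotated documents are the (label, text) pairs passed to train, and the resulting counts play the role of the model. In practice a tested library implementation would normally be used rather than hand-written code like this.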