Text classifying techniques
Classification is concerned with taking a specific document and determining whether it fits
into one of several document groups. There are two basic techniques for classifying text:
• Rule-based
• Supervised Machine Learning
Rule-based classification uses a combination of words and other attributes organized
around expert-crafted rules. These rules can be very effective, but creating them is a
time-consuming process.
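To make the idea concrete, here is a minimal plain-Java sketch of a rule-based classifier. It is an illustration only: the class name RuleBasedClassifier, the two categories, and their keyword lists are hypothetical, and real rule sets typically use many more attributes than single keywords.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class RuleBasedClassifier {
    // Hypothetical hand-written rules: category -> keywords that signal it.
    private static final Map<String, List<String>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("sports", List.of("match", "goal", "team"));
        RULES.put("finance", List.of("stock", "market", "shares"));
    }

    // Returns the category whose keywords occur most often in the text,
    // or "unknown" when no rule matches at all.
    static String classify(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\W+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, List<String>> rule : RULES.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (rule.getValue().contains(token)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = rule.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(classify("The team scored a late goal in the match"));
        System.out.println(classify("Shares fell as the stock market closed"));
    }
}
```

Every new category or change in vocabulary means editing the rule table by hand, which is where the time cost of this approach comes from.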
Supervised Machine Learning (SML) uses a collection of annotated training documents
to create a model. The model is normally called the classifier. There are many different
machine learning techniques, including Naive Bayes, Support Vector Machine (SVM),
and k-nearest neighbor.
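As a rough illustration of the train-then-classify workflow, the following plain-Java sketch implements a small multinomial Naive Bayes classifier with add-one smoothing. It is not taken from any NLP library; the class name NaiveBayesSketch, its methods, and the four training sentences are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }

    // Adds one annotated training document to the model.
    void train(String label, String text) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            wordsPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Returns the label with the highest log-probability for the text,
    // or null if the model has not been trained.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            // Log prior: fraction of training documents with this label.
            double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = wordsPerLabel.getOrDefault(label, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty()) continue;
                // Add-one (Laplace) smoothing so unseen words do not zero the score.
                int count = counts.getOrDefault(token, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("sports", "the team won the match");
        nb.train("sports", "a late goal decided the game");
        nb.train("finance", "the stock market fell");
        nb.train("finance", "investors sold shares");
        System.out.println(nb.classify("the team scored a goal"));
        System.out.println(nb.classify("shares fell on the stock market"));
    }
}
```

Unlike the rule-based approach, nothing here is written per category by hand: the behavior comes entirely from the annotated training documents, so the quality of the classifier depends on how many such documents are available and how representative they are.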
We are not concerned here with how these approaches work, but the interested reader will
find numerous sources that expand upon these and other techniques.