Classifying Texts and Documents - Natural Language Processing with Java

Text classifying techniques

Classification is concerned with taking a specific document and determining whether it fits into one of several predefined document groups. There are two basic techniques for classifying text:

• Rule-based

• Supervised Machine Learning

Rule-based classification uses a combination of words and other attributes organized around expert-crafted rules. These rules can be very effective, but creating them is a time-consuming process.
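As a rough illustration of the rule-based approach, the sketch below assigns a document to the category whose keywords it mentions most often. The class name RuleBasedClassifier, the two categories, and the keyword lists are all hypothetical and chosen only for this example. Real rule sets are usually far richer and may use attributes other than single words.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class RuleBasedClassifier {
    // Hypothetical hand-written rules: category -> keywords that suggest it.
    private static final Map<String, List<String>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("sports", Arrays.asList("goal", "match", "team", "score"));
        RULES.put("finance", Arrays.asList("stock", "market", "bank", "shares"));
    }

    // Returns the category with the most keyword hits, or "unknown" if no rule matches.
    static String classify(String document) {
        String[] tokens = document.toLowerCase(Locale.ROOT).split("\\W+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, List<String>> rule : RULES.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (rule.getValue().contains(token)) {
                    hits++;
                }
            }
            // Strictly greater: on a tie, the rule listed first wins.
            if (hits > bestHits) {
                bestHits = hits;
                best = rule.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(classify("The team scored a late goal to win the match"));
        System.out.println(classify("Bank shares fell as the stock market slid"));
    }
}
```

Every new category or change in vocabulary means editing the rules by hand, which is where the time cost mentioned above comes from.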

Supervised Machine Learning (SML) uses a collection of annotated training documents to create a model. The model is normally called the classifier. There are many different machine learning techniques, including Naive Bayes, Support Vector Machines (SVM), and k-nearest neighbors.
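To make the train-then-classify workflow concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is written from scratch for illustration and is not taken from any NLP library. The class name NaiveBayesSketch, its methods, and the tiny training set are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }

    // Adds one annotated training document to the model's counts.
    void train(String label, String document) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(document)) {
            if (token.isEmpty()) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
            wordsPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Returns the label with the highest log-probability, or null if nothing was trained.
    String classify(String document) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            // Log prior: the share of training documents carrying this label.
            double score = Math.log((double) docsPerLabel.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = wordsPerLabel.getOrDefault(label, 0);
            for (String token : tokenize(document)) {
                if (token.isEmpty()) {
                    continue;
                }
                // Add-one smoothing keeps unseen words from zeroing out the score.
                int count = counts.getOrDefault(token, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch classifier = new NaiveBayesSketch();
        classifier.train("spam", "win money now");
        classifier.train("spam", "free money offer");
        classifier.train("ham", "meeting at noon");
        classifier.train("ham", "lunch with the team");
        System.out.println(classifier.classify("free money"));
        System.out.println(classifier.classify("team meeting"));
    }
}
```

The annotated documents passed to train play the role of the training collection, and the populated count tables are the model that classify consults.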

We are not concerned with how these approaches work, but the interested reader will find many sources that expand upon these and other techniques.
