How classification is used
Text classification is used for a number of purposes:
• Spam detection
• Authorship attribution
• Sentiment analysis
• Age and gender identification
• Determining the subject of a document
• Language identification
Spam is an unfortunate reality for most e-mail users. If an e-mail can be classified as
spam, it can be moved to a spam folder. The text of a message can be analyzed, and certain
attributes can be used to designate the e-mail as spam. These attributes can include
misspellings, the lack of an appropriate e-mail address for recipients, and a non-standard URL.
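The attributes listed above can be turned into a simple score. The following is only a minimal sketch under stated assumptions, not a production spam filter: the class name SpamHeuristics, the short misspelling list, the treatment of a raw IP address in a URL as "non-standard", and the threshold of 2 are all hypothetical choices made for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

class SpamHeuristics {
    // Hypothetical misspelling list; a real system would consult a dictionary.
    private static final Set<String> MISSPELLINGS =
            Set.of("recieve", "garantee", "congradulations");

    // Assumption: a URL whose host is a raw IP address counts as non-standard.
    private static final Pattern IP_URL =
            Pattern.compile("https?://\\d{1,3}(\\.\\d{1,3}){3}");

    // Adds one point per misspelled word, one for a missing recipient
    // address, and one for a non-standard URL.
    static int score(String body, List<String> recipients) {
        int score = 0;
        for (String token : body.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (MISSPELLINGS.contains(token)) {
                score++;
            }
        }
        if (recipients.isEmpty()) {
            score++;
        }
        if (IP_URL.matcher(body).find()) {
            score++;
        }
        return score;
    }

    // The threshold of 2 is arbitrary and chosen only for this sketch.
    static boolean isSpam(String body, List<String> recipients) {
        return score(body, recipients) >= 2;
    }

    public static void main(String[] args) {
        String body = "Congradulations! You will recieve a prize at http://192.168.0.1/win";
        System.out.println(isSpam(body, List.of()));
    }
}
```

A trained classifier would learn the weight of each attribute from labeled messages instead of adding fixed points as this sketch does.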
Classification has been used to determine the authorship of documents. This has been
done for historical documents such as The Federalist Papers and for the book
Primary Colors, where the authors have been identified.
Sentiment analysis is a technique that determines the attitude expressed in text. Movie
reviews have been a popular domain, but it can be used for almost any product review. This
helps companies better assess how their product is perceived. Often, a negative or positive
attribute is assigned to the text. Sentiment analysis is also called opinion extraction,
opinion mining, and subjectivity analysis. It has also been used to predict consumer
confidence and the performance of a stock market from Twitter feeds and other sources.
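Assigning a positive or negative attribute to text can be illustrated with a word-count approach. This is a minimal sketch that assumes a tiny hand-made lexicon; the class name LexiconSentiment and both word lists are hypothetical, and the technique shown is simple lexicon counting, not a trained classifier.

```java
import java.util.Locale;
import java.util.Set;

class LexiconSentiment {
    // Hypothetical word lists, far too small for real use.
    private static final Set<String> POSITIVE =
            Set.of("good", "great", "excellent", "enjoyable", "love");
    private static final Set<String> NEGATIVE =
            Set.of("bad", "boring", "terrible", "awful", "hate");

    // Counts positive words minus negative words and maps the sign to a label.
    static String classify(String review) {
        int score = 0;
        for (String token : review.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) {
                score++;
            } else if (NEGATIVE.contains(token)) {
                score--;
            }
        }
        if (score > 0) {
            return "positive";
        }
        if (score < 0) {
            return "negative";
        }
        return "neutral";
    }

    public static void main(String[] args) {
        System.out.println(classify("A great cast and an excellent script"));
        System.out.println(classify("Boring plot, terrible ending"));
    }
}
```

Counting words in this way ignores negation ("not good") and context, which is one reason practical systems usually rely on models trained on labeled reviews.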
Classification can be used to determine the age and gender of a text's author, providing
more insight into the author. Frequently, the numbers of pronouns, determiners, and noun
phrases are used to identify the gender of a writer. Women tend to use more pronouns and
men tend to use more determiners.
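The counts mentioned above can serve as input features for a classifier. The sketch below only extracts two of them, pronouns and determiners, from fixed word lists; it does not predict anything. The class name StyleFeatures and the word lists are hypothetical, and noun phrases are omitted because finding them would require a parser.

```java
import java.util.Locale;
import java.util.Set;

class StyleFeatures {
    // Crude, hypothetical word lists. Some words are ambiguous in real text
    // ("her" can be a determiner, "that" can be a pronoun); a part-of-speech
    // tagger would be needed to resolve such cases.
    private static final Set<String> PRONOUNS = Set.of(
            "i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them");
    private static final Set<String> DETERMINERS = Set.of(
            "a", "an", "the", "this", "that", "these", "those");

    // Returns a two-element array: {pronoun count, determiner count}.
    static int[] count(String text) {
        int pronouns = 0;
        int determiners = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (PRONOUNS.contains(token)) {
                pronouns++;
            } else if (DETERMINERS.contains(token)) {
                determiners++;
            }
        }
        return new int[] {pronouns, determiners};
    }

    public static void main(String[] args) {
        int[] counts = count("She told me that the report and a summary were ready");
        System.out.println("pronouns=" + counts[0] + ", determiners=" + counts[1]);
    }
}
```

In practice these counts would usually be normalized by document length before being passed to a classifier, so that long and short texts are comparable.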
Determining the subject of a text is useful when we need to organize a large number of
documents. Search engines are very much concerned with this activity, but it has also been
used simply to place documents in different categories, as is done with tag clouds. A tag
cloud is a group of words displayed in a way that reflects the relative frequency of
occurrence of each word.
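The data behind a tag cloud is a table of word frequencies. As a minimal sketch, the counts could be computed as follows; the class name TagCloudCounts and the simple tokenization (lowercasing and splitting on anything that is not a letter a to z) are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class TagCloudCounts {
    // Maps each lowercased word to the number of times it occurs.
    static Map<String, Integer> wordFrequencies(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordFrequencies("Spam spam filter, spam"));
    }
}
```

A rendering step would then scale each word's font size by its count, typically after removing very common words such as "the" and "of".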
The following image is an example of a tag cloud generated by IBM Word Cloud Generator
( http://www.softpedia.com/get/Office-tools/Other-Office-Tools/IBM-Word-Cloud-Gen-