Classifying text and documents
Classification is concerned with assigning labels to information found in text or documents.
These labels may or may not be known when the process occurs. When the labels are known,
the process is called classification. When the labels are unknown, the process is called
clustering.
Also of interest in NLP is the process of categorization: assigning a text element to one
of several possible groups. For example, military aircraft can be categorized as fighter,
bomber, surveillance, transport, or rescue.
Classifiers can be organized by the type of output they produce. A binary classifier
produces a yes/no output; this type is often used to support spam filters. Other types
assign the input to one of multiple possible categories.
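The difference between the two output types can be sketched in a few lines of Java. This is only an illustration of the outputs, not of the full classification process: it uses hand-written keyword matching rather than a trained model, and the class name, method names, and keyword lists are all hypothetical choices made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class KeywordClassifierSketch {

    // Hypothetical keyword lists, chosen only for illustration.
    private static final Set<String> SPAM_WORDS = Set.of("winner", "free", "prize");

    private static final Map<String, List<String>> CATEGORY_KEYWORDS = new LinkedHashMap<>();
    static {
        CATEGORY_KEYWORDS.put("fighter", List.of("intercept", "dogfight"));
        CATEGORY_KEYWORDS.put("bomber", List.of("payload", "bombing"));
        CATEGORY_KEYWORDS.put("transport", List.of("cargo", "airlift"));
    }

    // Binary classifier: the output is a yes/no answer.
    public static boolean isSpam(String text) {
        for (String token : tokenize(text)) {
            if (SPAM_WORDS.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // Multi-category classifier: the output is one of several labels.
    // Returns the category with the most keyword hits, or "unknown" if none match.
    public static String categorize(String text) {
        List<String> tokens = List.of(tokenize(text));
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, List<String>> entry : CATEGORY_KEYWORDS.entrySet()) {
            int hits = 0;
            for (String keyword : entry.getValue()) {
                if (tokens.contains(keyword)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Lowercase the text and split on anything that is not a letter.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z]+");
    }

    public static void main(String[] args) {
        System.out.println(isSpam("You are a WINNER, claim your free prize"));
        System.out.println(categorize("The aircraft carried cargo during the airlift"));
    }
}
```

The first method can only answer yes or no, while the second picks one label from a fixed set. A real classifier would learn these decisions from labeled training data instead of relying on fixed keyword lists.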
Classification is more of a multi-step process than many of the other NLP tasks. It involves
the steps that we will discuss in Understanding NLP models later in the chapter. Due to the
length of this process, we will not illustrate it here. In Chapter 6, Classifying Text and
Documents, we will investigate the classification process and provide a detailed example.