Chapter 1. Introduction to NLP
Natural Language Processing (NLP) is a broad topic focused on the use of computers to
analyze natural languages. It addresses areas such as speech processing, relationship
extraction, document categorization, and text summarization. However, these types of analysis
are based on a set of fundamental techniques such as tokenization, sentence detection,
classification, and relationship extraction. These basic techniques are the focus of this book.
We will start with a detailed discussion of NLP, investigate why it is important, and identify
application areas.
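To give a first feel for two of the fundamental techniques named above, tokenization and sentence detection, here is a minimal sketch that uses only java.text.BreakIterator from Java SE. The class name BasicSegmenter and its method names are hypothetical and chosen for illustration; this is a rule-based approximation, not the approach taken by any of the NLP libraries.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of any NLP library.
public class BasicSegmenter {

    // Sentence detection: split text at the boundaries reported by the
    // JDK's locale-aware sentence iterator.
    public static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    // Tokenization: split text at word boundaries and keep only the segments
    // that start with a letter or digit (this drops whitespace and punctuation).
    public static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "NLP is not easy. Some problems are simple.";
        System.out.println(sentences(text));
        System.out.println(tokens(text));
    }
}
```

Simple boundary rules like these tend to break down on abbreviations, numbers, and unusual punctuation, which is one reason the dedicated NLP libraries rely on trained models instead.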
There are many tools available that support NLP tasks. We will focus on the Java language
and how various Java Application Programming Interfaces (APIs) support NLP. In this
chapter, we will briefly identify the major APIs, including Apache's OpenNLP, the Stanford
NLP libraries, LingPipe, and GATE.
This is followed by a discussion of the basic NLP techniques illustrated in this book. The
nature and use of these techniques are presented and illustrated using one of the NLP APIs.
Many of these techniques use models. A model is similar to a set of rules used
to perform a task such as tokenizing text. It is typically represented by a class that is
instantiated from a file. We round off the chapter with a brief discussion of how data can
be prepared to support NLP tasks.
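The idea of a model as a class instantiated from a file can be sketched with plain Java SE. The AbbreviationModel class below is entirely hypothetical: it assumes a made-up file format with one abbreviation per line and uses that list as a tiny "rule set" for deciding whether a period ends a sentence. Real library models are trained and stored in their own formats; only the load-then-use pattern is the point here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model class for illustration; not part of any NLP library.
public class AbbreviationModel {

    private final Set<String> abbreviations = new HashSet<>();

    // Builds the model from a stream, for example a FileInputStream opened
    // on a model file. Assumed format: one abbreviation per line.
    public AbbreviationModel(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String entry = line.trim().toLowerCase(Locale.ROOT);
                if (!entry.isEmpty()) {
                    abbreviations.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if a period following this word probably ends a sentence,
    // that is, if the word is not a known abbreviation.
    public boolean endsSentence(String wordBeforePeriod) {
        return !abbreviations.contains(wordBeforePeriod.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // In-memory stand-in for a model file so the sketch runs on its own.
        byte[] data = "dr\nmr\netc\n".getBytes(StandardCharsets.UTF_8);
        AbbreviationModel model = new AbbreviationModel(new ByteArrayInputStream(data));
        System.out.println(model.endsSentence("Dr"));
        System.out.println(model.endsSentence("easy"));
    }
}
```

Keeping the rules in a file rather than in code means the same class can be reused with a different language or domain simply by loading a different file.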
NLP is not easy. While some problems can be solved relatively easily, many others
require sophisticated techniques. We will strive to provide a foundation
for NLP so that you can better understand which techniques are
available and applicable to a given problem.
NLP is a large and complex field. In this book, we will only be able to address a small part
of it. We will focus on core NLP tasks that can be implemented using Java. Throughout this
book, we will demonstrate a number of NLP techniques using both the Java SE SDK and
other libraries, such as OpenNLP and Stanford NLP. To use these libraries, specific
API JAR files need to be added to the project in which they are used. A
discussion of these libraries is found in the Survey of NLP tools section, which contains
download links to the libraries. The examples in this book were developed using NetBeans
8.0.2. These projects required the API JAR files to be added to the Libraries category of the
Project Properties dialog box.