What is NLP?
A formal definition of NLP frequently includes wording to the effect that it is a field of
study using computer science, artificial intelligence, and formal linguistics concepts to
analyze natural language. A less formal definition suggests that it is a set of tools used to
derive meaningful and useful information from natural language sources such as web pages
and text documents.
The phrase "meaningful and useful" implies that NLP has some commercial value, though it
is also frequently applied to academic problems. Its commercial value is readily seen in its
support of search engines: a user query is processed using NLP techniques in order to
generate a result page that the user can act on. Modern search engines have been very
successful in this regard. NLP techniques have also found use in automated help systems
and in support of complex query systems, as typified by IBM's Watson project.
When we work with a language, the terms syntax and semantics are frequently
encountered. The syntax of a language refers to the rules that govern valid sentence
structure. For example, a common sentence structure in English starts with a subject,
followed by a verb and then an object, as in "Tim hit the ball". We are not used to unusual
word orders such as "Hit ball Tim". Although the syntax rules of English are not as rigorous
as those of computer languages, we still expect a sentence to follow basic syntax rules.
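
The word-order point above can be made concrete with a small Java sketch. It is only an illustration under strong assumptions: the class name WordOrderSketch, the four-word lexicon, and the pattern NOUN VERB [DET] NOUN are all hypothetical and written by hand for this one example. Real syntax analysis relies on part-of-speech taggers and parsers rather than a fixed table.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class WordOrderSketch {
    // Hypothetical, hand-written lexicon covering only the example words.
    // A real system would assign these tags with a part-of-speech tagger.
    private static final Map<String, String> LEXICON = new HashMap<>();
    static {
        LEXICON.put("tim", "NOUN");
        LEXICON.put("ball", "NOUN");
        LEXICON.put("hit", "VERB");
        LEXICON.put("the", "DET");
    }

    // Returns true if the words follow the pattern NOUN VERB [DET] NOUN,
    // a crude stand-in for "subject, verb, object".
    public static boolean isSubjectVerbObject(String sentence) {
        StringBuilder tags = new StringBuilder();
        for (String word : sentence.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            // Words missing from the lexicon are tagged UNKNOWN and never match.
            tags.append(LEXICON.getOrDefault(word, "UNKNOWN")).append(' ');
        }
        return tags.toString().trim().matches("NOUN VERB (DET )?NOUN");
    }

    public static void main(String[] args) {
        System.out.println(isSubjectVerbObject("Tim hit the ball")); // true
        System.out.println(isSubjectVerbObject("Hit ball Tim"));     // false
    }
}
```

The sketch accepts "Tim hit the ball" and rejects "Hit ball Tim" purely on the order of the tags, which is the sense in which syntax constrains sentence structure independently of meaning.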
The semantics of a sentence is its meaning. As English speakers, we understand the
meaning of the sentence "Tim hit the ball". However, English and other natural languages
can be ambiguous at times and a sentence's meaning may only be determined from its
context. As we will see, various machine learning techniques can be used to attempt to
derive the meaning of text.
As we progress with our discussions, we will introduce many linguistic terms that will help
us better understand natural languages and provide us with a common vocabulary to
explain the various NLP techniques. We will see how the text can be split into individual
elements and how these elements can be classified.
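
As a first taste of splitting and classifying text, here is a minimal Java sketch that uses only the standard java.util.regex package. The class name TokenSketch, the token pattern, and the three labels (WORD, NUMBER, OTHER) are assumptions chosen for illustration. The labels describe character classes only and are not linguistic categories such as parts of speech.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenSketch {
    // A token is a run of letters, a run of decimal digits,
    // or any single other non-whitespace character.
    private static final Pattern TOKEN =
            Pattern.compile("\\p{L}+|\\p{Nd}+|[^\\s\\p{L}\\p{Nd}]");

    // Splits the text into tokens in the order they appear.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Labels a non-empty token by the character class of its first code point.
    public static String classify(String token) {
        int first = token.codePointAt(0);
        if (Character.isLetter(first)) {
            return "WORD";
        }
        if (Character.isDigit(first)) {
            return "NUMBER";
        }
        return "OTHER";
    }

    public static void main(String[] args) {
        for (String token : tokenize("Tim hit the ball.")) {
            System.out.println(token + " -> " + classify(token));
        }
    }
}
```

For the input "Tim hit the ball." the sketch yields the tokens Tim, hit, the, ball, and a final period, with the first four labeled WORD and the period labeled OTHER. A regular expression like this is often enough for simple problems, while dedicated NLP libraries handle harder cases such as abbreviations and contractions.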
In general, these approaches are used to enhance applications, thus making them more
valuable to their users. The uses of NLP range from relatively simple tasks to those that
push the limits of what is possible today. In this topic, we will show examples that range
from simple approaches, which may be all that is required for some problems, to the more
advanced libraries and classes available to address sophisticated needs.