Finding Parts of Text - Natural Language Processing with Java - page 87


Using lemmatization

Lemmatization is supported by a number of NLP APIs. In this section, we will illustrate how lemmatization can be performed using the StanfordCoreNLP and the OpenNLPLemmatizer classes. The lemmatization process determines the lemma of a word. A lemma can be thought of as the dictionary form of a word. For example, the lemma of "was" is "be".
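To make the idea of a dictionary form concrete, the following is a minimal sketch of a lookup-based lemmatizer. It is only an illustration and does not reflect how StanfordCoreNLP or OpenNLP work internally. The class name ToyLemmatizer, the method lemmaOf, and the table entries are all hypothetical.

```java
import java.util.Locale;
import java.util.Map;

public class ToyLemmatizer {
    // Hypothetical, hand-written table of irregular forms. Real lemmatizers
    // use much larger dictionaries together with part-of-speech information.
    private static final Map<String, String> LEMMAS = Map.of(
            "was", "be",
            "were", "be",
            "is", "be",
            "mice", "mouse",
            "better", "good");

    // Returns the lemma if the word is in the table; otherwise returns
    // the lowercased word unchanged.
    public static String lemmaOf(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"was", "Mice", "table"}) {
            System.out.println(w + " -> " + lemmaOf(w));
        }
    }
}
```

A fixed table like this cannot handle words it has never seen, which is one reason the libraries discussed here rely on POS tags and morphological rules instead.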

Using the StanfordCoreNLP class

We will use the StanfordCoreNLP class with a pipeline to demonstrate lemmatization. We start by setting up the pipeline with four annotators, including lemma, as shown here:

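import java.util.Properties;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

// Configure the annotators; lemma depends on tokenize, ssplit, and pos.
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);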

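All four annotators are needed: lemmatization depends on POS tags, which in turn depend on tokenization and sentence splitting. They are listed in the following table, along with other commonly used annotators: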

Annotator   Operation to be Performed
tokenize    Tokenization
ssplit      Sentence splitting
pos         POS tagging
lemma       Lemmatization
ner         NER
parse       Syntactic parsing
dcoref      Coreference resolution
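Because later annotators build on the output of earlier ones, the order of names in the annotators property matters. The sketch below checks an annotator string against a small prerequisite table before it is handed to a pipeline. The class AnnotatorOrderCheck and its check method are hypothetical helpers, not part of Stanford CoreNLP, and the prerequisite table covers only the four annotators used in this section and is stated here as an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AnnotatorOrderCheck {
    // Assumed prerequisites for the four annotators used in this section.
    private static final Map<String, List<String>> REQUIRES = Map.of(
            "tokenize", List.of(),
            "ssplit", List.of("tokenize"),
            "pos", List.of("tokenize", "ssplit"),
            "lemma", List.of("tokenize", "ssplit", "pos"));

    // Returns a list of problems found; an empty list means every listed
    // annotator has its prerequisites earlier in the string.
    public static List<String> check(String annotators) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        // Names may be separated by commas and/or whitespace.
        for (String name : annotators.split("[,\\s]+")) {
            if (name.isEmpty()) {
                continue;
            }
            // Annotators missing from the table are treated as having no prerequisites.
            for (String needed : REQUIRES.getOrDefault(name, List.of())) {
                if (!seen.contains(needed)) {
                    problems.add(name + " needs " + needed + " earlier in the list");
                }
            }
            seen.add(name);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("tokenize, ssplit, pos, lemma")); // prints []
        System.out.println(check("tokenize, lemma"));
    }
}
```

Annotators that are not in the table, such as ner or parse, pass through without checks in this sketch; extending the table would require consulting the CoreNLP documentation for their requirements.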

A paragraph variable holding the text is passed to the Annotation constructor, and the pipeline's annotate method is then called on the resulting object, as shown here:
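The following is a minimal sketch of this step, written under assumptions rather than reproduced from the book. The class name LemmaSketch and the sample text are invented for illustration. It assumes the Stanford CoreNLP library and its English models are on the classpath. After annotation, the lemmas are read from each token through the LemmaAnnotation key.

```java
import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class LemmaSketch {
    public static void main(String[] args) {
        // Same pipeline configuration as in the text above.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Sample text chosen for illustration; not taken from the book.
        String paragraph = "The cats were sleeping. She was reading.";

        // Wrap the text in an Annotation and run all configured annotators on it.
        Annotation document = new Annotation(paragraph);
        pipeline.annotate(document);

        // Collect the lemma of every token, sentence by sentence.
        List<String> lemmas = new ArrayList<>();
        for (CoreMap sentence : document.get(SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                lemmas.add(token.get(LemmaAnnotation.class));
            }
        }
        System.out.println(lemmas);
    }
}
```

Building the pipeline loads the POS tagger model, so in practice the pipeline object is usually created once and reused for many documents.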
