Using lemmatization
Lemmatization is supported by a number of NLP APIs. In this section, we will illustrate
how lemmatization can be performed using the StanfordCoreNLP and the
OpenNLPLemmatizer classes. The lemmatization process determines the lemma of a
word. A lemma can be thought of as the dictionary form of a word. For example, the
lemma of "was" is "be".
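To make the idea of a dictionary form concrete, here is a minimal sketch of a table-based lemma lookup. It is not how StanfordCoreNLP or OpenNLP perform lemmatization. The class name SimpleLemmaLookup and its small word list are hypothetical and exist only for illustration:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: real lemmatizers rely on part-of-speech
// information and much larger dictionaries or morphological rules.
public class SimpleLemmaLookup {
    private static final Map<String, String> LEMMAS = new HashMap<>();
    static {
        LEMMAS.put("was", "be");
        LEMMAS.put("were", "be");
        LEMMAS.put("is", "be");
        LEMMAS.put("are", "be");
        LEMMAS.put("mice", "mouse");
    }

    // Returns the dictionary form of a word, or the lowercased word itself
    // when the word is not in the table.
    public static String lemma(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"was", "Mice", "table"}) {
            System.out.println(w + " -> " + lemma(w));
        }
    }
}
```

A fixed table like this cannot handle words it has never seen, which is one reason the APIs discussed in this section combine tokenization and POS tagging with lemmatization.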
Using the StanfordCoreNLP class
We will use the StanfordCoreNLP class with a pipeline to demonstrate lemmatization.
We start by setting up the pipeline with four annotators, including lemma, as shown here:
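import java.util.Properties;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

// The lemma annotator depends on tokenize, ssplit, and pos
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);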
The first four annotators in the following table are needed for lemmatization; the remaining ones are listed for reference:
Annotator   Operation to be performed
tokenize    Tokenization
ssplit      Sentence splitting
pos         POS tagging
lemma       Lemmatization
ner         NER
parse       Syntactic parsing
dcoref      Coreference resolution
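Because the lemma annotator builds on the output of the annotators listed before it, the order of the annotators string matters. The following is a minimal sketch of a check for that ordering. AnnotatorOrderCheck is a hypothetical helper, not part of the Stanford API, and its prerequisite table is an assumption covering only the four annotators used in this section:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: reports annotators that appear before their prerequisites.
public class AnnotatorOrderCheck {
    // Assumed prerequisites for the four annotators used in this section.
    private static final Map<String, List<String>> REQUIRES = new HashMap<>();
    static {
        REQUIRES.put("tokenize", Collections.emptyList());
        REQUIRES.put("ssplit", Arrays.asList("tokenize"));
        REQUIRES.put("pos", Arrays.asList("tokenize", "ssplit"));
        REQUIRES.put("lemma", Arrays.asList("tokenize", "ssplit", "pos"));
    }

    // Returns one message per prerequisite that has not been listed
    // before the annotator that needs it.
    public static List<String> missing(String annotators) {
        Set<String> seen = new HashSet<>();
        List<String> problems = new ArrayList<>();
        for (String raw : annotators.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            for (String req : REQUIRES.getOrDefault(name, Collections.emptyList())) {
                if (!seen.contains(req)) {
                    problems.add(name + " requires " + req);
                }
            }
            seen.add(name);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(missing("tokenize, ssplit, pos, lemma"));
        System.out.println(missing("tokenize, lemma"));
    }
}
```

Annotators that are not in the assumed table, such as ner or parse, are accepted without any check in this sketch.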
A paragraph variable is used with the Annotation constructor and the annotate
method is then executed, as shown here: