Using the Stanford pipeline
In this section, we discuss the Stanford pipeline in more detail. Although we have used
it in several earlier examples, we have not fully explored its capabilities. Having seen
the pipeline in use, you are now in a better position to understand how it works and to
assess its capabilities and applicability to your needs.
The edu.stanford.nlp.pipeline package holds the StanfordCoreNLP and annotator classes.
The general approach uses the following code sequence, in which the text string is
processed. The Properties object holds the names of the annotators to run, as shown here:
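import java.util.Properties;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
// Text to be processed
String text = "The robber took the cash and ran.";
// Name the annotators to run, in the order they should be applied
Properties props = new Properties();
props.setProperty("annotators",
    "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);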
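The annotators property is an ordinary comma-separated string, and the order of the names matters because later annotators build on the results of earlier ones (for example, pos needs the output of tokenize and ssplit). The following is a minimal sketch, using only the standard library, of how such a property value can be read back as an ordered list. The class AnnotatorListSketch and its annotatorNames method are hypothetical names used for illustration; this is not how CoreNLP itself parses the property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Stanford CoreNLP.
class AnnotatorListSketch {

    // Splits the comma-separated "annotators" value into trimmed names,
    // preserving the order in which they were listed.
    static List<String> annotatorNames(Properties props) {
        List<String> names = new ArrayList<>();
        String value = props.getProperty("annotators", "");
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators",
            "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        // Prints the annotator names in the order they were configured
        System.out.println(annotatorNames(props));
    }
}
```

Listing only the annotators you need (for example, stopping at pos) is a common way to reduce model loading time and memory use.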
The Annotation class represents the text to be processed. The constructor, used in the
next code segment, takes the string and adds a CoreAnnotations.TextAnnotation instance
to the Annotation object. The StanfordCoreNLP class's annotate method applies the
annotators specified in the property list to the Annotation object:
Annotation annotation = new Annotation(text);
pipeline.annotate(annotation);
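To make the idea of a pipeline concrete, here is a simplified model written with the standard library only. It is a sketch under stated assumptions, not the CoreNLP implementation: the Step interface, the string keys, and the two toy steps are all hypothetical. What it illustrates is the general pattern, in which each step reads what earlier steps stored in a shared document object and adds its own results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified model of an annotation pipeline; all names are hypothetical.
class PipelineSketch {

    // Stand-in for an annotator: reads the document and adds results to it.
    interface Step {
        void annotate(Map<String, Object> document);
    }

    // Stores whitespace-separated tokens under the key "tokens".
    static final Step TOKENIZE = doc ->
        doc.put("tokens", Arrays.asList(((String) doc.get("text")).trim().split("\\s+")));

    // Depends on "tokens"; stores lowercased copies under "lowercase".
    static final Step LOWERCASE = doc -> {
        List<String> lower = new ArrayList<>();
        for (Object token : (List<?>) doc.get("tokens")) {
            lower.add(token.toString().toLowerCase(Locale.ROOT));
        }
        doc.put("lowercase", lower);
    };

    // Wraps the text in a document map and applies each step in order.
    static Map<String, Object> run(String text, List<Step> steps) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", text);
        for (Step step : steps) {
            step.annotate(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = run("The robber took the cash and ran.",
            Arrays.asList(TOKENIZE, LOWERCASE));
        System.out.println(doc.get("lowercase"));
    }
}
```

Swapping the order of the two steps in this sketch would fail, because LOWERCASE expects the "tokens" entry to exist already. The same kind of dependency is why the order of names in the annotators property matters.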
The CoreMap interface is the base interface for all annotatable objects. It uses class
objects for keys. The TextAnnotation annotation type is the CoreMap key for the text.
CoreMap keys are intended to be used with various types of annotations, such as those
produced by the annotators named in the properties list. The type of the value depends
on the key type.
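The idea of using class objects as keys, with the value type determined by the key, can be sketched with a small typed map. This is an illustrative model built on the standard library, not the CoreMap source: TypedMapSketch, Key, TextKey, and TokenCountKey are hypothetical names. Each key class declares its value type through a type parameter, so the compiler can check that a String is stored under the text key and an Integer under the count key.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative typed map keyed by class objects; not the CoreMap source.
class TypedMapSketch {

    // Marker interface: the type parameter records the value type for a key.
    interface Key<V> {}

    // Hypothetical keys: one maps to a String, the other to an Integer.
    static final class TextKey implements Key<String> {}
    static final class TokenCountKey implements Key<Integer> {}

    private final Map<Class<?>, Object> values = new HashMap<>();

    // Stores a value whose type must match the key's declared value type.
    <V> void set(Class<? extends Key<V>> key, V value) {
        values.put(key, value);
    }

    // Returns the value for the key, or null if none was stored.
    @SuppressWarnings("unchecked")
    <V> V get(Class<? extends Key<V>> key) {
        return (V) values.get(key);
    }

    public static void main(String[] args) {
        TypedMapSketch map = new TypedMapSketch();
        map.set(TextKey.class, "The robber took the cash and ran.");
        map.set(TokenCountKey.class, Integer.valueOf(8));

        // No casts are needed: the key class fixes the value type.
        String text = map.get(TextKey.class);
        Integer count = map.get(TokenCountKey.class);
        System.out.println(text + " / " + count);
    }
}
```

With this design, a call such as map.set(TextKey.class, Integer.valueOf(8)) is rejected at compile time, which is the main benefit of class-object keys over plain string keys.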
The hierarchy of classes and interfaces is depicted in the following diagram. It is a
simplified version of the relationship between classes and interfaces as they relate to
the pipeline. The horizontal lines represent interface implementations and the vertical
lines represent inheritance between classes.