Using the Stanford API for NER
We will demonstrate the CRFClassifier class as used to perform NER. This class implements what is known as a linear-chain Conditional Random Field (CRF) sequence model.
To demonstrate the use of the CRFClassifier class, we will start with a declaration of the classifier file string, as shown here:
String model = getModelDir() +
"\\english.conll.4class.distsim.crf.ser.gz";
The classifier is then created using the model:
CRFClassifier<CoreLabel> classifier =
CRFClassifier.getClassifierNoExceptions(model);
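The model path declared earlier hard-codes a doubled backslash, which is a Windows-specific separator. As a minimal sketch, assuming the model directory is available as a plain string, the path could instead be assembled with java.nio.file.Paths so that it uses the separator of whatever platform the code runs on. The class name ModelPath, the method modelPath, and the directory name "models" are hypothetical and used only for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ModelPath {
    // Hypothetical helper: joins a model directory and a model file name
    // using the platform's own path separator.
    public static String modelPath(String modelDir, String fileName) {
        Path path = Paths.get(modelDir, fileName);
        return path.toString();
    }

    public static void main(String[] args) {
        // "models" is an assumed directory name, not one taken from the text.
        String model = modelPath("models", "english.conll.4class.distsim.crf.ser.gz");
        System.out.println(model);
    }
}
```

The resulting string can be passed to getClassifierNoExceptions in the same way as the hand-built one.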
The classify method takes a single string representing the text to be processed. To use the text held in sentences, we first need to combine its elements into a single string:
String sentence = "";
for (String element : sentences) {
sentence += element;
}
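Repeated use of += on a String builds a new string on every pass through the loop. A minimal sketch of an equivalent, assuming sentences is an array of strings (its declaration is not shown in this section), uses String.join. The class name JoinSentences and the sample text are invented for illustration.

```java
public class JoinSentences {
    // Concatenates the elements with no separator, matching the loop above.
    public static String join(String[] sentences) {
        return String.join("", sentences);
    }

    public static void main(String[] args) {
        // Hypothetical sample data; the first element ends with a space so
        // that the joined sentences do not run together.
        String[] sentences = {
            "Joe was the last person to see Fred. ",
            "He saw him in Boston at McKenzie's pub."
        };
        System.out.println(join(sentences));
    }
}
```

If the elements do not end with whitespace, passing " " as the delimiter keeps the last word of one sentence from merging with the first word of the next.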
The classify method is then applied to the text:
List<List<CoreLabel>> entityList =
classifier.classify(sentence);
The classify method returns a List of List instances of CoreLabel objects, that is, a list whose elements are themselves lists. The CoreLabel class represents a word with additional information attached to it, and each inner list holds the words of one sentence. In the outer for-each statement in the following code sequence, the reference variable, internalList, represents one sentence of the text. In the inner for-each statement, each word in that inner list is displayed. The word method returns the word and the get method returns the type of the word.
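The shape of the returned value can be sketched without the Stanford library. The example below is only an illustration of walking a list of lists: Token is a hypothetical stand-in for CoreLabel, and its type accessor replaces the get call, which in the Stanford API is given an annotation key. The sample words and labels are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NestedListWalk {
    // Hypothetical stand-in for CoreLabel: a word plus its entity type.
    public static class Token {
        private final String word;
        private final String type;

        public Token(String word, String type) {
            this.word = word;
            this.type = type;
        }

        public String word() { return word; }
        public String type() { return type; }
    }

    // Walks the outer list (sentences) and each inner list (words),
    // producing one "word:type" string per token.
    public static List<String> describe(List<List<Token>> entityList) {
        List<String> lines = new ArrayList<>();
        for (List<Token> internalList : entityList) {
            for (Token token : internalList) {
                lines.add(token.word() + ":" + token.type());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Invented sample data: two sentences, three words in total.
        List<List<Token>> entityList = Arrays.asList(
            Arrays.asList(new Token("Joe", "PERSON"), new Token("left", "O")),
            Arrays.asList(new Token("Boston", "LOCATION")));
        for (String line : describe(entityList)) {
            System.out.println(line);
        }
    }
}
```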