Extracting relationships
Relationship extraction identifies relationships that exist in text. For example, in the
sentence "The meaning and purpose of life is plain to see", the topic is "The meaning and
purpose of life", and the closing phrase relates that topic to the property of being
"plain to see".
Humans are reasonably good at determining how things are related to each other, at
least at a high level. Determining deep relationships can be more difficult. Using a
computer to extract relationships can also be challenging. However, computers can process
large datasets to find relationships that would not be obvious to a human, or that a human
could not find in a reasonable period of time.
Many kinds of relationships are possible. These include where something is located, how
two people are related to each other, what the parts of a system are, and who is in charge.
Relationship extraction is useful for a number of tasks, including building knowledge bases,
analyzing trends, gathering intelligence, and performing product searches. Finding
relationships is sometimes called Text Analytics.
There are several techniques that we can use to perform relationship extraction. These are
covered in more detail in Chapter 7, Using a Parser to Extract Relationships. Here, we will
illustrate one technique to identify relationships within a sentence using the Stanford NLP
StanfordCoreNLP class. This class supports a pipeline where annotators are specified
and applied to text. Annotators can be thought of as operations to be performed. When an
instance of the class is created, the annotators are specified using a Properties object
found in the java.util package.
First, create an instance of the Properties class. Then assign the annotators as follows:
Properties properties = new Properties();
properties.put("annotators", "tokenize, ssplit, parse");
We used three annotators, which specify the operations to be performed. In this case, these
are the minimum required to parse the text. The first one, tokenize, will tokenize the
text. The ssplit annotator splits the tokens into sentences. The last annotator, parse,
performs the syntactic analysis, or parsing, of the text.
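The annotators value is an ordinary comma-separated string stored in a Properties object, so it can be inspected with the JDK alone. The following is a minimal sketch of that idea; the class AnnotatorListSketch and the helper annotatorNames are hypothetical names introduced for illustration and are not part of the Stanford API, and the sketch does not claim to reflect how StanfordCoreNLP reads the property internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class AnnotatorListSketch {

    // Hypothetical helper (not a Stanford API): splits the comma-separated
    // "annotators" value into trimmed names, preserving their order.
    static List<String> annotatorNames(Properties properties) {
        List<String> names = new ArrayList<>();
        // getProperty returns the default ("") when the key is absent.
        String value = properties.getProperty("annotators", "");
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("annotators", "tokenize, ssplit, parse");
        // Lists the annotator names in the order they were specified.
        System.out.println(annotatorNames(properties));
    }
}
```

The order of the names mirrors the description above: tokenization comes first, sentence splitting second, and parsing last.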
Next, create an instance of the StanfordCoreNLP class using the properties reference
variable: