Introduction to NLP - Natural Language Processing with Java

Extracting relationships

Relationship extraction identifies relationships that exist in text. For example, in the sentence "The meaning and purpose of life is plain to see", the topic is "The meaning and purpose of life". It is related to the final phrase, which asserts that it is "plain to see".
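
To make the topic/description split concrete, here is a deliberately naive sketch that divides a sentence at the first standalone "is" or "are". It is a plain regular-expression illustration, not the parser-based technique this section goes on to describe, and the class name CopulaSplitter and its split method are hypothetical names introduced only for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any NLP library.
class CopulaSplitter {
    // "<topic> is|are <description>", with optional closing punctuation.
    // The lazy first group stops at the first standalone "is" or "are".
    private static final Pattern COPULA =
        Pattern.compile("^(.+?)\\s+(is|are)\\s+(.+?)[.!?]?$");

    // Returns {topic, description}, or empty if the pattern does not apply.
    static Optional<String[]> split(String sentence) {
        Matcher m = COPULA.matcher(sentence.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(3) });
    }

    public static void main(String[] args) {
        split("The meaning and purpose of life is plain to see")
            .ifPresent(p -> System.out.println(
                "topic = \"" + p[0] + "\", description = \"" + p[1] + "\""));
    }
}
```

A pattern like this breaks down quickly on real text (questions, embedded clauses, other verbs), which is one reason a syntactic parser is normally used instead.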

Humans are fairly good at determining how things are related to each other, at least at a high level. Determining deeper relationships can be more difficult. Using a computer to extract relationships can also be challenging. However, computers can process large datasets to find relationships that would not be obvious to a human, or that a human could not find in a reasonable period of time.

Many kinds of relationships are possible. These include where something is located, how two people are related to each other, what the parts of a system are, and who is in charge. Relationship extraction is useful for a number of tasks, including building knowledge bases, analyzing trends, gathering intelligence, and performing product searches. Finding relationships is sometimes called Text Analytics.

There are several techniques that we can use to perform relationship extraction. These are covered in more detail in Chapter 7, Using a Parser to Extract Relationships. Here, we will illustrate one technique for identifying relationships within a sentence using the Stanford NLP StanfordCoreNLP class. This class supports a pipeline in which annotators are specified and applied to text. Annotators can be thought of as operations to be performed. When an instance of the class is created, the annotators are specified using a Properties object, a class found in the java.util package.

First, create an instance of the Properties class. Then assign the annotators as follows:

Properties properties = new Properties();
properties.put("annotators", "tokenize, ssplit, parse");

We used three annotators, which specify the operations to be performed. In this case, these are the minimum required to parse the text. The first one, tokenize, tokenizes the text. The ssplit annotator splits the tokens into sentences. The last annotator, parse, performs the syntactic analysis (parsing) of the text.
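
The annotators setting is just a single comma-separated string stored under one key. The following sketch uses only java.util to build that configuration and read the annotator names back in the order they were listed. It does not call Stanford CoreNLP; the class name AnnotatorConfig and the helper annotatorNames are hypothetical and exist only for this illustration. It uses setProperty, which is the String-typed counterpart of put on a Properties object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class for illustration; no NLP library is involved.
class AnnotatorConfig {
    // Builds the same configuration as in the text.
    static Properties build() {
        Properties properties = new Properties();
        properties.setProperty("annotators", "tokenize, ssplit, parse");
        return properties;
    }

    // Splits the comma-separated value into trimmed names, preserving order.
    static List<String> annotatorNames(Properties properties) {
        List<String> names = new ArrayList<>();
        for (String name : properties.getProperty("annotators", "").split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints: [tokenize, ssplit, parse]
        System.out.println(annotatorNames(build()));
    }
}
```

Listing the names this way makes it easy to check the configuration before handing the Properties object to the pipeline.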

Next, create an instance of the StanfordCoreNLP class using the properties reference variable:
