Evolving Explanatory Novel Patterns for Semantically-Based Text Mining - Natural Language Processing and Text Mining


in which $|H|$ denotes the length of the hypothesis $H$, that is, the number of predicates.

Note that pairs of target concepts are provided by domain experts so as to guide the search process.

• Structure (How good is the structure of the rhetorical roles?): measures how much of the rules' structure is exhibited in the current hypothesis.

Since pre-processed information on bi-grams of roles is already available, the structure can be computed by following a Markov chain [23] as follows:

$$\mathrm{Structure}(H) = \mathrm{Prob}(r_1) \cdot \prod_{i=2}^{|H|} \mathrm{Prob}(r_i \mid r_{i-1})$$

where $r_i$ represents the $i$-th role of the hypothesis $H$, and $\mathrm{Prob}(r_i \mid r_{i-1})$ denotes the conditional probability that role $r_{i-1}$ immediately precedes $r_i$. $\mathrm{Prob}(r_i)$ denotes the probability that no role precedes $r_i$, that is, that it is at the beginning of the structure (i.e., $\mathrm{Prob}(r_i \mid \langle start \rangle)$).

[Figure: state diagram with nodes <START>, goal, object, method and conclusion; edges are labeled with transition probabilities.]

Fig. 9.3. Markov Model for Roles Structure Learned from sampled technical documents

For example, part of a Markov chain of rhetorical roles learned by the model from a specific technical domain can be seen in figure 9.3. Here it can be observed that some sequences of rhetorical roles are more frequent than others (e.g., the sequence goal-method (0.54) is more likely than the sequence goal-conclusion (0.08)).
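The Structure measure can be illustrated with a minimal Python sketch. Only the goal-method (0.54) and goal-conclusion (0.08) probabilities come from the text; the other table entries, the names `structure_score` and `BIGRAM_PROB`, and the choice to score unseen bi-grams and empty hypotheses as 0.0 are assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

START = "<START>"


def structure_score(
    roles: Sequence[str],
    bigram_prob: Dict[Tuple[str, str], float],
) -> float:
    """Prob(r_1 | <START>) times the product of Prob(r_i | r_{i-1}) for i >= 2."""
    if not roles:
        return 0.0  # assumption: an empty hypothesis has no structure
    score = 1.0
    previous = START
    for role in roles:
        # Assumption: bi-grams missing from the table get probability 0.0
        # (no smoothing).
        score *= bigram_prob.get((previous, role), 0.0)
        previous = role
    return score


# Keys are (previous role, role); values are Prob(role | previous role).
BIGRAM_PROB = {
    ("goal", "method"): 0.54,        # quoted in the text
    ("goal", "conclusion"): 0.08,    # quoted in the text
    (START, "goal"): 0.5,            # hypothetical value
    ("method", "conclusion"): 0.4,   # hypothetical value
}

if __name__ == "__main__":
    print(structure_score(["goal", "method", "conclusion"], BIGRAM_PROB))
    print(structure_score(["goal", "conclusion"], BIGRAM_PROB))
```

With these values, any hypothesis that continues goal with method scores higher than one that jumps from goal straight to conclusion, which mirrors the observation drawn from figure 9.3.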

• Cohesion (How likely is a predicate action to be associated with some specific rhetorical role?): measures the degree of “connection” between rhetorical information (i.e., roles) and predicate actions. The issue here is how likely (according to the rules) some predicate relation $P$ in the current hypothesis is to be associated with role $r$. Formally, cohesion for hypothesis $H$ is expressed as:

$$\mathrm{cohesion}(H) = \sum_{r_i, P_i \in H} \frac{\mathrm{Prob}(P_i \mid r_i)}{|H|}$$

where $\mathrm{Prob}(P_i \mid r_i)$ states the conditional probability of the predicate $P_i$ given the rhetorical role $r_i$.
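As a rough illustration, cohesion can be read as the average of Prob(P_i | r_i) over the (role, predicate) pairs of a hypothesis. The Python sketch below assumes that reading; the predicate names, all probability values, and the names `cohesion_score` and `PRED_GIVEN_ROLE` are hypothetical and do not come from the source.

```python
from typing import Dict, Sequence, Tuple


def cohesion_score(
    hypothesis: Sequence[Tuple[str, str]],
    pred_given_role: Dict[Tuple[str, str], float],
) -> float:
    """Sum of Prob(P_i | r_i) over the (r_i, P_i) pairs of H, divided by |H|."""
    if not hypothesis:
        return 0.0  # assumption: an empty hypothesis has zero cohesion
    total = sum(
        # Assumption: pairs missing from the table contribute 0.0.
        pred_given_role.get((role, predicate), 0.0)
        for role, predicate in hypothesis
    )
    return total / len(hypothesis)


# Keys are (role, predicate); values are Prob(predicate | role).
# Every entry is a hypothetical value chosen for illustration.
PRED_GIVEN_ROLE = {
    ("goal", "establish"): 0.30,
    ("method", "use"): 0.50,
    ("conclusion", "show"): 0.40,
}

if __name__ == "__main__":
    example = [("goal", "establish"), ("method", "use"), ("conclusion", "show")]
    print(cohesion_score(example, PRED_GIVEN_ROLE))
```

Dividing by the number of pairs keeps the score in [0, 1] regardless of hypothesis length, so hypotheses of different sizes remain comparable.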
