interactions. By contrast, our method analyzes the semantics and content (i.e., PPI
patterns) of text to identify protein-protein interactions, and therefore outperforms
those methods. It is noteworthy that the syntax tree-based kernel methods are often
merely on par with the co-occurrence approach in terms of F1-measure. On the very
small LLL corpus, their results practically coincide with those of co-occurrence. The
rich-feature-based method and Cosine also outperform SPT, AkanePPI, and the syntax
tree-based kernel methods because they incorporate dependency features to distinguish
protein-protein interactions. Although Cosine achieves higher performance by
additionally considering term weighting, it has difficulty representing relations
between words. By contrast, our method extracts word semantics and generates PPI
patterns that capture long-distance relations among words. Consequently, it achieves a
better macro-average F1-measure than the other methods.
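The co-occurrence baseline reaches 100% recall on every corpus, which is what one would expect if it labels every candidate protein pair in a sentence as interacting. The following is a minimal sketch of such a baseline under that assumption; the function name `cooccurrence_baseline` and the toy labels are hypothetical and are not taken from the compared systems.

```python
def cooccurrence_baseline(gold_labels):
    """Score a baseline that predicts an interaction for every candidate pair.

    gold_labels: list of bools, True if the protein pair truly interacts.
    Returns (precision, recall, f1) as fractions in [0, 1].
    """
    # Assumption: the baseline marks every co-occurring pair as positive.
    predictions = [True] * len(gold_labels)

    tp = sum(1 for gold, pred in zip(gold_labels, predictions) if gold and pred)
    fp = sum(1 for gold, pred in zip(gold_labels, predictions) if pred and not gold)
    fn = sum(1 for gold, pred in zip(gold_labels, predictions) if gold and not pred)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: 3 of 5 candidate pairs are true interactions.
toy_gold = [True, True, False, True, False]
precision, recall, f1 = cooccurrence_baseline(toy_gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Because nothing is ever predicted negative, recall is always 1.0 and precision equals the share of true interactions among the candidate pairs (0.6 in the toy data), so the baseline's F1-measure is determined entirely by how balanced the corpus is.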
To summarize, the proposed interaction pattern tree kernel approach successfully
integrates the syntactic and semantic information in text to identify protein-protein
interactions. Hence, it achieves the best macro-average F1-measure among the compared
methods, as shown in Table 1.
Table 1. The interaction extraction performance of the compared methods
(each cell: Precision / Recall / F1-measure, in %)

System                    LLL                   IEPA                  HPRD50                Macro-average
SPT                       56.4 / 96.1 / 69.6    55.5 / 28.8 / 37.1    46.2 / 13.4 / 20.8    52.7 / 46.1 / 42.5
AkanePPI [19]             76.7 / 40.2 / 52.8    66.2 / 51.3 / 57.8    52.0 / 55.8 / 53.8    65.0 / 49.1 / 54.8
co-occ. [1]               55.9 / 100.0 / 70.3   40.8 / 100.0 / 57.6   38.9 / 100.0 / 55.4   45.2 / 100.0 / 61.1
PT [13]                   56.2 / 97.3 / 69.3    63.1 / 66.3 / 63.8    54.9 / 56.7 / 52.4    58.1 / 73.4 / 61.8
SST [3]                   55.9 / 100.0 / 70.3   54.8 / 76.9 / 63.4    48.1 / 63.8 / 52.2    52.9 / 80.2 / 62.0
ST [17]                   55.9 / 100.0 / 70.3   59.4 / 75.6 / 65.9    49.7 / 67.8 / 54.5    55.0 / 81.1 / 63.6
SpT [9]                   55.9 / 100.0 / 70.3   54.5 / 81.8 / 64.7    49.3 / 71.7 / 56.4    53.2 / 84.5 / 63.8
rich-feature-based [16]   72.0 / 73.0 / 73.0    64.0 / 70.0 / 67.0    60.0 / 51.0 / 55.0    65.3 / 64.7 / 65.0
Cosine [6]                70.2 / 81.7 / 73.8    61.3 / 68.4 / 64.1    59.0 / 67.2 / 61.2    63.5 / 72.4 / 66.4
Our method                59.9 / 94.4 / 71.6    52.2 / 88.1 / 65.2    59.3 / 83.0 / 67.3    57.1 / 88.5 / 68.0
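The macro-average column of Table 1 is consistent with an unweighted mean of each metric over the three corpora (LLL, IEPA, HPRD50). The sketch below checks this for two rows copied from the table; the helper name `macro_average` is an illustrative assumption, not part of the original work.

```python
# Per-corpus (precision, recall, F1) percentages copied from Table 1,
# in the order LLL, IEPA, HPRD50.
TABLE_1 = {
    "SPT": [(56.4, 96.1, 69.6), (55.5, 28.8, 37.1), (46.2, 13.4, 20.8)],
    "Our method": [(59.9, 94.4, 71.6), (52.2, 88.1, 65.2), (59.3, 83.0, 67.3)],
}


def macro_average(rows):
    """Return the unweighted mean of each metric across corpora, to 1 decimal."""
    n = len(rows)
    # zip(*rows) regroups the values by metric: all precisions, recalls, F1s.
    return tuple(round(sum(metric) / n, 1) for metric in zip(*rows))


for system, rows in TABLE_1.items():
    precision, recall, f1 = macro_average(rows)
    print(f"{system}: {precision} / {recall} / {f1}")
```

Each corpus contributes equally to the average regardless of its size, so the small LLL corpus weighs as much as the larger ones.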
6 Concluding Remarks
Automated extraction of protein-protein interactions is an important and widely
studied task in biomedical text mining. To this end, we proposed an interaction
pattern generation approach for acquiring PPI patterns. We also developed a method
that combines the shortest path-enclosed tree structure with the generated PPI patterns
to analyze the syntactic, semantic, and content information in text. The method then
exploits the derived information to identify protein-protein interactions in the
biomedical literature. Our experimental results demonstrate that the proposed method is
effective and outperforms well-known PPI extraction methods.
In the future, we will investigate the syntactic dependency tree in text to
incorporate further syntactic and semantic information into the interaction pattern tree
structures. We will also use information extraction algorithms to extract interaction
tuples from positive instances and construct an interaction network of proteins.
 