Frame-Based Approach for Reference Metadata Extraction - Technologies and Applications of Artificial Intelligence


in this dataset not available for either FBA or CRFs whose title dictionary is

collected from general-purpose words in WordNet. Thus, domain dictionaries

are required for FBA and CRFs to achieve better performance.

Finally, to validate the effectiveness of FBA in general, we created a “free style set” by randomly selecting 1,500 journal reference strings from researchers' websites. Many of these researchers use their own preferred styles. As Table 6 shows, the performance of FBA remains stable and is better than that of CRFs. These results indicate that the performance of machine learning models can easily be affected by unseen journal styles. In contrast, FBA appears to be more tolerant of them. Since this free style set is quite different from the training set and can be treated as a collection of individualized journal styles, performance on this dataset could indicate how a system would perform if it were deployed as a web service.

We also list some errors made by CRFs to analyze their deficiencies. As Table 7 shows, CRFs fail to recognize the complete author field because only field-level labels (“A”, “T”, and so on) are provided in the training set. We believe that the performance of CRFs could be improved if considerable effort were spent labeling fine-grained features such as “First”, “Last”, and “Generation”. In contrast, FBA can incorporate such fine-grained features without much effort.
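To make the idea of fine-grained author labels concrete, the following is a minimal sketch that assigns “First”, “Last”, and “Generation” labels to the tokens of a single author name. It is not the FBA implementation described in this chapter: the function name `label_author`, the `GENERATION` suffix list, and the comma-based heuristics are all assumptions introduced purely for illustration.

```python
# Hypothetical suffix list; the chapter names the "Generation" label only.
GENERATION = {"Jr", "Jr.", "Sr", "Sr.", "II", "III", "IV"}


def label_author(name: str) -> list[tuple[str, str]]:
    """Label the tokens of one author name as First, Last, or Generation.

    Assumed heuristics (not taken from the chapter):
    - a comma-separated part that is a known suffix is a Generation token;
    - "Last, First" is recognized when two non-suffix parts remain;
    - otherwise the final token is the last name.
    """
    parts = [p.strip() for p in name.split(",") if p.strip()]
    generation = [p for p in parts[1:] if p in GENERATION]
    rest = [p for p in parts if p not in generation]
    if not rest:
        return []

    if len(rest) == 2:
        # Inverted form, e.g. "Dupavcova, Jitka"
        last_tokens = rest[0].split()
        first_tokens = rest[1].split()
    else:
        # Natural order, e.g. "Freeman L. Rawson"
        tokens = rest[0].split()
        first_tokens, last_tokens = tokens[:-1], tokens[-1:]

    labeled = [(t, "First") for t in first_tokens]
    labeled += [(t, "Last") for t in last_tokens]
    labeled += [(g, "Generation") for g in generation]
    return labeled


print(label_author("Freeman L. Rawson, III"))
```

A sketch this simple does not handle name particles such as “van der”; it only illustrates what labels finer than a single author field might look like.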

All in all, the experimental results show that FBA performs equally well on the trained domain and on various untrained test domains. Compared to CRFs, it reduces the weighted average field error rate across all four test sets by around 70% (2.24% vs. 7.54%).
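The “around 70%” figure is a relative reduction computed from the two reported error rates. The short check below reproduces that arithmetic; the function name `relative_reduction` is ours, and only the two percentages come from the text.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


crf_error = 7.54  # weighted average field error rate of CRFs, in percent
fba_error = 2.24  # weighted average field error rate of FBA, in percent

# (7.54 - 2.24) / 7.54 = 5.30 / 7.54
print(f"{relative_reduction(crf_error, fba_error):.1%}")  # prints 70.3%
```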

Table 7. Examples of CRF prediction errors

Reference string | Predicted author by CRFs
Dupavcova, Jitka and Wets, Roger: Asymptotic behavior of statistical... | Dupavcova, Jitka and Wets
E.D. Falkenberg and Pols, R. van der and Weide, Th.P. van der, Understanding process structure... | E.D. Falkenberg and Pols, R. van der and Weide
Michael S. Kogan and Freeman L. Rawson, III: The design of Operating System... | Michael S. Kogan and Freeman L. Rawson

5 Conclusions

We proposed a frame-based approach (FBA) and used reference metadata extraction as a case study to show its merits. FBA is designed to compensate for the shortcomings of the traditional rule-based approach while retaining its strengths: the fuzzy nature of frame matching captures reasonable variations in the input text, which further increases the coverage of the frames. Our experimental results indicate that FBA is superior to other widely used machine learning (e.g., CRFs) and template-based (e.g., BibPro) methods.

There are three directions for future research. First, we plan to extend this framework to other reference styles such as conference proceedings, book chapters, and technical reports. Second, we are applying this flexible FBA to other
