Frame-Based Approach for Reference Metadata Extraction - Technologies and Applications of Artificial Intelligence

A Frame-Based Approach for Reference Metadata Extraction

Yu-Lun Hsieh 1, Shih-Hung Liu 1, Ting-Hao Yang 1, Yu-Hsuan Chen 1, Yung-Chun Chang 1, Gladys Hsieh 1, Cheng-Wei Shih 1, Chun-Hung Lu 2, and Wen-Lian Hsu 1

1 Institute of Information Science, Academia Sinica, Taipei, Taiwan

{morphe,journey,tinghaoyang,smallright,changyc,gladys,dapi,hsu}@iis.sinica.edu.tw

2 Innovative Digitech-Enabled Applications & Services Institute, III, Taiwan

enricoghlu@iii.org.tw

Abstract. In this paper, we propose a novel frame-based approach (FBA) and use reference metadata extraction as a case study to demonstrate its advantages. The main contributions of this research are three-fold. First, the new frame matching algorithm, based on sequence alignment, compensates for the shortcomings of traditional rule-based approaches, in which rule matching lacks flexibility and generality. Second, approximate matching is adopted to capture reasonable abbreviations or errors in the input reference string, further increasing the coverage of the frames. Third, experiments conducted on extensive datasets show that the same knowledge framework performs equally well on various untrained domains. Compared with a widely used machine learning method, Conditional Random Fields (CRFs), the FBA reduces the average field error rate across all four independent test sets by 70% (2.24% vs. 7.54%).

Keywords: Reference metadata extraction, knowledge representation, frame-based approach.

1 Introduction

In natural language processing (NLP), an important task is to recognize various linguistic expressions. Many such expressions can be represented as rules or templates, which a computer matches against text to identify the corresponding linguistic objects. In the real world, however, there always seem to be many exceptions or variations that the rules or templates do not cover. A typical way to cope with this situation is either to produce more templates or to relax the constraints of the existing ones (e.g., by inserting optional elements or wildcards). The former produces many case-by-case templates that may conflict with one another; the latter can lead to many false positives, namely, expressions that match but are undesirable. Thus, the inflexibility of rule-based systems has troubled the NLP as well as the artificial intelligence (AI) communities for years
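The trade-off between rigid and relaxed templates can be illustrated with regular expressions. This is a sketch for illustration only: the two patterns and the sample strings below are invented and do not come from the paper.

```python
import re

# Strict template: "Surname, I. (YYYY). Title. Venue, start-end."
STRICT = re.compile(
    r"^(?P<author>[A-Z][a-z]+, [A-Z]\.) \((?P<year>\d{4})\)\. "
    r"(?P<title>[^.]+)\. (?P<venue>[^,]+), (?P<pages>\d+-\d+)\.$"
)

# Relaxed template: optional parentheses and wildcards around the year.
RELAXED = re.compile(r"^(?P<author>.+?) \(?(?P<year>\d{4})\)?\. (?P<title>.+)$")

canonical = "Doe, J. (2010). A sample title. Some Journal, 1-10."
variant = "Doe, J. 2010. A sample title. Some Journal, 1-10."
not_a_reference = "Meeting on 12 May 2013. Bring laptops."

for text in (canonical, variant, not_a_reference):
    # Report which template accepts each string.
    print(bool(STRICT.match(text)), bool(RELAXED.match(text)), text)
```

The strict pattern misses the variant that merely drops the parentheses around the year, while the relaxed pattern accepts the variant but also accepts a sentence that is not a reference at all.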
