Frame-Based Approach for Reference Metadata Extraction - Technologies and Applications of Artificial Intelligence - page 156

ReferenceFrame
  SLOT
    1: Literal Part
      SLOT
        1: Authors
          SLOT
            1: First Name
            2: Middle Name
            3: Last Name
          PATTERN
            [1]:[2]:[3]
        2: Title
        3: Journal
      PATTERN
        [1]:[2]:[3]
        [2]:[1]:[3]
    2: Number Part
      SLOT
        4: Volume
          SLOT
            1: Volume Prefix
              [Volume]
              [Vol]
            2: Digits
              RegExp
                \d+
          PATTERN
            [1]:[2]
        5: Issue
          SLOT
            1: Issue Prefix
              *Supplement*
              [No]
            2: Digits
              RegExp
                \d+
          PATTERN
            [1]:[2]
        6: Page
        7: Year
      PATTERN
        [4]:[5]:[6]:[7]
  PATTERN
    [1]:[2]
Fig. 1. An illustration of the frame-slot representation of the RME domain knowledge
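
As a rough illustration of how the Volume and Issue slots in Fig. 1 could be matched, the following Python sketch encodes each slot as a list of prefixes plus the digit expression \d+, and reads the pattern [1]:[2] as "prefix followed by digits". The dictionary layout and the names SLOTS, compile_slot and fill_number_slots are hypothetical. Treating *Supplement* as a literal prefix and allowing an optional period and whitespace between the two parts are assumptions, not details given in the figure.

```python
import re

# Hypothetical encoding of two slots from Fig. 1. The pattern "[1]:[2]"
# is read here as: sub-slot 1 (prefix) followed by sub-slot 2 (digits).
SLOTS = {
    "Volume": {"prefixes": ["Volume", "Vol"], "digits": r"\d+"},
    "Issue": {"prefixes": ["Supplement", "No"], "digits": r"\d+"},
}


def compile_slot(spec):
    """Build a regular expression for one prefix-plus-digits slot."""
    # Try longer prefixes first so that "Volume" is preferred over "Vol".
    prefixes = sorted(spec["prefixes"], key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in prefixes)
    # Assumption: an optional period and optional whitespace separate the parts.
    return re.compile(
        rf"\b(?:{alternation})\.?\s*({spec['digits']})", re.IGNORECASE
    )


def fill_number_slots(text):
    """Return the digits captured for each slot that matches the text."""
    filled = {}
    for name, spec in SLOTS.items():
        match = compile_slot(spec).search(text)
        if match:
            filled[name] = match.group(1)
    return filled
```

For example, fill_number_slots("J. Mol. Biol. Vol. 12, No. 3, pp. 45-67, 2001") fills Volume with "12" and Issue with "3". The Page and Year slots are left out because the figure does not show their internal structure.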

pre-collected dictionaries are used to tag author, title, and journal as A, T, and J, respectively. A reference string is first tokenized by whitespace, and then the dictionaries are used to assign one or more tags to each token. Subsequently, frequent tag trigrams are examined to generate frames such as “AAT”, “TTA”, and “TTJ”. In addition, 40% of the titles in the training data are enclosed by quotation marks, so quotation marks are used to designate the boundary of T and J. Furthermore, in over 60% of cases the year field appears between A and T, according to previous analysis [5]. Thus, the year is also used as a boundary indicator.
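
The tagging and trigram steps described above can be sketched in Python as follows. This is only a minimal illustration: the toy dictionaries, the punctuation stripping, and the names DICTIONARIES, tag_tokens and trigram_counts are assumptions, since the actual dictionaries and implementation are not given in the text.

```python
from collections import Counter

# Toy stand-ins for the pre-collected dictionaries (contents are invented).
DICTIONARIES = {
    "A": {"smith", "j", "lee"},
    "T": {"frame", "based", "extraction", "of", "metadata"},
    "J": {"journal", "of", "informatics"},
}


def tag_tokens(reference):
    """Split on whitespace and list every tag whose dictionary holds the token."""
    tagged = []
    for token in reference.split():
        # Assumption: surrounding punctuation is ignored during lookup.
        key = token.strip('.,;:"()').lower()
        tags = sorted(tag for tag, words in DICTIONARIES.items() if key in words)
        tagged.append((token, tags))
    return tagged


def trigram_counts(tag_sequences):
    """Count tag trigrams over sequences that carry one tag per token."""
    counts = Counter()
    for seq in tag_sequences:
        for i in range(len(seq) - 2):
            counts["".join(seq[i:i + 3])] += 1
    return counts
```

In this sketch a token such as "of" receives both J and T, which mirrors the multiple tags mentioned above. How ambiguous tokens are resolved before trigrams are counted is not specified in the text, so trigram_counts simply expects sequences that already have a single tag per token, for example trigram_counts(["AATTTJ"]).most_common().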

Author names usually take the form “F M L” or “L, F M”, where “F”, “M”, and “L” indicate first, middle, and last name, respectively. Most author names in a reference follow a consistent style and abbreviation convention. Hence, abbreviation patterns can be used to determine the end of the author field.
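
To make the two name styles concrete, the Python sketch below classifies a single author name as “F M L” or “L, F M” using regular expressions. It is a simplified assumption rather than the method of the paper: names are taken to be capitalized ASCII words or single initials followed by a period, the middle name is optional, and the identifiers NAME, INITIAL, GIVEN, FML, LFM and author_style are hypothetical.

```python
import re

# Simplified building blocks (assumptions): a capitalized word or an initial.
NAME = r"[A-Z][a-z]+"
INITIAL = r"[A-Z]\."
GIVEN = rf"(?:{NAME}|{INITIAL})"

# "F M L": first name, optional middle name, last name.
FML = re.compile(rf"^{GIVEN}(?:\s+{GIVEN})?\s+{NAME}$")
# "L, F M": last name, comma, first name, optional middle name.
LFM = re.compile(rf"^{NAME},\s*{GIVEN}(?:\s+{GIVEN})?$")


def author_style(name):
    """Return the style label for one author name, or None if neither fits."""
    name = name.strip()
    if LFM.match(name):
        return "L, F M"
    if FML.match(name):
        return "F M L"
    return None
```

Under these assumptions, author_style("John A. Smith") returns "F M L" and author_style("Smith, J. A.") returns "L, F M". A run of names that share one style followed by a string that fits neither would then hint at the end of the author field. Hyphenated or non-ASCII names are not covered by this sketch.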

A title in a typical reference string is usually longer than three words and contains few punctuation marks, such as commas or periods. In contrast, punctuation marks, especially commas, are commonly used to separate author names. Therefore, we calculate the distance between
