in this dataset not available for either FBA or CRFs whose title dictionary is
collected from general-purpose words in WordNet. Thus, domain dictionaries
are required for FBA and CRFs to achieve better performance.
Finally, to validate the effectiveness of FBA in general, we created a “free
style set” by randomly selecting 1,500 journal reference strings from
researchers' websites. Many of these researchers use their own preferred
styles. As Table 6 shows, the performance of FBA remains stable and is better
than that of CRFs. These results indicate that the performance of machine
learning models can easily be affected by unseen journal styles. In contrast,
FBA appears to be more tolerant of them. Since this free style set is quite
different from the training set and can be treated as a collection of
individualized journal styles, performance on this dataset indicates how a
system would perform if it were deployed as a web service.
We also list some errors made by CRFs to analyze their deficiencies. As
observed in Table 7, CRFs fail to recognize the author field because only the
field labels (“A”, “T”, and so on) are provided in the training set. We believe
that if considerable effort were spent labeling fine-grained features such as
“First”, “Last”, and “Generation”, the performance of CRFs could be improved.
In contrast, FBA can easily incorporate such fine-grained features without
much effort.
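To make the fine-grained labels concrete, the following minimal sketch assigns “First”, “Last”, and “Generation” to a single author name. It is an illustration only, not the FBA or CRF implementation: the function name `label_author`, the suffix list, and the assumption that the name is written in "First ... Last[, Generation]" order are all hypothetical.

```python
import re

# Hypothetical list of generation suffixes; not taken from the paper.
GENERATION_SUFFIXES = {"Jr", "Jr.", "Sr", "Sr.", "II", "III", "IV"}


def label_author(name):
    """Label one author name written as "First ... Last[, Generation]".

    Names in "Last, First" order are NOT handled by this sketch.
    """
    # Split on commas and whitespace, dropping empty tokens.
    tokens = [t for t in re.split(r"[,\s]+", name.strip()) if t]
    labels = {"First": "", "Last": "", "Generation": ""}
    # A trailing suffix such as "III" becomes the Generation part.
    if tokens and tokens[-1] in GENERATION_SUFFIXES:
        labels["Generation"] = tokens.pop()
    # The last remaining token is taken as the family name.
    if tokens:
        labels["Last"] = tokens.pop()
    # Everything before it (given name, initials) is the First part.
    labels["First"] = " ".join(tokens)
    return labels


print(label_author("Freeman L. Rawson, III"))
# {'First': 'Freeman L.', 'Last': 'Rawson', 'Generation': 'III'}
```

With labels at this granularity, a suffix such as “III” is an explicit part of the author field rather than an unlabeled token after a comma.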
All in all, the experimental results show that FBA performs equally well on
trained and various untrained test domains. It reduces the weighted average
field error rate across all four test sets by around 70% (2.24% vs. 7.54%)
compared to CRFs.
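The "around 70%" figure can be checked as a relative reduction computed from the two error rates quoted above. The helper name `relative_reduction` is introduced here only for illustration.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers the `baseline` error rate."""
    return (baseline - improved) / baseline


crf_error = 7.54  # weighted average field error rate of CRFs (%)
fba_error = 2.24  # weighted average field error rate of FBA (%)

print(f"{relative_reduction(crf_error, fba_error):.1%}")  # 70.3%
```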
Table 7. Examples of prediction errors made by CRFs

Reference string | Predicted author by CRFs
Dupavcova, Jitka and Wets, Roger: Asymptotic behavior of statistical... | Dupavcova, Jitka and Wets
E.D. Falkenberg and Pols, R. van der and Weide, Th.P. van der, Under-standing process structure... | E.D. Falkenberg and Pols, R. van der and Weide
Michael S. Kogan and Freeman L. Rawson, III: The design of Operating System... | Michael S. Kogan and Freeman L. Rawson
5 Conclusions
We proposed a frame-based approach (FBA) and used reference metadata
extraction as a case study to show its merits. FBA is designed to compensate
for the shortcomings of the traditional rule-based approach while retaining
its strengths: the fuzzy nature of frame matching can capture reasonable
variations in the input text and thereby increase the coverage of the frames.
Our experimental results indicate that FBA is superior to widely used machine
learning (e.g., CRFs) and template-based (e.g., BibPro) methods.
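As a rough illustration of how fuzzy matching can tolerate variations in the input, the sketch below scores an observed sequence of field labels against a few candidate frames and picks the closest one. It is a generic similarity match built on Python's standard `difflib`, not the paper's frame-matching algorithm. The frame names, the frame contents, and the labels “J” (journal) and “Y” (year) are assumptions made for this example; only “A” and “T” appear in the text above.

```python
from difflib import SequenceMatcher

# Hypothetical frames: each is a sequence of field labels and punctuation.
FRAMES = {
    "author-colon-title": ["A", ":", "T", ".", "J", ",", "Y"],
    "author-comma-title": ["A", ",", "T", ",", "J", ",", "Y"],
}


def best_frame(observed, frames=FRAMES):
    """Return (frame name, similarity) for the frame closest to `observed`."""
    scored = [
        (name, SequenceMatcher(None, observed, pattern).ratio())
        for name, pattern in frames.items()
    ]
    # Keep the frame with the highest similarity ratio (between 0 and 1).
    return max(scored, key=lambda pair: pair[1])


# The comma between journal and year is missing, so no frame matches exactly.
observed = ["A", ":", "T", ".", "J", "Y"]
name, score = best_frame(observed)
print(name, round(score, 2))
```

An exact template match would reject this input outright, whereas a similarity score still maps it to the nearest frame.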
There are three directions for future research. First, we plan to extend this
framework to other reference styles such as conference proceedings, book
chapters, and technical reports. Second, we are applying this flexible FBA to other
 