any two punctuation marks and extract possible title segments with a length of
at least four, i.e., segments that contain at least four consecutive words. For
each possible title segment, we compute a confidence score by checking the title
dictionary.
To enhance the FBA, we also create a plug-in mechanism to allow the use of
other resources to perform slot tagging. For example, through the use of pre-
collected dictionaries such as journal name lists from the web, we can easily
broaden our knowledge base. In this way, we can incorporate other techniques,
such as statistical models, to further strengthen the FBA.
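
The plug-in idea can be illustrated with a minimal Python sketch. It is not the
authors' implementation: the registry, the function names (register_tagger,
make_dictionary_tagger, tag_slots), the slot label "JOURNAL", and the journal
name in the usage note are all hypothetical, and the dictionary lookup is a
plain case-insensitive substring search chosen only for brevity.

```python
from typing import Callable, Dict, List, Tuple

# A tagger maps a reference string to (slot_name, matched_text) pairs.
# This interface is an assumption made for illustration only.
Tagger = Callable[[str], List[Tuple[str, str]]]

_TAGGERS: Dict[str, Tagger] = {}


def register_tagger(name: str, tagger: Tagger) -> None:
    """Plug an additional slot-tagging resource into the registry."""
    _TAGGERS[name] = tagger


def make_dictionary_tagger(slot: str, entries: List[str]) -> Tagger:
    """Build a tagger from a pre-collected list, e.g. journal names."""
    lowered = [entry.lower() for entry in entries]

    def tag(text: str) -> List[Tuple[str, str]]:
        low = text.lower()
        hits = []
        for entry in lowered:
            start = low.find(entry)  # case-insensitive substring lookup
            if start != -1:
                hits.append((slot, text[start:start + len(entry)]))
        return hits

    return tag


def tag_slots(text: str) -> List[Tuple[str, str]]:
    """Run every registered tagger (in name order) and collect the tags."""
    results: List[Tuple[str, str]] = []
    for name in sorted(_TAGGERS):
        results.extend(_TAGGERS[name](text))
    return results
```

For instance, after register_tagger("journals", make_dictionary_tagger("JOURNAL",
["Journal of Hypothetical Results"])), a call to tag_slots on a reference string
that contains this made-up journal name returns one JOURNAL tag. A statistical
model could be plugged in the same way, as long as it exposes the same callable
interface.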
3.3 Frame Matching
We propose an alignment-based matching algorithm that enables a single frame
to match multiple similar expressions with high accuracy. Each frame contains
a collection of slot relations and related bigram statistics. A slot can be a word,
phrase, semantic category, or another frame concept. Unlike conventional
template-based approaches, we use slot relations as scoring criteria during
matching.
During the matching procedure, we score candidate frames according to their
matched slots, slot relations, and insertions, deletions, and substitutions. Each
exactly matched slot gets a score of four. The score of an insertion is calculated
by gathering its left (resp. right) bigram statistics with its neighboring left
(resp. right) slots in the training set. The bigram frequency gives a way to
assign the insertion scores, which are truncated to fall in the range from -10 to
2. Deletions are less common in RME and are assigned a score from -10 to 0.
A substitution is either a partial match or a category match of the slot, which
is usually assigned a score of 1 or 2. The final score of a frame is the sum of all
the scores within this frame. In this way, we can capture most of the variations
of a certain concept using only a few frames, and the score can determine the
most likely match. The frame concept described above is more general than rules
in that different expressions of the same concept can likely be captured by one
frame. Such generality might slightly sacrifice precision, but it tends to yield
much higher recall. Note that the number of frames adopted tends to be
proportional to the number of reference styles rather than the size of the
training set.
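
The scoring scheme described in this section can be sketched in Python as
follows. This is only an illustration under stated assumptions: the Op type, the
function names, and every raw score in the example are hypothetical; the bigram
statistics are represented by a precomputed raw score per insertion; substitution
scores are clamped to 1..2 because the text says they are "usually" 1 or 2; and
the contribution of slot relations is omitted because its formula is not given
here.

```python
from dataclasses import dataclass
from typing import Dict, List

EXACT_MATCH_SCORE = 4          # each exactly matched slot scores four
INSERTION_RANGE = (-10, 2)     # insertion scores are truncated to this range
DELETION_RANGE = (-10, 0)      # deletion scores fall in this range
SUBSTITUTION_RANGE = (1, 2)    # assumption: "usually 1 or 2" treated as a clamp


def clamp(value: float, low: float, high: float) -> float:
    """Truncate value so that it lies within [low, high]."""
    return max(low, min(high, value))


@dataclass
class Op:
    """One alignment step between a frame and a reference string."""
    kind: str               # "match", "substitution", "insertion", "deletion"
    raw_score: float = 0.0  # hypothetical score before truncation


def score_alignment(ops: List[Op]) -> float:
    """Final frame score: the sum of all step scores within the frame."""
    total = 0.0
    for op in ops:
        if op.kind == "match":
            total += EXACT_MATCH_SCORE
        elif op.kind == "substitution":
            total += clamp(op.raw_score, *SUBSTITUTION_RANGE)
        elif op.kind == "insertion":
            total += clamp(op.raw_score, *INSERTION_RANGE)
        elif op.kind == "deletion":
            total += clamp(op.raw_score, *DELETION_RANGE)
        else:
            raise ValueError(f"unknown alignment step: {op.kind}")
    return total


def best_frame(candidates: Dict[str, List[Op]]) -> str:
    """Return the name of the candidate frame with the highest score."""
    return max(candidates, key=lambda name: score_alignment(candidates[name]))


# Hypothetical alignment: two exact matches, two punctuation insertions,
# and one missing slot. The raw scores are made up for illustration.
example = [
    Op("match"),
    Op("insertion", 1.5),
    Op("deletion", -3.0),
    Op("insertion", 1.5),
    Op("match"),
]
```

With these made-up raw scores, score_alignment(example) sums 4 + 1.5 - 3 + 1.5 + 4
and returns 8.0. Keeping the truncation ranges as named constants makes it easy to
see how strongly a single insertion or deletion can pull a frame's score down
relative to the reward for an exact match.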
3.4 Examples of Insertion, Deletion and Substitution for a Frame
Consider a frame with three components V, I, and P (i.e., Volume, Issue, and
Page, respectively), arranged in this order from left to right. Suppose we have
a reference string “38, ,115-126”, in which V and P are identified but I is
missing (a deletion). There may be various punctuation marks between V and P
(insertions). An insertion can be given a positive score if it tends to collocate
with its left or right matched components, as the punctuation marks do in this
case, and a negative score otherwise. A deletion can be harmful if slot I
contains a key component of the frame. Note that many such key components can be
pre-specified in the frame. Furthermore, in another example such as
“38, suppl 2, 115-126”, a match for slot I can be found by partial matching for
the word “supplement”