name, hospital name, identity number, carbon copy, merged sections such as “assessment and plan”, and so on.
In our experiment, set1 was used as the training set for the CRF model, and
set2 was used as the development set for designing features. Finally, the test set
was used to evaluate the performance of the developed model trained on set1.
3.2 Evaluation Schemes
The standard recall, precision and F-measure metrics (RPF) were used to evaluate the
performance of the developed CRF model and to compare it with a dictionary-based
method.
$$\text{Precision} = \frac{\text{the number of correctly recognized section heading chunks}}{\text{the number of recognized section heading chunks}}$$

$$\text{Recall} = \frac{\text{the number of correctly recognized section heading chunks}}{\text{the number of true section heading chunks}}$$
This work defines a correctly recognized section heading chunk (a true positive
case) as one in which the text span of the recognized section heading exactly
matches the span of the manually annotated heading. Consequently, any recognized
section heading whose span does not match an annotated heading counts as a false
positive (FP) case.
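The exact-match scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation script: the function name `evaluate_spans`, the `(doc_id, start, end)` span representation, and the sample spans are all assumptions made for the example, and the F-measure is assumed to be the balanced harmonic mean of precision and recall.

```python
def evaluate_spans(predicted, gold):
    """Score predicted section-heading spans against gold spans.

    Both arguments are iterables of (doc_id, start, end) tuples
    (a hypothetical representation). A prediction is a true positive
    only if its span is identical to a gold span.
    """
    pred_set = set(predicted)
    gold_set = set(gold)
    true_positives = len(pred_set & gold_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    # Balanced F-measure (assumed): harmonic mean of precision and recall.
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Made-up spans for illustration only.
gold = [("d1", 0, 2), ("d1", 10, 14), ("d2", 5, 8)]
pred = [("d1", 0, 2), ("d1", 10, 13), ("d2", 5, 8), ("d2", 20, 22)]

p, r, f = evaluate_spans(pred, gold)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

In this toy case the prediction `("d1", 10, 13)` overlaps a gold heading but ends one position early, so under exact matching it counts as a false positive and the gold heading `("d1", 10, 14)` is missed.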
3.3 Experiment Configurations and Results
This work developed a dictionary-based method using the maximum matching
algorithm as a baseline system against which to compare the CRF-based method.
The dictionary-based method was run with three dictionaries: the SecTag
section header terminology (the “SecTag” configuration), the section heading names
collected from the training set¹ (the “Training” configuration), and the union of
the two dictionaries (the “SecTag+Training” configuration). In addition, to study the
effect of the proposed layout features, this work trained two CRF models: one
with all proposed feature sets (CRF-based+Layout Features), and one
without the layout features (CRF-based).
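As a rough illustration of the dictionary baseline, the following Python sketch applies greedy forward maximum matching over a token sequence. It is written under stated assumptions and is not the authors' implementation: the function name `max_match_headings`, the whitespace tokenization, the case-insensitive comparison, and the tiny sample dictionaries are all hypothetical.

```python
def max_match_headings(tokens, dictionary):
    """Greedy forward maximum matching.

    tokens: list of token strings.
    dictionary: set of tuples of lowercase tokens (heading names).
    Returns a list of (start, end) token spans, end exclusive.
    """
    max_len = max((len(entry) for entry in dictionary), default=0)
    spans = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in dictionary:
                spans.append((i, i + n))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


# Hypothetical stand-ins for the two dictionaries; the union mirrors
# the "SecTag+Training" configuration.
sectag_like = {("history", "of", "present", "illness"), ("assessment",)}
training_like = {("assessment", "and", "plan")}
combined = sectag_like | training_like

text = "History of Present Illness : patient reports pain . Assessment and Plan : rest"
print(max_match_headings(text.split(), combined))  # [(0, 4), (9, 12)]
```

Because the longest entry is tried first, "Assessment and Plan" is matched as one heading rather than as the shorter entry "Assessment". Note that a matcher of this kind fires wherever a dictionary entry occurs in the text, regardless of layout or context.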
The experimental results of the five configurations are shown in Table 2. The best
recall on both datasets was achieved by the dictionary-based method that did not use
the section names from SecTag (the “Training” configuration). However, the CRF-based
method noticeably outperforms the dictionary-based methods in terms of precision (P)
and F-score.
On the test dataset, the best CRF configuration achieved a P-score of 0.955, which
exceeds that of the best dictionary-based configuration by 0.496. Adding the
layout features improved the precision and recall of the CRF-based method
by 0.112 and 0.174, respectively. The CRF-based method with layout features achieved
the best performance in terms of RPF. Similar observations can also be made on
the development set.
¹ When testing on the test set, the section names from set2 were also added to the dictionary.