to hotel reservations and the relevant matches (illustrated by the interschema lines)
generated by two matching tools, COMA++ [Aumueller et al. 2005] and Similarity
Flooding [Melnik et al. 2002], denoted SF for short. COMA++ has discovered
9 matches, while SF has discovered 7. Note that for SF, the matches between the
root elements of the schemas are not considered.
[Precision] The precision calculates the proportion of relevant matches discovered
by the matching tool with respect to all those discovered. Using the notation of
Table 9.1, the precision is defined as

$$\text{Precision} = \frac{TP}{TP + FP}$$

A 100% precision means that all the matches discovered by the tool are relevant.
In the particular example of Fig. 9.8, both tools achieve a 100% precision:

$$\text{Precision}_{\text{COMA++}} = \frac{9}{9 + 0} = 100\% \qquad \text{Precision}_{\text{SF}} = \frac{7}{7 + 0} = 100\%$$

[Recall] Recall is another broadly used metric. It computes the proportion of
relevant matches discovered by the tool with respect to all the relevant matches.
It is defined by the formula

$$\text{Recall} = \frac{TP}{TP + FN}$$

A 100% recall means that all relevant matches have been found by the tool. For
the scenario of Fig. 9.8, COMA++ has discovered 9 matches but missed 4 relevant
matches. These missed matches are the false negatives. SF, on the other hand,
discovered 7 relevant matches out of the 13. These results give the following recall
values:

$$\text{Recall}_{\text{COMA++}} = \frac{9}{9 + 4} = 69\% \qquad \text{Recall}_{\text{SF}} = \frac{7}{7 + 6} = 54\%$$

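The precision and recall figures of the running example can be reproduced with a short Python sketch. It is only an illustration: the match identifiers (`s0`/`t0`, ...) are hypothetical placeholders standing in for the 13 relevant matches of Fig. 9.8, and the helper names `confusion_counts`, `precision`, and `recall` are assumptions, not part of either tool.

```python
from typing import Set, Tuple

Match = Tuple[str, str]  # (source schema element, target schema element)


def confusion_counts(discovered: Set[Match], relevant: Set[Match]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for a set of discovered matches."""
    tp = len(discovered & relevant)   # relevant matches that were discovered
    fp = len(discovered - relevant)   # discovered matches that are irrelevant
    fn = len(relevant - discovered)   # relevant matches that were missed
    return tp, fp, fn


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical placeholders: 13 relevant matches, of which COMA++ finds 9
# and SF finds 7, with no irrelevant match discovered by either tool.
relevant = {(f"s{i}", f"t{i}") for i in range(13)}
coma_matches = {(f"s{i}", f"t{i}") for i in range(9)}
sf_matches = {(f"s{i}", f"t{i}") for i in range(7)}

for name, found in [("COMA++", coma_matches), ("SF", sf_matches)]:
    tp, fp, fn = confusion_counts(found, relevant)
    print(f"{name}: precision={precision(tp, fp):.0%}, recall={recall(tp, fn):.0%}")
```

Representing matches as sets makes TP, FP, and FN fall out of plain set intersection and difference, which mirrors the definitions used in the formulas.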
[F-measure] F-measure is a trade-off between precision and recall. It is defined as
follows:

$$\text{f-measure}(\beta) = \frac{(\beta^2 + 1) \times \text{Precision} \times \text{Recall}}{(\beta^2 \times \text{Precision}) + \text{Recall}}$$

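The F-measure definition translates directly into code. The following Python sketch is a minimal illustration under stated assumptions: the function name `f_measure` is hypothetical, `beta` is the weighting parameter of the formula, and returning 0 for a zero denominator is a convention chosen here, not something the text prescribes. The input values are arbitrary and are not taken from the running example.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure with weighting parameter beta, as defined in the text."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # assumed convention when both precision and recall are 0
    return (b2 + 1) * precision * recall / denominator


# With beta = 1 the measure is the harmonic mean of precision and recall.
balanced = f_measure(0.5, 0.5)

# With beta = 2 the same pair of values scores differently depending on
# which of the two is the high one, showing that the weighting is asymmetric.
high_precision = f_measure(1.0, 0.5, beta=2.0)
high_recall = f_measure(0.5, 1.0, beta=2.0)

print(f"{balanced:.3f} {high_precision:.3f} {high_recall:.3f}")
```

In this sketch, a `beta` greater than 1 rewards the high-recall configuration more than the high-precision one, while `beta = 1` treats both symmetrically.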
The β parameter regulates the respective influence of precision and recall. It is often
set to 1 to give the same weight to these two evaluation measures. Back to our
running example, using a β equal to 1, the f-measure values obtained for COMA++
and SF are, respectively, as follows:
 