Table 2. Correlation values observed in the K-fold cross validation

                 ρ(dist_t, diff), columns 1-6
Experiment     1     2     3     4     5     6     ρ(dist_h, diff)  ρ(dist_avg, diff)  ρ(dist_agg, diff)
All models    0.70  0.61  0.60   —    0.34  0.58        0.74              0.65               0.77
Test 1        0.79  0.76  0.75   —    0.42  0.60        0.79              0.69               0.79
Test 2        0.64  0.56  0.56   —    0.43  0.62        0.68              0.70               0.70
Test 3        0.68  0.58  0.58   —    0.53  0.64        0.68              0.72               0.71
Test 4        0.61  0.47  0.45   —    0.20  0.48        0.70              0.56               0.52
Average 1-4   0.68  0.59  0.58   —    0.39  0.59        0.71              0.67               0.68
distances in columns 1-6. The distance measure dist_agg is evaluated according
to Equation 3. The vector W used in dist_agg is obtained using linear
regression, as described in the previous section. Rows of Table 2 correspond to
experiments. The first row describes the study of the whole model collection.
Rows 2-5 describe the results of the 4 tests of the K-fold cross validation
explained earlier, while the last row provides the average correlations
observed in the 4 separate tests.
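The relation between the last row and the four test rows can be checked with a few lines of Python. This is only a sketch: the numbers are copied from Table 2, the helper name column_means is hypothetical, and None stands for the entry without a statistically significant result. Because the published values are rounded to two decimals, the recomputed means are expected to agree with the reported ones only up to about 0.01.

```python
# Rows "Test 1" to "Test 4" of Table 2: six per-type distances (columns 1-6),
# followed by dist_h, dist_avg and dist_agg. None marks the "—" entry.
tests = {
    "Test 1": [0.79, 0.76, 0.75, None, 0.42, 0.60, 0.79, 0.69, 0.79],
    "Test 2": [0.64, 0.56, 0.56, None, 0.43, 0.62, 0.68, 0.70, 0.70],
    "Test 3": [0.68, 0.58, 0.58, None, 0.53, 0.64, 0.68, 0.72, 0.71],
    "Test 4": [0.61, 0.47, 0.45, None, 0.20, 0.48, 0.70, 0.56, 0.52],
}

# Last row of Table 2 as reported.
reported_average = [0.68, 0.59, 0.58, None, 0.39, 0.59, 0.71, 0.67, 0.68]


def column_means(rows):
    """Average each column; a column with a missing entry stays None."""
    means = []
    for column in zip(*rows):
        if any(value is None for value in column):
            means.append(None)
        else:
            means.append(sum(column) / len(column))
    return means


recomputed = column_means(list(tests.values()))

# Largest gap between recomputed and reported averages (rounding effects only).
max_gap = max(
    abs(mean - reported)
    for mean, reported in zip(recomputed, reported_average)
    if mean is not None
)

for mean, reported in zip(recomputed, reported_average):
    print(reported, None if mean is None else round(mean, 4))
```

The recomputed means differ from the reported averages only in the third decimal, which is consistent with the averages having been taken over unrounded correlation values.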
The correlation values presented in Table 2 are all significant at a
confidence level of 99%, i.e., all p-values are lower than 0.01. However, no
statistically significant results were obtained for the distance in the
homogeneous vector space that corresponds to Data objects. Overall, the
presented correlation values range around 0.7. This level is generally
considered to indicate a strong correlation [11,12], particularly in situations
where human decision making is involved. Therefore, we can speak of a strong
relation between the dist and diff measures.
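To make the significance statement concrete, the following sketch computes a correlation coefficient between two series and estimates a p-value. Several things here are assumptions and not taken from the text: the text does not say which coefficient ρ denotes or which test was used, so the sketch uses the Pearson coefficient with a permutation test as a stand-in; the data are synthetic and generated only for illustration; the function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value: how often a random pairing of the
    values yields a correlation at least as strong as the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Synthetic stand-ins for the dist and diff measures (illustration only).
rng = random.Random(42)
dist = [rng.random() for _ in range(60)]
diff = [d + rng.gauss(0, 0.2) for d in dist]

rho = pearson(dist, diff)
p_value = permutation_p_value(dist, diff)
print(f"rho = {rho:.2f}, significant at 99%: {p_value < 0.01}")
```

A correlation is reported as significant at the 99% level when the estimated p-value falls below 0.01, which is the criterion stated for the values in Table 2.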
Among the distance measures in homogeneous spaces, one can point out the
distance in the Role space, which overall displays the highest correlation
values across the different studies (0.61-0.79). In contrast, the correlation
values for Label are the lowest (0.20-0.53). Another observation is that
distances taking into account multiple activity property types tend to have
higher correlations. Of these, dist_agg outperforms all other distance measures,
reaching a value of 0.77 when all models are considered. For the average values
of the K-fold cross validations, however, dist_h, dist_avg, and dist_agg
demonstrate a similar performance, with correlation values of 0.71, 0.67, and
0.68, respectively. This observation can be explained by the fact that dist_agg
is parameterized by vector W, the abstraction fingerprint of a particular model
set. Thus, the distance measure dist_agg "trained" on one model set may never
outperform dist_avg once the set of models is changed. Tests 1-4 support this
argumentation. Note that this result does not restrict the applicability of the
approach: in a real-world setting, the goal is to transfer the abstraction style
from one model set to another. The average