Databases Reference
In-Depth Information
information on the quality of the schema matching process. We justify the decision to use top- K
matchings for statistically monotonic schema matchers, discussed in Section 3.4.1 , on the grounds
that monotonicity ensures that the top- K matchings are sufficiently close to the exact matching.
5.2
ALGORITHMS FOR IDENTIFYING TOP- K MATCHINGS
This section introduces three algorithms for identifying top- K matchings. Given two schemata
with n 1 and n 2 concepts, the first two algorithms efficiently identify the second-best matching. We
then show that these algorithms do not scale for a general K . Therefore, we suggest a polynomial
algorithm for identifying top- K matchings.
Given two schemata with n 1 and n 2 attributes, let G = (X,Y,E) be an undirected bipartite
graph with nodes representing attributes of the two schemata ( V = X Y ) and edges representing
the degree of similarity between attributes. We denote by A best the best (complexity-wise) algorithm
for finding a maximum weight matching in a bipartite graph and by C (A best ) the complexity of the
algorithm A best . The best known deterministic algorithm for finding maximum weight matching has
a complexity of C (A best )
O n 3 [ Korte and Vygen , 2002 ], where in our case, n
max (n 1 ,n 2 ) .
It is worth noting that identifying the maximum weighted matching in a general graph appears to
be one of the hardest combinatorial optimization problems that can be solved in polynomial time.
To find the minimum weighted path for all pairs of vertices in G we shall utilize the Floyd-Warshall
=
=
algorithm with a computational complexity of O n 3 (see [ Cormen et al. , 1990 ], pp. 629-634).
5.2.1
FINDING THE TOP- 2 MATCHINGS USING A best
Let σ 1
be the maximum weight matching and let σ 2
be the second-best matching in G . Observe
σ 1
σ 2 . Otherwise, σ 1
σ 2 , and since edge
that there exists an edge e
E such that e
yet e/
weights are positive, σ 2 > σ 1 , contradicting the optimality of σ 1 .Let G e denote a graph
obtained from G by eliminating the edge e , for any edge e E . Clearly, for e σ 1 given above, the
matching σ 2 is also a matching in G e . Moreover, the matching σ 2 is a maximum weight matching in
G e (otherwise, the existence of a matching σ in G e of weight greater than that of σ 2 would contradict
the fact that σ 2 is the second-best matching, since σ is also different from σ 1 ). We conclude that
any maximum weight matching in G e is a second-best matching in G since it differs from σ 1 (in
particular, it does not contain the edge e ), and its weight equals that of σ 2 , which is the second-best
matching.
By the argument above, it follows that if one could correctly guess an edge e
σ 2 , then
the second-best matching could be computed by applying A best to the graph G e . By enumerating
over all possible values of e , which are all the edges of σ 1 , we exhaustively search for σ 2 .Let σ
σ 1 \
e
e σ 1 . Since all of these candidate solutions are
different from σ 1 and at least one of them is the second-best matching, the second-best matching
can be found by picking a candidate solution of maximum weight. This leads to Algorithm 2 .
be a maximum weight mapping in G e , for each
Search WWH ::




Custom Search