Top-K Schema Matchings - Uncertain Schema Matching

Databases Reference

In-Depth Information

information on the quality of the schema matching process. We justify the decision to use top- K

matchings for statistically monotonic schema matchers, discussed in Section 3.4.1 , on the grounds

that monotonicity ensures that the top- K matchings are sufficiently close to the exact matching.

5.2

ALGORITHMS FOR IDENTIFYING TOP- K MATCHINGS

This section introduces three algorithms for identifying top- K matchings. Given two schemata

with n 1 and n 2 concepts, the first two algorithms efficiently identify the second-best matching. We

then show that these algorithms do not scale for a general K . Therefore, we suggest a polynomial

algorithm for identifying top- K matchings.

Given two schemata with n 1 and n 2 attributes, let G = (X,Y,E) be an undirected bipartite

graph with nodes representing attributes of the two schemata ( V = X ∪ Y ) and edges representing

the degree of similarity between attributes. We denote by A best the best (complexity-wise) algorithm

for finding a maximum weight matching in a bipartite graph and by C (A best ) the complexity of the

algorithm A best . The best known deterministic algorithm for finding maximum weight matching has

a complexity of C (A best )

O n 3 [ Korte and Vygen , 2002 ], where in our case, n

max (n 1 ,n 2 ) .

It is worth noting that identifying the maximum weighted matching in a general graph appears to

be one of the hardest combinatorial optimization problems that can be solved in polynomial time.

To find the minimum weighted path for all pairs of vertices in G we shall utilize the Floyd-Warshall

algorithm with a computational complexity of O n 3 (see [ Cormen et al. , 1990 ], pp. 629-634).

5.2.1

FINDING THE TOP- 2 MATCHINGS USING A best

Let σ 1

be the maximum weight matching and let σ 2

be the second-best matching in G . Observe

σ 1

σ 2 . Otherwise, σ 1

σ 2 , and since edge

that there exists an edge e

∈

E such that e

∈

yet e/

∈

⊆

weights are positive, σ 2 > σ 1 , contradicting the optimality of σ 1 .Let G e denote a graph

obtained from G by eliminating the edge e , for any edge e ∈ E . Clearly, for e ∈ σ 1 given above, the

matching σ 2 is also a matching in G e . Moreover, the matching σ 2 is a maximum weight matching in

G e (otherwise, the existence of a matching σ in G e of weight greater than that of σ 2 would contradict

the fact that σ 2 is the second-best matching, since σ is also different from σ 1 ). We conclude that

any maximum weight matching in G e is a second-best matching in G since it differs from σ 1 (in

particular, it does not contain the edge e ), and its weight equals that of σ 2 , which is the second-best

matching.

By the argument above, it follows that if one could correctly guess an edge e

σ 2 , then

the second-best matching could be computed by applying A best to the graph G e . By enumerating

over all possible values of e , which are all the edges of σ 1 , we exhaustively search for σ 2 .Let σ

σ 1 \

∈

e ∈ σ 1 . Since all of these candidate solutions are

different from σ 1 and at least one of them is the second-best matching, the second-best matching

can be found by picking a candidate solution of maximum weight. This leads to Algorithm 2 .

be a maximum weight mapping in G e , for each

Uncertain Schema Matching

Search WWH ::

Custom Search

Home