Automatic Categorization of Web Database Query Results - Advanced Database Query Systems

Database Reference

In-Depth Information

Now, we can define the similarity between Q 1 and Q 2 using their vector representations V Q 1 and V Q 2

as follows:

V V

⋅

Sim(

Q Q

)

cos(

V V

)

(2)

In order to quantify how well a query Q 1 is represented by another query Q 2 , we need to define a

distance measure between two queries. Based on the similarity mentioned above, the distance between

Q 1 and Q 2 can be defined as

d ( Q 1 , Q 2 ) = 1- Sim( Q 1 , Q 2 )

(3)

Based on the definitions above, the queries clustering problem can be defined. Let H be the s et of m

queries in query history: H = { Q 1 ,…, Q m }. The we need to find a set of k queries H k = { Q 1 ,…, Q k } ( k

< m ) such that:

cos (

t H

)

= ∈

d Q QC

( ,

)

(4)

Q H

is minimized. The distance of a query Q i from a set of queries H is defined as

d Q H

(

) min (

d Q Q

)

(5)

Q H

∈

We call the queries in set H k representative queries and associate with each representative query Q i

a set of queries QC

{

Q Q

argmin

d Q Q

(

)}

Complexity of the Queries Clustering Problem

The problem of queries clustering is the same as the k -median problem. The k -median problem is well

known to be NP-hard. An instance of the metric k -median problem consists of a metric space χ = ( X , c),

where X is a set of points and c is a distance function (also called the cost) that specifies the distance c xy

≥ 0 between any pair of nodes x , y ∧ X . The distance function is reflexive, symmetric, and satisfies the

triangle inequality. Given a set of points F ∧ X , the cost of F is defined by cost

∑

( )

where

x X xF

∈

min for x ∧ X . The objective is to find a k -element set F ∧ X that minimizes cost( F )

(Chrobak, Keynon & Young, 2005). Obviously, the queries clustering problem can be treated as the k -

median problem and it is also NP-hard. Thus, we have to think of approximation algorithms for solving

it.

f F

∈

Advanced Database Query Systems

Search WWH ::

Custom Search

Home