Social Interaction in Databases - Community-Built Databases: Research and Development

At the data level, however, there are some special issues to deal with. Let $c(D) = \{r_1, \ldots, r_m\}$, where each $r_i$ is a rowid ($1 \le i \le m$). Assume that the user attached some content to a row $r \in c(D)$. In fact, the user may have attached several pieces of content $u_1, \ldots, u_s$. However, the user attached content $u_1$ to $r$ when it was part of the answer to another command $c'$, and so it was part of the set $c'(D) \neq c(D)$, content $u_2$ when it was part of the answer to another command $c''$ ... Should all this content still be displayed? Some of it? One could argue that such content should be displayed in context, so the content associated with $r$ through $c'(D)$ should be displayed only if $c(D)$ is somewhat related to $c'(D)$. For a given rowid $r \in \pi_{rowid}(\mathit{Refs})$, we define the annotation contexts of $r$ as follows:

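$$AC(r) = \{\, t.\mathit{refid} \mid t \in \mathit{Refs},\ t.\mathit{rowid} = r \,\}$$

(that is, the set of reference ids containing $r$). For each annotation context $ac \in AC(r)$, its extension is simply the set of all rowids attached to it:

$$Ext(ac) = \{\, t.\mathit{rowid} \mid t \in \mathit{Refs},\ t.\mathit{refid} = ac \,\}$$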
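The two definitions can be sketched in a few lines, assuming that Refs is available as a list of (refid, rowid) pairs. The pair layout, the sample data, and the function names `annotation_contexts` and `extension` are illustrative assumptions, not something the text prescribes.

```python
from typing import List, Set, Tuple

Ref = Tuple[str, int]  # (refid, rowid); this layout is an assumption


def annotation_contexts(refs: List[Ref], r: int) -> Set[str]:
    """AC(r): the reference ids whose entries contain rowid r."""
    return {refid for refid, rowid in refs if rowid == r}


def extension(refs: List[Ref], ac: str) -> Set[int]:
    """Ext(ac): all rowids attached to annotation context ac."""
    return {rowid for refid, rowid in refs if refid == ac}


# Hypothetical contents of Refs: two contexts that overlap on row 2.
refs = [("q1", 1), ("q1", 2), ("q2", 2), ("q2", 3)]

contexts_of_row_2 = annotation_contexts(refs, 2)   # both q1 and q2
rows_of_q1 = extension(refs, "q1")                 # rows 1 and 2
```

A row that was never annotated simply yields an empty set of contexts, so no special case is needed for it.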

Then, for each row $r_j \in c(D)$, we display the user-created content attached to $r_j$ in context $ac \in AC(r_j)$ if $Ext(ac)$ is sufficiently related to $c(D)$. What “sufficiently related” means can be defined using several semantic measures. An analytic measure can be defined along the lines of typical metrics like Jaccard, since both $Ext(ac)$ and $c(D)$ are sets:

$$dist(ac, c) = \frac{|Ext(ac) \cap c(D)|}{|Ext(ac) \cup c(D)|} \ge \alpha$$

where $\alpha$ is a threshold. The advantage of this well-known metric is that it closely matches intuition in extreme cases: it reaches its maximum value of 1 when $c(D)$ and $Ext(ac)$ coincide (that is, $c(D) \subseteq Ext(ac)$ and $Ext(ac) \subseteq c(D)$), and its minimum of 0 when $Ext(ac) \cap c(D) = \emptyset$.
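A minimal sketch of the Jaccard-style test follows. The function names, the sample sets, and the threshold value 0.5 are assumptions chosen for illustration; the handling of two empty sets is a convention picked here, since the text does not cover that case.

```python
from typing import Set


def jaccard(ext_ac: Set[int], answer: Set[int]) -> float:
    """|Ext(ac) ∩ c(D)| / |Ext(ac) ∪ c(D)|."""
    union = ext_ac | answer
    if not union:
        return 0.0  # both sets empty: convention assumed for this sketch
    return len(ext_ac & answer) / len(union)


def is_related(ext_ac: Set[int], answer: Set[int], alpha: float) -> bool:
    """Show content from a context only if the measure reaches the threshold."""
    return jaccard(ext_ac, answer) >= alpha


answer = {1, 2, 3}                      # stands in for c(D)
same = jaccard({1, 2, 3}, answer)       # identical sets: 3 / 3
partial = jaccard({2, 3, 4}, answer)    # 2 shared rows out of 4 overall
disjoint = jaccard({7, 8}, answer)      # nothing shared: 0 / 5

show_partial = is_related({2, 3, 4}, answer, alpha=0.5)
```

With these sample sets the measure is 1.0, 0.5, and 0.0 respectively, so a threshold of 0.5 would still display the content attached through the partially overlapping context.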

Thus, for $c(D) = \{r_1, \ldots, r_m\}$, let $Ac(c) = \bigcap_{r \in c(D)} AC(r)$; this represents contexts that are common to all tuples in $c(D)$. If $Ac(c) \neq \emptyset$, then we can choose $ac$ such that $\min_{r \in c(D)} dist(ac, r)$, that is, the context that is the closest to some tuple in $c(D)$. Other measures are also possible: the context that minimizes overall distance ($\min \sum_{r \in c(D)} dist(ac, r)$) or average distance ($\min \mathrm{Avg}_{r \in c(D)} dist(ac, r)$). If $Ac(c) = \emptyset$ (there is no context that is common to all tuples) we can choose, for each tuple $r \in c(D)$, the display of the context $ac$ that minimizes the distance to $c(D)$, that is, the $Ext(ac)$ attaining $\min_{ac \in AC(r)} dist(ac, c)$.
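The selection step can be sketched as follows, under two stated assumptions. First, Refs is taken to be a list of (refid, rowid) pairs. Second, because the text speaks of minimizing a distance, the sketch uses 1 minus the Jaccard ratio as the distance between $Ext(ac)$ and $c(D)$; that conversion, the function names, and the alphabetical tie-breaking are choices made here, not part of the original description.

```python
from typing import Dict, List, Optional, Set, Tuple

Ref = Tuple[str, int]  # (refid, rowid); layout assumed for illustration


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """1 - |a ∩ b| / |a ∪ b| (assumed distance; 0.0 for two empty sets)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def common_contexts(refs: List[Ref], answer: Set[int]) -> Set[str]:
    """Contexts shared by every row of c(D); empty if there are none."""
    ac_sets = [{refid for refid, rowid in refs if rowid == r} for r in answer]
    return set.intersection(*ac_sets) if ac_sets else set()


def closest_context_per_row(refs: List[Ref],
                            answer: Set[int]) -> Dict[int, Optional[str]]:
    """Fallback case: for each row r, the context in AC(r) nearest to c(D)."""
    ext: Dict[str, Set[int]] = {}
    for refid, rowid in refs:
        ext.setdefault(refid, set()).add(rowid)
    choice: Dict[int, Optional[str]] = {}
    for r in answer:
        # Sorting makes ties resolve to the alphabetically first context.
        candidates = sorted(ac for ac, rows in ext.items() if r in rows)
        choice[r] = min(candidates,
                        key=lambda ac: jaccard_distance(ext[ac], answer),
                        default=None)  # None: row r has no annotations
    return choice


# Hypothetical data: no context covers all three rows of the answer.
refs = [("q1", 1), ("q1", 2), ("q2", 2), ("q2", 3),
        ("q3", 3), ("q3", 4), ("q3", 5)]
answer = {1, 2, 3}

shared = common_contexts(refs, answer)
chosen = closest_context_per_row(refs, answer)
```

In this sample no context is common to all rows, so each row falls back to its own nearest context: rows 1 and 2 pick `q1` (row 2 by tie-break), and row 3 picks `q2` over the more distant `q3`.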

Once it is decided which user-created content to show, the same technique outlined above (outerjoin based on rowids) can be used.
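As a rough illustration of an outer join on rowids, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The tables `answer` and `content`, their columns, the sample rows, and the list of selected contexts are all hypothetical; the column is named `row_id` only to avoid clashing with SQLite's own implicit `rowid`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'answer' stands in for the rows of c(D)
    CREATE TABLE answer (row_id INTEGER PRIMARY KEY, title TEXT);
    -- 'content' holds user-created content, tagged with its context
    CREATE TABLE content (row_id INTEGER, refid TEXT, note TEXT);
    INSERT INTO answer VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO content VALUES (1, 'q1', 'check this value'),
                               (3, 'q2', 'outdated');
""")

# Contexts whose content was judged related enough to display (assumed).
selected = ["q1"]
placeholders = ",".join("?" for _ in selected)

# The left outer join keeps every answer row, with NULL where no
# content from a selected context is attached.
query = f"""
    SELECT a.row_id, a.title, u.note
    FROM answer AS a
    LEFT OUTER JOIN content AS u
      ON u.row_id = a.row_id AND u.refid IN ({placeholders})
    ORDER BY a.row_id
"""
rows = conn.execute(query, selected).fetchall()
conn.close()
```

Placing the context filter in the `ON` clause rather than in a `WHERE` clause matters here: filtering in `WHERE` would drop answer rows that have no displayable content, which defeats the purpose of the outer join.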

Note that carrying out the procedure just outlined can be quite costly: we need to compute, for all $r \in c(D)$, $AC(r)$; then, for each $ac \in AC(r)$, we have to obtain $Ext(ac)$, and finally we need to determine a metric between $Ext(ac)$ and $c(D)$ (the one proposed above or an alternate one). It is easy to see that in a worst-case
