Social Interaction in Databases - Community-Built Databases: Research and Development


• Independent: content is generated without dependence on any other data (although pointers to other data may exist). Blogs are a typical example.

• Parasitic: the user-generated data depends for its existence on other, pre-existing (and usually not user-generated) data. Tags are a typical example.

Both types are important and deserve to be studied. Here, we focus on parasitic

data, which is generated when users interact with an existing data repository. A data

repository can be a Web site, an (electronic) document, or a database. Our focus

here is on relational databases.

While some data repositories accept user input, traditional (relational) databases

are not able to store user-created content, except when such information flows

through predefined, restricted access paths. There are several actions that users

are not allowed to perform with current relational technology:

• Very often, users are presented with a view of the database that they cannot update (add to, delete from, or change existing data). This is a well-known issue, mentioned here for completeness [4].

• Users cannot restructure the database data, that is, change the given schema to another schema that is more convenient. In particular, a user sometimes considers some of the data as labels that could be used to manage other data (i.e., as metadata). Existing systems do not allow data-metadata restructuring [5]. Other restructuring by users is usually severely restricted.

• Users cannot add metadata, tags, or comments for data or query results. In particular, they cannot add to the database anything except data that is structured exactly as the data in the database already is.
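The first and third restrictions can be observed directly in a small relational engine. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for "current relational technology"; the view name LocationNames, the comment column, and the sample rows are hypothetical and chosen only for illustration. It does not cover the second restriction (schema restructuring).

```python
import sqlite3

# In-memory database with one base table and one view over it.
# Table contents and the view name are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Locations (
        name      TEXT PRIMARY KEY,
        longitude REAL,
        latitude  REAL
    );
    INSERT INTO Locations VALUES ('Site A', 12.5, 41.9);
    CREATE VIEW LocationNames AS SELECT name, latitude FROM Locations;
""")


def try_statement(sql, params=()):
    """Run one statement; return None on success or the error text on failure."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.Error as exc:
        return str(exc)


# Restriction 1: an update issued against the view is rejected
# (SQLite views are read-only unless INSTEAD OF triggers are defined).
view_error = try_statement(
    "UPDATE LocationNames SET latitude = 0 WHERE name = 'Site A'"
)

# Restriction 3: a free-form comment does not fit the existing schema,
# so the insert is rejected because the column does not exist.
comment_error = try_statement(
    "INSERT INTO Locations (name, longitude, latitude, comment) "
    "VALUES (?, ?, ?, ?)",
    ("Site B", 1.0, 2.0, "value looks wrong"),
)

# Data structured exactly like the existing rows is accepted.
ok = try_statement("INSERT INTO Locations VALUES (?, ?, ?)", ("Site B", 1.0, 2.0))

print("view update:", view_error)
print("comment insert:", comment_error)
print("conforming insert:", "accepted" if ok is None else ok)
```

In this sketch, only the row that matches the table's declared structure is stored; both the update through the view and the attempt to attach a comment fail with an error from the engine.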

The first two issues are well known and have been investigated in the research

literature. The last one, however, has been considered only recently and in a limited

way (see Sect. 7.7 for references to related work). There is a strong assumption that

users should not add anything to the database that does not conform to the existing

schema (we discuss this view further in Sect. 7.6). Here, our goal is to disregard this

assumption and allow users to enter data in their terms. To understand what the

challenges are, we give a few examples of unsupported interaction next.

In practice, we expect that most user-created content will be associated with queries, in particular, with the metadata of queries: a user may want to enter tags or comments about the results of a query, for example, on finding unexpected and/or interesting results, or simply while trying to interpret the data. Of course, a user may also make other annotations. Assume, for instance, a user accessing a geographic database with information about the position of certain locations: in a table Locations, there are attributes longitude and latitude. The user examines data in the table and realizes that the values are expressed in degrees, minutes, and seconds, so she/he makes a comment and attaches it to both attributes (i.e., to metadata). Then the user issues a query, examines the results, and realizes that some of the tuples have latitude values outside the −90 to +90 range, a mistake probably due to faulty data. The user proceeds to mark such tuples as having incorrect values
