Database Reference
In-Depth Information
• Independent: content is generated without dependence on any other data
(although pointers to other data may exist). Blogs are a typical example of this.
• Parasitic: the user-generated data depends for its existence on other,
pre-existing (and usually not user-generated) data. Tags are a typical example of this.
Both types are important and deserve to be studied. Here, we focus on parasitic
data, which is generated when users interact with an existing data repository. A data
repository can be a Web site, an (electronic) document, or a database. Our focus
here is on relational databases.
While some data repositories accept user input, traditional (relational) databases
are not able to store user-created content, except when such information flows
through predefined, restricted access paths. There are several actions that users
are not allowed to perform with current relational technology:
• Very often, users are presented with a view of the database, which they cannot
update (add, delete, or change existing data). This is a well-known issue,
mentioned here for completeness [4].
• Users cannot restructure the database data, that is, change the given schema to
another schema that is more convenient. In particular, sometimes a user
considers some of the data as labels that could be used to manage other data (i.e., as
metadata). Existing systems do not allow data-metadata restructuring [5]. Other
restructuring by users is usually severely restricted.
• Users cannot add metadata, tags, or comments for data or query results. In
particular, they cannot add to the database anything except data that is structured
exactly as the data in the database already is.
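The first and third restrictions can be observed directly in a small experiment. The following is a minimal sketch, assuming SQLite (chosen only because it ships with Python) and a hypothetical view named LocationNames; the table layout and the sample row are made up for illustration. Other engines allow updates through some views under restricted conditions, so the exact behavior may differ. The second restriction (restructuring) is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, loosely based on the Locations table used in the text.
conn.execute("CREATE TABLE Locations (name TEXT, latitude TEXT, longitude TEXT)")
# Hypothetical view exposing only part of the table to the user.
conn.execute("CREATE VIEW LocationNames AS SELECT name FROM Locations")


def try_sql(statement, params=()):
    """Run a statement; return None on success or the error message on failure."""
    try:
        conn.execute(statement, params)
        return None
    except sqlite3.Error as exc:
        return str(exc)


# Restriction 1: an insert through the view is rejected by SQLite.
view_error = try_sql("INSERT INTO LocationNames (name) VALUES (?)", ("Lisbon",))

# Restriction 3: a row carrying an extra 'comment' field does not match the schema.
schema_error = try_sql(
    "INSERT INTO Locations (name, latitude, longitude, comment) VALUES (?, ?, ?, ?)",
    ("Lisbon", "38 43 00", "-9 08 00", "value looks plausible"),
)

# Data structured exactly like the existing schema is accepted.
conforming_error = try_sql(
    "INSERT INTO Locations (name, latitude, longitude) VALUES (?, ?, ?)",
    ("Lisbon", "38 43 00", "-9 08 00"),
)

print("view insert:", view_error)
print("non-conforming insert:", schema_error)
print("conforming insert:", conforming_error)
```

Only the last statement succeeds, which is the behavior the list above describes: the user's comment has nowhere to go unless the schema already provides a place for it.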
The first two issues are well known and have been investigated in the research
literature. The last one, however, has been considered only recently and in a limited
way (see Sect. 7.7 for references to related work). There is a strong assumption that
users should not add anything to the database that does not conform to the existing
schema (we discuss this view further in Sect. 7.6). Here, our goal is to disregard this
assumption and allow users to enter data on their own terms. To understand what the
challenges are, we next give a few examples of unsupported interaction.
In practice, we expect that most user-created content will be associated with
queries and, in particular, with the metadata of queries: a user may want to enter tags or
comments about the results of a query, e.g., on finding unexpected and/or interesting
results, or simply while trying to interpret the data. Of course, a user may also make other
annotations. Assume, for instance, a user accessing a geographic database with
information about the position of certain locations: in a table Locations, there
are attributes longitude and latitude. The user examines the data in the table
and realizes that the values are expressed in degrees, minutes, and seconds, so she or
he attaches a comment to both attributes (i.e., to metadata). Then the
user issues a query, examines the results, and realizes that some of the tuples
have latitude values outside the −90 to +90 range, a mistake probably due to
faulty data. The user proceeds to mark such tuples as having incorrect values.
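The geographic scenario can be sketched as follows. This is only an illustration under stated assumptions, not the mechanism proposed in this chapter: the side table Annotations, its columns, the helper dms_to_decimal, the space-separated encoding of degrees, minutes, and seconds, and all sample rows are hypothetical. The sketch stores both kinds of user input from the example, comments attached to attributes (metadata) and marks attached to individual tuples (data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Locations (
    id        INTEGER PRIMARY KEY,
    name      TEXT,
    latitude  TEXT,   -- assumed encoding: degrees minutes seconds
    longitude TEXT
);
-- Hypothetical side table holding one row per user annotation.
CREATE TABLE Annotations (
    target_kind  TEXT CHECK (target_kind IN ('attribute', 'tuple')),
    target_table TEXT,
    target_ref   TEXT,  -- attribute name, or tuple id stored as text
    note         TEXT
);
""")

# Made-up sample rows; two of them carry out-of-range latitudes.
conn.executemany(
    "INSERT INTO Locations (id, name, latitude, longitude) VALUES (?, ?, ?, ?)",
    [
        (1, "A", "38 43 00", "-9 08 00"),
        (2, "B", "95 10 30", "12 00 00"),
        (3, "C", "-91 00 00", "40 15 20"),
    ],
)

# Step 1: the user comments on the two attributes, i.e., on metadata.
for attribute in ("latitude", "longitude"):
    conn.execute(
        "INSERT INTO Annotations VALUES ('attribute', 'Locations', ?, ?)",
        (attribute, "Values are in degrees, minutes, and seconds"),
    )


def dms_to_decimal(dms):
    """Convert a 'D M S' string (assumed encoding) to decimal degrees."""
    d, m, s = dms.split()
    sign = -1 if d.startswith("-") else 1
    return sign * (abs(int(d)) + int(m) / 60 + int(s) / 3600)


# Step 2: the user queries the table and spots latitudes outside -90..+90.
rows = conn.execute("SELECT id, latitude FROM Locations ORDER BY id").fetchall()
suspect_ids = [row_id for row_id, lat in rows if abs(dms_to_decimal(lat)) > 90]

# Step 3: the user marks each such tuple as having an incorrect value.
for row_id in suspect_ids:
    conn.execute(
        "INSERT INTO Annotations VALUES ('tuple', 'Locations', ?, ?)",
        (str(row_id), "Latitude outside the -90 to +90 range"),
    )

annotation_count = conn.execute("SELECT COUNT(*) FROM Annotations").fetchone()[0]
print(suspect_ids, annotation_count)
```

Keeping the annotations in a separate table leaves the Locations schema untouched, but it also means the database itself knows nothing about what each annotation refers to; the target columns are plain text that only the application interprets.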