2.1 Data Characterization Framework
Data may be viewed abstractly as a set of records with a common structure, each
record being a sequence of elements (such as numbers or strings) which either reflect
the results of some observations or measurements or specify the context in which the
observations or measurements were obtained. The context may include, for example,
the place and the time of observation or measurement, and the object or group of
objects observed. The elements that a data record consists of are called values.
Each position in this common structure has a specific meaning, which is shared by
all values appearing in it. The positions may be named to distinguish between them
and are usually called components of the data.
Definition: Characteristic component, or attribute, is a data component corresponding
to a measured or observed property of the phenomenon reflected in the data.
Characteristic is a value of a single attribute or a combination of values of several
dataset attributes.
Definition: Referential component, or referrer, is a data component reflecting an
aspect of the context in which the observations or measurements were made. Reference
is the value of a single referrer or the combination of values of several referrers
that fully specifies the context of some observation(s) or measurement(s).
Definition: Reference set of a dataset is the set of all references occurring in this
dataset.
Definition: Characteristic set of a dataset is the set of all possible characteristics
(i.e. combinations of values of the dataset attributes).
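The difference between characteristics that are possible and characteristics that actually occur can be made concrete with a small sketch. It assumes that every attribute has a finite value domain, which need not hold in general (a numeric attribute such as air temperature has an unbounded set of possible values). The attribute names and their domains are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical finite value domains of two attributes (illustration only).
attribute_domains = {
    "wind_dir": ("N", "E", "S", "W"),
    "cloud_cover": ("clear", "overcast"),
}

# Characteristic set: every possible combination of attribute values,
# i.e. the Cartesian product of the attribute domains.
characteristic_set = set(product(*attribute_domains.values()))

# Characteristics that actually occur in some (invented) dataset.
observed = {("N", "clear"), ("W", "overcast")}

# Possible characteristics that were never observed.
unobserved = characteristic_set - observed
```

Under this assumption the characteristic set has 4 × 2 = 8 elements, of which the invented dataset realizes only two.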
Definition: Multidimensional dataset is a dataset having two or more referrers.
Depending on the number of referrers, a dataset may be called one-dimensional,
two-dimensional, three-dimensional, and so on.
For example, the geographical location and the time are referrers for measurements of
properties of the climate such as air temperature or wind direction, which are
attributes. Each combination of location and time is a reference, and the corresponding
combination of air temperature and wind direction is a characteristic. This is a
two-dimensional dataset as it has two referrers; the attributes are not counted as
dimensions. Referrers are independent components and attributes are dependent since the
values of attributes depend on the context in which they are observed. In data
analysis, it is possible to deal with selected attributes independently from the others;
however, all referrers present in a dataset need to be handled simultaneously.
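The climate example can be sketched in code. This is a minimal illustration rather than part of the framework itself: the record values are invented, and the field names (location, time, air_temp_c, wind_dir) and helper functions are hypothetical.

```python
# Hypothetical climate records; all values are invented for illustration.
REFERRERS = ("location", "time")          # independent components
ATTRIBUTES = ("air_temp_c", "wind_dir")   # dependent components

records = [
    {"location": "Bonn", "time": "2020-01-01T12:00", "air_temp_c": 4.5, "wind_dir": "NW"},
    {"location": "Bonn", "time": "2020-01-02T12:00", "air_temp_c": 6.0, "wind_dir": "W"},
    {"location": "Oslo", "time": "2020-01-01T12:00", "air_temp_c": -3.0, "wind_dir": "N"},
]


def reference_of(record):
    """Return the reference: the combination of values of ALL referrers."""
    return tuple(record[name] for name in REFERRERS)


def characteristic_of(record, attributes=ATTRIBUTES):
    """Return a characteristic: values of one or several selected attributes."""
    return tuple(record[name] for name in attributes)


# Reference set: all references occurring in the dataset.
reference_set = {reference_of(r) for r in records}

# Dimensionality counts referrers only; attributes are not dimensions.
dimensionality = len(REFERRERS)

# A single attribute may be analysed on its own, but each value is still
# keyed by the full reference (location AND time together).
temperature_by_reference = {
    reference_of(r): characteristic_of(r, ("air_temp_c",)) for r in records
}
```

Selecting only air_temp_c drops the other attribute without loss of meaning, whereas dropping one referrer (say, time) would leave several temperatures attached to the same location and no way to tell them apart.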
Data may be viewed formally as a function, in the mathematical sense, with the
referrers being independent variables and the attributes being dependent variables. The
function defines the correspondence between the references and the characteristics
where for each combination of values of the referential components there is at most
one combination of values of the attributes.
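The functional view suggests a simple consistency check: a dataset is well formed in this sense only if no reference is paired with two different characteristics. The sketch below is one possible way to express that, assuming records are stored as dictionaries; the function name as_function and the sample data are hypothetical.

```python
def as_function(records, referrers, attributes):
    """Build the reference -> characteristic mapping, rejecting conflicts."""
    mapping = {}
    for record in records:
        reference = tuple(record[name] for name in referrers)
        characteristic = tuple(record[name] for name in attributes)
        # A repeated reference is acceptable only if it carries the same
        # characteristic; otherwise the data do not define a function.
        if mapping.setdefault(reference, characteristic) != characteristic:
            raise ValueError(f"reference {reference!r} has two characteristics")
    return mapping


# Invented sample data: two referrers (site, day) and one attribute (temp).
sample = [
    {"site": "A", "day": 1, "temp": 10.0},
    {"site": "A", "day": 2, "temp": 11.5},
    {"site": "B", "day": 1, "temp": 7.0},
]

f = as_function(sample, referrers=("site", "day"), attributes=("temp",))

# "At most one": the reference ("B", 2) has no characteristic at all,
# so the function is partial and the lookup yields None.
missing = f.get(("B", 2))
```

Representing the function as a dictionary keyed by the full reference tuple reflects the "at most one" condition directly: a key maps to exactly one value, and an absent key corresponds to a reference with no recorded characteristic.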
The structure of a dataset is characterized by specifying which components it
includes, which of them are referrers, and which ones are attributes. In addition,
it is necessary to specify the properties of the components. The relevant
properties are: