The results show that the flight number A123 has two different destinations at the same
date and same time of arrival and departure, which is inconsistent with the ontology
definition that one flight can only have one destination at a specific time and date. This
contradiction arises due to inconsistency in data representation, which can be detected
by using inference and reasoning.
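The check described above can be sketched in a few lines. This is only a minimal illustration, not a reasoner: it assumes the flight data has already been flattened into tuples of (flight, date, departure, arrival, destination), and it stands in for the inference step with a plain grouping test. The function name, the tuple layout and the sample values are hypothetical.

```python
from collections import defaultdict


def find_conflicting_destinations(records):
    """Group records by (flight, date, departure, arrival) and return
    every key that is associated with more than one destination."""
    destinations = defaultdict(set)
    for flight, date, departure, arrival, destination in records:
        destinations[(flight, date, departure, arrival)].add(destination)
    return {
        key: sorted(found)
        for key, found in destinations.items()
        if len(found) > 1
    }


# Hypothetical sample data: A123 appears with two destinations
# for the same date and the same departure and arrival times.
records = [
    ("A123", "2024-05-01", "10:00", "12:00", "Paris"),
    ("A123", "2024-05-01", "10:00", "12:00", "Rome"),
    ("B456", "2024-05-01", "09:00", "11:30", "Berlin"),
]

conflicts = find_conflicting_destinations(records)
for key, found in conflicts.items():
    print(key, "has conflicting destinations:", found)
```

Each reported key violates the rule that one flight has only one destination at a specific time and date.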
Definition 21 (Conciseness). Conciseness refers to the redundancy of entities, be it at
the schema or the data level. Thus, conciseness can be classified into:
- intensional conciseness (schema level) which refers to the redundant attributes and
- extensional conciseness (data level) which refers to the redundant objects.
Metrics. As conciseness is classified in two categories, it can be measured as the
ratio between the number of unique attributes (properties) or unique objects (instances)
and the overall number of attributes or objects, respectively, present in a dataset.
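As a rough sketch of this ratio, the same function can serve both levels: pass it attribute names for intensional conciseness or object identifiers for extensional conciseness. It assumes that equivalent attributes or objects have already been mapped to one canonical name, which is the hard part in practice and is not shown here. The function name and the sample values are hypothetical.

```python
def conciseness(items):
    """Ratio of unique items to all items; 1.0 means no redundancy."""
    items = list(items)
    if not items:
        raise ValueError("conciseness is undefined for an empty dataset")
    return len(set(items)) / len(items)


# Hypothetical attribute names after mapping synonyms to a canonical form.
attributes = ["departure", "arrival", "departure", "destination"]
intensional = conciseness(attributes)

# Hypothetical object identifiers after resolving duplicates of one flight.
objects = ["A123", "A123", "B456"]
extensional = conciseness(objects)

print(f"intensional conciseness: {intensional:.2f}")
print(f"extensional conciseness: {extensional:.2f}")
```

A value below 1.0 indicates that some attributes or objects occur more than once.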
Example. In the example flight search engine, since data is fused from different
datasets, an example of intensional conciseness would be a particular flight, say
A123, being represented by two different identifiers in different datasets, such as
http://airlines.org/A123 and http://flights.org/A123. This redundancy
can ideally be solved by fusing the two and keeping only one unique identifier. On the
other hand, an example of extensional conciseness is when both these different
identifiers of the same flight have the same information associated with them in both the
datasets, thus duplicating the information.
Accessibility Dimensions. The dimensions belonging to this category involve aspects
related to the way data can be accessed and retrieved. Four dimensions belong to
this group: availability, performance, security and response-time.
Definition 22 (Availability). Availability of a dataset is the extent to which information
is present, obtainable and ready for use.
Metrics. Availability of a dataset can be measured in terms of accessibility of the server,
SPARQL endpoints or RDF dumps and also by the dereferenceability of the URIs.
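One way to approximate the dereferenceability part of this metric is to request each URI and count the share that answers with a 2xx status. The sketch below uses only Python's standard urllib. The helper names, the Accept header and the 2xx threshold are assumptions made for illustration, and the status codes in the usage example are made up so that no network access is needed.

```python
import urllib.error
import urllib.request


def http_status(uri, timeout=10.0):
    """Return the HTTP status of a GET on uri, or None if no response
    could be obtained. Redirects are followed by urlopen."""
    try:
        request = urllib.request.Request(
            uri, headers={"Accept": "application/rdf+xml"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with 4xx or 5xx
    except (urllib.error.URLError, OSError, ValueError):
        return None  # unreachable server, timeout, or malformed URI


def dereferenceability(uris, fetch=http_status):
    """Fraction of URIs whose lookup returns a 2xx status."""
    uris = list(uris)
    if not uris:
        raise ValueError("no URIs to check")
    ok = 0
    for uri in uris:
        status = fetch(uri)
        if status is not None and 200 <= status < 300:
            ok += 1
    return ok / len(uris)


# Offline usage with made-up statuses in place of real lookups.
fake_statuses = {
    "http://airlines.org/A123": 200,
    "http://flights.org/A123": 404,
    "http://flights.example/unreachable": None,
}
ratio = dereferenceability(fake_statuses, fetch=fake_statuses.get)
print(f"dereferenceable: {ratio:.2f}")
```

Passing the lookup function as a parameter keeps the ratio testable without a network. Calling `dereferenceability(uris)` with the default performs real HTTP requests.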
Example. Let us consider the case in which the user looks up a flight in the example
flight search engine. However, instead of retrieving the results, she is presented with an
error response code such as a 4xx client error. This is an indication that a requested
resource is unavailable. In particular, when the returned error code is 404 Not Found,
she may assume that either there is no information present at that specified URI or
the information is unavailable. Naturally, an apparently unreliable system is less likely
to be used, in which case the user may not book flights after encountering such issues.
Definition 23 (Performance). Performance refers to the efficiency of a system that
binds to a large dataset, that is, the more performant a data source the more efficiently
a system can process data.
Metrics. Performance is measured based on the scalability of the data source, that is, a
query should be answered in a reasonable amount of time. Also, detection of the usage