Section 8.4.6. If incoming links are also included in the Linked Dataset, a spam-
mer could use this to misappropriate a third party's URIs and create triples with
subjects from highly regarded Linked Data sources and spam objects. In the future,
then, as search engines downgrade the importance of incoming links that are not
resident in the external dataset, the practice suggested in this section will become
less useful.
Perhaps the most important lesson we can impart is that we should not be so
arrogant as to think we are better than our users, or try to hide the data from them.
To return to Ian Davis 23: “Trust is a social problem and the best solution is one
that involves people making informed judgements on the metadata they encounter.
To make an effective evaluation they need to have the ability to view and explore
metadata with as few barriers as possible.”
8.9.3 Linked Data Quality
The issue of how to assess the quality of Linked Data sets has been addressed by
a number of commentators. 24,25 First, one should assess the content. Is it logically
consistent? Is the data accurate (are the facts correct?)? How frequently are updates
made, and is the data current?
Second, one should judge the data model. Is it semantically correct? Is the data
complete? Have a minimum of blank nodes been used? Are rdf:resources used
rather than literals (“things not strings”)? Have vocabularies been reused where pos-
sible? Are the URIs “cool” (Sauermann and Cyganiak, 2008)? Have the scope and
purpose of the dataset and ontology been clearly stated? Does the dataset meet the
stated scope and purpose, that is, is it complete and bounded? Have rdfs:labels
been used to make the data more comprehensible to human readers? What formats
and access methods have been provided (for example, a SPARQL endpoint as well
as an RDF dump)? Are there sufficient links to other datasets, particularly incoming
links that have been authored by third parties, to indicate that this dataset is trusted?
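Several of the data model questions above can be approximated by simple counts over the triples. The following is a minimal sketch in Python, not a method taken from this chapter. It assumes triples are held as plain tuples of strings, with blank nodes written as "_:" identifiers and literals written in double quotes; these conventions, the example.org URIs, and the function name model_metrics are all illustrative assumptions. It reports the share of blank nodes, the share of named subjects that carry an rdfs:label, and the share of non-label objects that are literals rather than resources.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def is_blank(term):
    # Assumed convention: blank nodes are written as "_:name".
    return term.startswith("_:")


def is_literal(term):
    # Assumed convention: literals are written inside double quotes.
    return term.startswith('"')


def model_metrics(triples):
    """Return rough data-model indicators for a list of (s, p, o) tuples."""
    named_subjects = {s for s, _, _ in triples if not is_blank(s)}
    labelled = {s for s, p, _ in triples
                if p == RDFS_LABEL and not is_blank(s)}

    # All nodes (subjects plus non-literal objects), to measure blank-node use.
    nodes = {s for s, _, _ in triples}
    nodes |= {o for _, _, o in triples if not is_literal(o)}
    blank_nodes = {n for n in nodes if is_blank(n)}

    # Labels are expected to be literals, so leave them out of this ratio.
    objects = [o for _, p, o in triples if p != RDFS_LABEL]
    literal_objects = [o for o in objects if is_literal(o)]

    def share(part, whole):
        return len(part) / len(whole) if whole else 0.0

    return {
        "blank_node_share": share(blank_nodes, nodes),
        "labelled_subject_share": share(labelled, named_subjects),
        "literal_object_share": share(literal_objects, objects),
    }


triples = [
    (EX + "river/thames", RDFS_LABEL, '"River Thames"'),
    (EX + "river/thames", EX + "flowsThrough", EX + "city/london"),
    (EX + "city/london", EX + "country", '"United Kingdom"'),
    ("_:b0", EX + "near", EX + "city/london"),
]
metrics = model_metrics(triples)
```

Ratios of this kind only flag candidates for inspection. Whether a particular literal should have been a resource, or whether a blank node is justified, remains a modelling judgement.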
Third, one can evaluate the provenance and usage of the data. Is it clearly and
accurately attributed (can you tell where the data came from and who has edited it?)?
What verification is possible? For example, is provenance information included, such
as by the supply of a VoID dataset? Is the licensing clear? Will the data be main-
tained in the future? Is the publisher well known and authoritative?
While a reasoner can be run over an ontology, and SPARQL queries can be used
as outlined in Section 8.3.5 to check data integrity, there are as yet no automated
methods for answering the more subjective of these questions, particularly when
assessing large datasets. Brand recognition and popularity assessment as represented
by incoming links, as well as more explicit social media recommendations, are likely
to be of most use in evaluating Linked Data quality.
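Popularity as represented by incoming links can at least be counted mechanically, provided one knows where each linking triple was published. The sketch below is one possible approach under stated assumptions, not an established metric: each link is a hypothetical 4-tuple of subject, predicate, object, and the URL of the document that contained it, and a link counts only if that document is hosted somewhere other than the dataset's own host. The function name third_party_linkers and all example URLs are invented for illustration.

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def third_party_linkers(quads, namespace):
    """Return the hosts, other than our own, that publish links into namespace.

    quads: iterable of (subject, predicate, object, source_document_url).
    namespace: URI prefix of the dataset being assessed (an assumption that
    the dataset's URIs share a single prefix).
    """
    own_host = urlparse(namespace).netloc
    linkers = set()
    for _subject, _predicate, obj, source in quads:
        source_host = urlparse(source).netloc
        # Count only links that point into our namespace and that are
        # resident in a document published by someone else.
        if obj.startswith(namespace) and source_host != own_host:
            linkers.add(source_host)
    return linkers


quads = [
    ("http://other.example.net/a", OWL_SAME_AS,
     "http://data.example.org/id/1", "http://other.example.net/data.rdf"),
    ("http://data.example.org/id/2", RDFS_SEE_ALSO,
     "http://data.example.org/id/1", "http://data.example.org/dump.rdf"),
    ("http://third.example.com/x", RDFS_SEE_ALSO,
     "http://data.example.org/id/2", "http://third.example.com/x.ttl"),
]
linkers = third_party_linkers(quads, "http://data.example.org/id/")
```

Counting distinct publishing hosts rather than raw triples makes the figure harder to inflate from a single source, although it says nothing about how reputable those hosts are.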
8.10 SUMMARY
This chapter has covered a number of areas surrounding the use and reuse of Linked
Data. We have discussed some of the business models that have been suggested
for exploiting the financial value of the data and covered the query mechanisms in