Other suggestions have been put forward for encoding provenance, for example,
the Open Provenance Model (Moreau et al., 2011), an OWL ontology that allows
additional descriptions of provenance based on Agents (authors, publishers, etc.),
Processes (e.g., reslicing of data), and Artifacts (e.g., an RDF graph that was
generated by a process). At the time of writing, several other vocabularies can
also be used to describe provenance, including the Changeset Vocabulary, the
Provenance Vocabulary, and the Semantic Web Publishing Vocabulary. A W3C
Provenance Interchange Working Group is under way, tasked with providing
mappings between these various provenance formats. As the technology is in a state
of flux, with no clear de facto standard, we simply recommend that the provenance
of your GI Linked Data is specified using one of these vocabularies. For those
hoping to reuse your data, it is useful to include descriptions of who has written
and published your GI Linked Data and any limitations on the accuracy or frequency
of your surveys or other data-gathering techniques.
7.8 AUTHENTICATION AND TRUST
A word now on the other aspects of data quality assessment, namely,
authentication and trust. Authentication contributes to the establishment of trust
and includes mechanisms such as verifying a URI, controlling access to a resource,
or using digital signatures, while trust is more of a social concept and remains
harder to mechanize.
The Named Graphs API for Jena (NG4J) is one software library that can be used
to produce digital signatures for Linked Data and so contribute to the
authentication process. It can be particularly helpful in verifying that the
provenance metadata does indeed belong to the Linked Dataset itself. To sign a
Named Graph and store its digital signature, NG4J first finds the graph's canonical
representation, that is, a representation that specifies which nodes of the graph
are adjacent to which other nodes. Second, a digest of the canonical graph is
calculated using any common secure hash function (for example, SHA-1). The digest
is represented as its own named graph, which is called the Warrant Graph. In turn,
the canonical representation of the Warrant Graph is taken and signed with the data
publisher's private key using a standard signature algorithm such as DSA or RSA.
This signature is added to the Warrant Graph, and the signed Warrant Graph can then
be published.
To check whether the digital signature of a named graph is valid, the NG4J software
carries out the following verification process. First, the digital signature is
extracted from the Warrant Graph of the named graph, along with the public key of
the information publisher. The public key is used to verify the signature of the
Warrant Graph, that is, to check that the signature does indeed belong to the
information publisher. Second, the canonical representation of the named graph is
found and a digest is created using the same hash function (for example, SHA-1).
This digest is compared against the digest in the Warrant Graph, and if they are
the same, then the named graph has a valid signature.
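The signing and verification steps described above can be sketched in a few lines of Python. This is an illustrative outline only, not NG4J's actual API or implementation. The function names, the `ex:digest` predicate, and the tuple-based graph representation are all hypothetical. Canonicalization is reduced to sorting the triples, which assumes the graph contains no blank nodes. The RSA key is a toy built from two well-known Mersenne primes and is not secure. The signature is returned alongside the Warrant Graph rather than being added to it.

```python
import hashlib

# Toy RSA key from two known Mersenne primes. Illustration only: NOT secure.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def canonicalize(triples):
    """Deterministic serialization: one sorted line per distinct triple.

    Assumption: no blank nodes, so sorting is enough to make the
    output independent of the order in which triples are listed.
    """
    lines = sorted(f"{s} {p} {o} ." for s, p, o in set(triples))
    return "\n".join(lines).encode("utf-8")


def graph_digest(triples):
    """SHA-1 digest (hex) of the canonical form of a graph."""
    return hashlib.sha1(canonicalize(triples)).hexdigest()


def _hash_to_int(data):
    # A 160-bit SHA-1 value is always smaller than the ~216-bit modulus N.
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def sign(data):
    """Sign bytes with the (toy) private key."""
    return pow(_hash_to_int(data), D, N)


def verify(data, signature):
    """Check a signature with the (toy) public key."""
    return pow(signature, E, N) == _hash_to_int(data)


def make_warrant(graph_name, triples):
    """Build a warrant graph holding the digest, then sign its canonical form."""
    warrant = [(graph_name, "ex:digest", f'"{graph_digest(triples)}"')]
    return warrant, sign(canonicalize(warrant))


def check(graph_name, triples, warrant, signature):
    """Step 1: verify the warrant's signature. Step 2: compare digests."""
    if not verify(canonicalize(warrant), signature):
        return False
    expected = (graph_name, "ex:digest", f'"{graph_digest(triples)}"')
    return expected in warrant


graph = [("ex:a", "ex:p", "ex:b"), ("ex:c", "ex:p", "ex:d")]
warrant, signature = make_warrant("ex:g", graph)
print(check("ex:g", graph, warrant, signature))
```

Because the digest is taken over the canonical form, listing the same triples in a different order still verifies, while changing any triple or altering the signature causes `check` to fail. A production system would use a vetted cryptographic library and a canonicalization algorithm that handles blank nodes.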
While provenance provides the input information to a trust measurement
algorithm, the degree of trust itself is the answer to the question "Is this data
good enough to use?" and is often based in part on who else thinks the data is good
enough to use.