Figure 4.17 The linked open data cloud is a series of shaded circles that are connected by lines. The shades
indicate the domain—for example, darker for geographic datasets, lighter for life sciences. (Diagram by Richard
Cyganiak and Anja Jentzsch: http://lod-cloud.net)
At the center of LOD cloud diagrams you'll see sites that contain a large number of
general-purpose datasets. These include LOD hub sites such as DBpedia and Freebase.
DBpedia is a website that attempts to harvest facts from Wikipedia and convert
them into RDF assertions. The data in Wikipedia's infoboxes is a good example
of a source of consistent data in wiki format. Because the data in DBpedia is so diverse,
it's frequently used as a hub to connect different datasets.
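To make the idea of converting infobox data into RDF assertions more concrete, here is a minimal Python sketch. It is not DBpedia's actual extraction code: the `example.org` namespaces, the `to_ntriples` helper, and the sample field names are all hypothetical, chosen only to show how each key/value pair can become one subject-predicate-object triple in N-Triples syntax.

```python
# Hypothetical namespaces for illustration only (not DBpedia's real ones).
RESOURCE_NS = "http://example.org/resource/"
PROPERTY_NS = "http://example.org/property/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def to_ntriples(subject_name, infobox):
    """Turn infobox key/value pairs into N-Triples lines, one assertion each."""
    subject = f"<{RESOURCE_NS}{subject_name}>"
    lines = []
    for key, value in infobox.items():
        predicate = f"<{PROPERTY_NS}{key}>"
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            # A link to another resource becomes a URI object.
            obj = f"<{value}>"
        elif isinstance(value, int) and not isinstance(value, bool):
            # Whole numbers become typed literals.
            obj = f'"{value}"^^<{XSD_INTEGER}>'
        else:
            # Everything else becomes a plain string literal with basic escaping.
            text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
            obj = f'"{text}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines


# Placeholder infobox values, not facts harvested from Wikipedia.
sample_infobox = {
    "country": "http://example.org/resource/Example_Country",
    "population": 12345,
}
for line in to_ntriples("Example_City", sample_infobox):
    print(line)
```

Because every subject and link is a full URI, triples produced this way from different sources can refer to the same resource, which is what lets a broad dataset act as a hub.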
Once you find a site that has the RDF information you're looking for, you can proceed
in two ways. The first is to download all the RDF data on the site and load it into
your graph store. For large RDF collections like DBpedia that have billions of triples,
this can be impractical. The second and more efficient method is to find a web service
for the RDF site called a SPARQL endpoint. This service allows you to submit
SPARQL queries that extract the data you need from each of the websites in an RDF form
that can then be joined with other RDF datasets. By combining the data from SPARQL
queries, you can create new data mashups that join data together in the same way
joins combine data from two different tables in an RDBMS.
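The second approach can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions, not a definitive client: the helper names (`build_request`, `fetch_rows`, `rows_from_results`, `join_rows`) are hypothetical, and it assumes the endpoint accepts the query as a `query` parameter on a GET request and can return the standard SPARQL JSON results format. Any endpoint URL you pass in is your own choice; none is assumed here.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint_url, query):
    """Build a GET request that carries the query and asks for JSON results."""
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in results["results"]["bindings"]]


def fetch_rows(endpoint_url, query):
    """Send the query to an endpoint and return its rows (network call)."""
    with urllib.request.urlopen(build_request(endpoint_url, query)) as resp:
        return rows_from_results(json.load(resp))


def join_rows(left, right, key):
    """Inner-join two row lists on a shared variable, like an RDBMS join."""
    index = {}
    for row in right:
        if key in row:
            index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left if key in l for r in index.get(l[key], [])]
```

A mashup would then call `fetch_rows` once per endpoint and pass the two row lists to `join_rows`, using a variable bound to the same URIs in both results as the join key.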
The key difference between joining data with SPARQL queries and joining tables in an
RDBMS is the process that creates the primary/foreign keys. In the RDBMS, all of the keys are in the same domain,