Figure 4.17 The linked open data cloud is a series of shaded circles that are connected by lines. The shades indicate the domain: for example, darker for geographic datasets and lighter for life sciences. (Diagram by Richard Cyganiak and Anja Jentzsch: http://lod-cloud.net)
At the center of LOD cloud diagrams you'll see sites that contain a large number of general-purpose datasets. These include LOD hub sites such as DBpedia and Freebase. DBpedia is a website that harvests facts from Wikipedia and converts them into RDF assertions. The data in Wikipedia's infoboxes is a good example of consistently structured data in wiki format. Because DBpedia covers such a diverse range of data, it's frequently used as a hub to connect different datasets together.
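To make the hub idea concrete, here is a minimal Python sketch that models RDF triples as plain (subject, predicate, object) tuples. The two datasets, their example.org URIs, and their predicates are hypothetical; the sketch also assumes the datasets link to the hub with the OWL `owl:sameAs` predicate, a common choice in linked data but not something the text above prescribes. Each dataset names the city with its own local URI, and following the links to the shared DBpedia-style URI gathers facts from both.

```python
# Minimal model: an RDF triple is a (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]

HUB = "http://dbpedia.org/resource/Minneapolis"      # DBpedia-style resource URI
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"     # OWL identity link (assumed linking predicate)

# Hypothetical datasets: each names the city with its own local URI
# and links that URI to the shared hub resource with owl:sameAs.
geo_data: list[Triple] = [
    ("http://example.org/geo/city42", SAME_AS, HUB),
    ("http://example.org/geo/city42", "http://example.org/geo/region", "Midwest"),
]
health_data: list[Triple] = [
    ("http://example.org/health/loc7", SAME_AS, HUB),
    ("http://example.org/health/loc7", "http://example.org/health/clinicCount", "12"),
]


def facts_via_hub(hub: str, *datasets: list[Triple]) -> list[tuple[str, str]]:
    """Return (predicate, object) pairs from every dataset whose subject is sameAs the hub."""
    # First pass: find all local URIs that declare themselves identical to the hub.
    linked = {s for ds in datasets for s, p, o in ds if p == SAME_AS and o == hub}
    # Second pass: collect the other facts stated about those local URIs.
    return [(p, o) for ds in datasets for s, p, o in ds
            if s in linked and p != SAME_AS]


print(facts_via_hub(HUB, geo_data, health_data))
```

Neither dataset knows about the other; they become joinable only because both point at the same hub URI, which is why a broad, widely referenced site like DBpedia ends up in the middle of the cloud.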
Once you find a site that has the RDF information you're looking for, you can proceed in one of two ways. The first is to download all the RDF data on the site and load it into your graph store. For large RDF collections like DBpedia, which contain billions of triples, this can be impractical. The second, more efficient method is to find the site's SPARQL endpoint, a web service that accepts SPARQL queries. You submit queries to each of the sites you need and extract just the data you want in an RDF form that can then be joined with other RDF datasets. By combining the results of SPARQL queries, you can create new data mashups that join data together in the same way that joins combine data from two different tables in an RDBMS.
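The endpoint workflow can be sketched in Python using only the standard library. `make_request` encodes a query as an HTTP GET in the style of the W3C SPARQL 1.1 Protocol and asks for the standard SPARQL JSON results format; it builds the request but does not send it. `rows` and `join` then merge two result sets on a shared variable, which plays the role of the foreign key in an RDBMS join. The endpoint URL is DBpedia's public endpoint, but the function names are illustrative and the two JSON responses are hypothetical, hand-written stand-ins for what two different endpoints might return.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint

CITY_QUERY = """
SELECT ?city ?name WHERE {
  ?city a <http://dbpedia.org/ontology/City> ;
        <http://www.w3.org/2000/01/rdf-schema#label> ?name .
} LIMIT 10
"""


def make_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL query request as an HTTP GET."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per solution."""
    doc = json.loads(results_json)
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in doc["results"]["bindings"]]


def join(left: list[dict[str, str]], right: list[dict[str, str]],
         key: str) -> list[dict[str, str]]:
    """Inner-join two result sets on a shared variable, like an RDBMS key join."""
    index: dict[str, list[dict[str, str]]] = {}
    for right_row in right:
        if key in right_row:  # skip solutions where the join variable is unbound
            index.setdefault(right_row[key], []).append(right_row)
    return [{**left_row, **right_row}
            for left_row in left if key in left_row
            for right_row in index.get(left_row[key], [])]


# Hypothetical responses from two different endpoints, in SPARQL JSON results format.
labels_json = json.dumps({
    "head": {"vars": ["city", "name"]},
    "results": {"bindings": [
        {"city": {"type": "uri", "value": "http://dbpedia.org/resource/Minneapolis"},
         "name": {"type": "literal", "value": "Minneapolis"}},
    ]},
})
regions_json = json.dumps({
    "head": {"vars": ["city", "region"]},
    "results": {"bindings": [
        {"city": {"type": "uri", "value": "http://dbpedia.org/resource/Minneapolis"},
         "region": {"type": "literal", "value": "Midwest"}},
        {"city": {"type": "uri", "value": "http://dbpedia.org/resource/Boston"},
         "region": {"type": "literal", "value": "New England"}},
    ]},
})

request = make_request(DBPEDIA_ENDPOINT, CITY_QUERY)
merged = join(rows(labels_json), rows(regions_json), "city")
print(merged)
```

Boston appears only in the second result set, so the inner join drops it, just as an RDBMS inner join drops rows with no matching key. To actually run the query, you would pass `request` to `urllib.request.urlopen` and feed the decoded response body to `rows`. The join is a simple hash join: it indexes one side by key so each row on the other side finds its matches without rescanning the whole list.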
The key difference between joining data with SPARQL queries and joining tables in an RDBMS is the process that creates the primary/foreign keys. In the RDBMS, all of the keys are in the same domain,