Let's look at the varying terms used to describe different types of graphs.
As you use the web, you'll often see links on a page that take you to another page; these links can be represented by a graph or triple. The current web page is the first or source node, the link is the arc that “points to” the second page, and the second or destination page is the second node. In this example, the first node is represented by the URL of the source page and the second node or destination is the URL of the destination page. This linking process can be found in many places on the web, from page links to wiki sites, where each source and destination node is a page URL.
Figure 4.11 shows a graph store that represents a web page and its links to other web pages.
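The structure described above can be sketched in a few lines of Python. This is only an illustrative model, not the API of any particular graph store: the `Node` and `Relationship` classes, the `links_to_graph` helper, and the `example.com` URLs are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """A web page, identified by its URL property (hypothetical model)."""
    url: str


@dataclass(frozen=True)
class Relationship:
    """A directed arc from a source page to a destination page."""
    source: Node
    label: str
    destination: Node


def links_to_graph(source_url: str, destination_urls: List[str]) -> List[Relationship]:
    """Build one "points to" relationship per link found on the source page."""
    source = Node(source_url)
    return [Relationship(source, "points to", Node(url)) for url in destination_urls]


# Made-up URLs: one source page that links to two other pages.
graph = links_to_graph(
    "http://example.com/index.html",
    ["http://example.com/about.html", "http://example.com/contact.html"],
)

for rel in graph:
    print(rel.source.url, rel.label, rel.destination.url)
```

Because each node carries only its URL, two relationships that mention the same URL refer to equal `Node` values, which is the property that makes URLs attractive as node identifiers.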
The concept of using URLs to identify nodes is appealing since URLs are human readable and have an internal structure. The W3C generalized this structure to store the information about the links between pages as well as the links between objects into a standard called the Resource Description Framework, more commonly known as RDF.
Figure 4.11 An example of using a graph store to represent a web page that contains links to two other web pages. The URL of the source web page is stored as a URL property and each link is a relationship that has a “points to” property. Each link target is represented as another node with a property that contains the destination page's URL. (Diagram labels: source web page, destination web page.)
4.2.2 Linking external data with the RDF standard
In a general-purpose graph store, you can create your own method to determine whether two nodes reference the same point in a graph. Most graph stores will assign internal IDs to each node as they load these nodes into RAM. The W3C has focused on a process of using URL-like identifiers called uniform resource identifiers (URIs) to create explicit node identifiers for each node. This standard is called the W3C Resource Description Framework (RDF).
RDF was specifically created to join together external datasets created by different organizations. Conceptually, you can load two external datasets into one graph store and then perform graph queries on this joined database. The trick is knowing when two nodes reference the same object.
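To make the joining idea concrete, here is a minimal sketch that treats each dataset as a plain Python set of three-part tuples. The datasets, the `example.org` URIs, and the `facts_about` helper are all hypothetical; a real RDF store would offer its own loading and query facilities. The point is only that when both organizations use the same URI for a node, combining the sets joins the data without any extra matching step.

```python
# Dataset from a hypothetical library catalog.
library_data = {
    ("http://example.org/person/ada",
     "http://example.org/terms/wrote",
     "http://example.org/book/notes"),
}

# Dataset from a hypothetical employer directory. It uses the same URI
# for the same person, so no extra matching logic is needed.
employer_data = {
    ("http://example.org/person/ada",
     "http://example.org/terms/worksFor",
     "http://example.org/org/analytical"),
}

# Loading both datasets into one store is a set union in this sketch.
merged = library_data | employer_data


def facts_about(graph, node_uri):
    """Return every tuple in the graph whose first element is node_uri."""
    return [t for t in graph if t[0] == node_uri]


ada_facts = facts_about(merged, "http://example.org/person/ada")
for source, link, destination in ada_facts:
    print(source, link, destination)
```

If the two organizations had used different identifiers for the same person, the union would contain two unrelated nodes, which is exactly the problem that shared URIs are meant to avoid.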
RDF uses directed graphs, where the relationship specifically points from a source node to a destination node. The terminology for the source, link, and destination may vary based on your situation, but in general the terms subject, predicate, and object are used, as shown in figure 4.12.
These terms come from formal logic systems and language. This terminology for describing how nodes are identified has been standardized by the W3C in their RDF standard. In RDF each node-arc-node relationship is called a triple and is associated
Figure 4.12 How RDF uses specific names for the general node-relationship-node structure. The source node is the subject, and the destination node is the object. The relationship that connects them together is the predicate. The entire structure is called an assertion. (Diagram labels: Subject, Predicate, Object.)
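The subject, predicate, and object naming in figure 4.12 maps naturally onto a three-field record. The sketch below is an assumption-laden illustration rather than a standard RDF API: the `Triple` type, the `match` function, and the sample URLs are invented for the example, with `None` acting as a wildcard in a query pattern.

```python
from typing import Iterable, List, NamedTuple, Optional


class Triple(NamedTuple):
    """One assertion: a subject node, a predicate (arc), and an object node."""
    subject: str
    predicate: str
    object: str


def match(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# Made-up page links expressed as triples.
HOME = "http://example.com/index.html"
ABOUT = "http://example.com/about.html"
CONTACT = "http://example.com/contact.html"

triples = [
    Triple(HOME, "points to", ABOUT),
    Triple(HOME, "points to", CONTACT),
    Triple(ABOUT, "points to", HOME),
]

outgoing = match(triples, subject=HOME)  # pages that HOME links to
incoming = match(triples, obj=HOME)      # pages that link to HOME
```

Because the graph is directed, querying by subject and querying by object give different answers: the first follows arcs forward from a node, and the second finds arcs that arrive at it.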