At the bottom of the stack, you see standards that are used in many areas, such as
standardized character sets (Unicode) and standards that represent identifiers for
objects in a URI-like format. Above that, you see that RDF is stored in XML files, a good
example of using the XML tree-like document structure to contain graphs. Above the
XML layer you see ways that items are classified using a taxonomy (RDFS), and above
this you see the standards for ontologies (OWL) and rules (RIF/SWRL). The SPARQL
query language also sits above the RDF layer. Above these areas, you see some of the
areas that are still not standardized: logic, proof, and trust. This is where much of the
research and development in the Semantic Web is focused. At the top, the user interface
layer is similar to the application layers we talked about in chapter 2. Finally,
along the right side are cryptography standards that are used to securely
exchange data over the public internet.
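To make the idea of RDF stored in XML concrete, here is a minimal sketch that reads a small RDF/XML fragment with Python's standard library and lists the graph edges (triples) it contains. The example.org URIs, the sample document, and the helper name extract_triples are assumptions made for illustration; the sketch handles only simple rdf:Description blocks, not the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML document; the example.org URIs are made up and the
# Dublin Core properties are used only for illustration.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/book/1">
    <dc:title>Example Book</dc:title>
    <dc:creator rdf:resource="http://example.org/person/42"/>
  </rdf:Description>
</rdf:RDF>
"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples from simple rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them into a full URI.
            namespace, local = prop.tag[1:].split("}")
            predicate = namespace + local
            # The object is either another resource (a URI) or a literal string.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

Each nested XML element becomes one edge of the graph: the enclosing rdf:Description supplies the subject, the child element's name supplies the predicate, and the child's content or rdf:resource attribute supplies the object.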
Many of the tools and languages associated with the upper layers of the Semantic
Web Stack are still in research and development, and investment case
studies showing a significant ROI remain few and far between. A more practical step is
to store original source documents with their extracted entities (annotations) directly
in a document store that supports mixed content. We'll discuss these concepts and
techniques in the next chapter when we look at XML data stores.
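As a rough illustration of mixed content, the sketch below keeps the source text and its entity annotations in one XML document, then recovers both from it. The element names (doc, p, person, org), the sample sentence, and the helper functions are hypothetical, not a standard annotation vocabulary or the API of any particular document store.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated document: text and entity markup are interleaved
# (mixed content). The names and the sentence are invented for illustration.
ANNOTATED = (
    "<doc><p>"
    "<person>Alice Smith</person> met <person>Bob Jones</person> at "
    "<org>Acme Corp</org>."
    "</p></doc>"
)


def entities(xml_text):
    """List (entity_type, text) pairs for every annotation inside <p> elements."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for p in root.iter("p") for el in p]


def plain_text(xml_text):
    """Recover the original, unannotated text by dropping the markup."""
    return "".join(ET.fromstring(xml_text).itertext())


print(entities(ANNOTATED))
print(plain_text(ANNOTATED))
```

Because the annotations live inside the document, the original wording is never lost, and the extracted entities can be queried without maintaining a separate index of offsets.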
In the next section, we'll look at how organizations are combining publicly available
datasets (linked open data) from domain areas such as media, medical and
environmental science, and publications to perform real-time extract, transform, and
display operations.
USING GRAPHS TO PROCESS PUBLIC DATASETS
Graph stores are also useful for doing analysis on data that hasn't been created by
your organization. What if you need to do analysis with three different datasets that
were created by three different organizations? These organizations may not even
know that the others exist! So how can you automatically join their datasets together to
get the information you need? How do you create mashups or recombinations of this
data in an efficient way? One answer is by using a set of tools referred to as linked open
data, or LOD. You can think of it as an integration technique for doing joins between
disparate datasets to create new applications and new insights.
LOD strategies are important for anyone doing research or analysis using publicly
available datasets. This research includes topics such as customer targeting, trend
analysis, sentiment analysis (the application of NLP, computational linguistics, and
text analytics to identify and extract subjective information in source materials), or
the creation of new information services. Recombining data into new forms provides
opportunities for new businesses. As the amount of LOD grows, so do the
opportunities for business ventures that combine and enrich this information.
LOD integration creates new datasets by combining information from two or more
publicly available datasets that conform to LOD structures such as RDF and URIs.
Figure 4.17 shows an LOD cloud diagram, a map of some of the popular LOD sites.
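The join that LOD makes possible can be sketched in a few lines, assuming each dataset is a set of (subject, predicate, object) triples and that both publishers use the same URI for the same real-world thing. The datasets, URIs, values, and function names below are invented for illustration; a production system would use an RDF library or a graph store rather than plain Python sets.

```python
# Two hypothetical datasets published by organizations that have never
# coordinated. The example.org / example.com / example.net URIs and all
# values are made up for illustration.
CITY = "http://example.org/place/Springfield"

weather_data = {
    (CITY, "http://example.com/weather#avgTempC", "11.5"),
}
census_data = {
    (CITY, "http://example.net/census#population", "116000"),
    ("http://example.org/place/Shelbyville",
     "http://example.net/census#population", "42000"),
}


def merge(*datasets):
    """Union the triples; identical URIs refer to the same node afterward."""
    merged = set()
    for dataset in datasets:
        merged |= dataset
    return merged


def shared_subjects(a, b):
    """Subjects that appear in both datasets, that is, the join keys."""
    return {s for s, _, _ in a} & {s for s, _, _ in b}


def describe(graph, subject):
    """Collect every predicate/object pair known about one subject URI."""
    return {p: o for s, p, o in graph if s == subject}


graph = merge(weather_data, census_data)
for subject in shared_subjects(weather_data, census_data):
    print(subject, describe(graph, subject))
```

No mapping table or schema negotiation is needed here: because both sources identify the city with the same URI, taking the union of their triples is enough to connect the facts.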