NoSQL data architecture patterns - Making Sense of NoSQL

At the bottom of the stack, you see standards that are used in many areas, such as

standardized character sets (Unicode) and standards that represent identifiers to

objects in a URI-like format. Above that, you see that RDF is stored in XML files, a good

example of using the XML tree-like document structure to contain graphs. Above the

XML layer you see ways that items are classified using a taxonomy (RDFS), and above

this you see the standards for ontologies (OWL) and rules (RIF/SWRL). The SPARQL

query language also sits above the RDF layer. Above these, you see some of the

areas that are still not standardized: logic, proof, and trust. This is where much of the

research and development in the Semantic Web is focused. At the top, the user

interface layer is similar to the application layers we talked about in chapter 2. Finally,

along the side and to the right are cryptography standards that are used to securely

exchange data over the public internet.
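
To make the idea of a graph held inside an XML tree concrete, here is a minimal sketch that reads a small RDF/XML document and lists its subject-predicate-object triples. The example.org URIs, the ex: vocabulary, and the extract_triples function are assumptions made up for illustration. The code handles only simple rdf:Description elements, not the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML document; the URIs and the ex: terms are invented.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms/">
  <rdf:Description rdf:about="http://example.org/book/1">
    <ex:title>Making Sense of NoSQL</ex:title>
    <ex:topic rdf:resource="http://example.org/topic/graph-stores"/>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples from simple RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as "{namespace}local"; drop the
            # braces to get the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            # The object is either another resource (a URI) or a literal.
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

Each child element of the description becomes one edge of the graph, which is how a tree-shaped file can carry graph-shaped data.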

Many of the tools and languages associated with the upper layers of the Semantic

Web Stack are still in research and development, and investment case

studies showing a significant ROI remain few and far between. A more practical step is

to store original source documents with their extracted entities (annotations) directly

in a document store that supports mixed content. We'll discuss these concepts and

techniques in the next chapter when we look at XML data stores.
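
As a rough illustration of mixed content, the sketch below keeps a sentence and its entity annotations in a single XML fragment, then recovers both the entities and the original text. The element names (org, place, person), the sample sentence, and both helper functions are hypothetical. A real document store would run this kind of query with its own query language rather than with Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated document: entity tags sit inline with the text.
DOC = (
    "<article><p>The merger between <org>Acme Corp</org> and "
    "<org>Globex</org> was announced in <place>Chicago</place>.</p></article>"
)


def entities(doc_xml, entity_tags=("org", "place", "person")):
    """List (entity type, entity text) pairs in document order."""
    root = ET.fromstring(doc_xml)
    return [(el.tag, el.text) for el in root.iter() if el.tag in entity_tags]


def plain_text(doc_xml):
    """Recover the original source text by dropping the markup."""
    return "".join(ET.fromstring(doc_xml).itertext())


print(entities(DOC))
print(plain_text(DOC))
```

Because the annotations wrap the text instead of replacing it, the source document and its extracted entities never drift apart.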

In the next section, we'll look at how organizations are combining publicly available

datasets (linked open data) from domain areas such as media, medical and environmental

science, and publications to perform real-time extract, transform, and

display operations.

USING GRAPHS TO PROCESS PUBLIC DATASETS

Graph stores are also useful for doing analysis on data that hasn't been created by

your organization. What if you need to do analysis with three different datasets that

were created by three different organizations? These organizations may not even

know the others exist! So how can you automatically join their datasets together to

get the information you need? How do you create mashups or recombinations of this

data in an efficient way? One answer is by using a set of tools referred to as linked open

data, or LOD. You can think of it as an integration technique for doing joins between

disparate datasets to create new applications and new insights.
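
The join described above can be sketched in a few lines, assuming each dataset is already available as a list of subject-predicate-object triples. The two datasets, the example.org URIs, the predicate names, and the merge and trials_by_drug_label functions are invented for illustration. The point is only that a shared URI acts as the join key, with no coordination needed between the publishers.

```python
from collections import defaultdict

# Hypothetical dataset published by one organization.
DRUG_DATA = [
    ("http://example.org/drug/aspirin", "label", "Aspirin"),
    ("http://example.org/drug/ibuprofen", "label", "Ibuprofen"),
]

# Hypothetical dataset published by a second, unrelated organization.
# It reuses the first organization's URIs to identify drugs.
TRIAL_DATA = [
    ("http://example.org/trial/17", "studies", "http://example.org/drug/aspirin"),
    ("http://example.org/trial/42", "studies", "http://example.org/drug/aspirin"),
]


def merge(*datasets):
    """Combine triple lists into one graph keyed by subject URI."""
    graph = defaultdict(list)
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            graph[subject].append((predicate, obj))
    return graph


def trials_by_drug_label(graph):
    """Follow 'studies' edges across datasets to each drug's label."""
    result = defaultdict(list)
    for subject, edges in graph.items():
        for predicate, obj in edges:
            if predicate == "studies" and obj in graph:
                for linked_predicate, value in graph[obj]:
                    if linked_predicate == "label":
                        result[value].append(subject)
    return dict(result)


graph = merge(DRUG_DATA, TRIAL_DATA)
print(trials_by_drug_label(graph))
```

Merging is just a union of triples. The join happens when an object URI in one dataset matches a subject URI in another.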

LOD strategies are important for anyone doing research or analysis using publicly

available datasets. This research includes topics such as customer targeting, trend

analysis, sentiment analysis (the application of NLP, computational linguistics, and

text analytics to identify and extract subjective information in source materials), or

the creation of new information services. Recombining data into new forms provides

opportunities for new businesses, and as the amount of LOD grows, so do the

opportunities for ventures that combine and enrich this information.

LOD integration creates new datasets by combining information from two or more

publicly available datasets that conform to LOD structures such as RDF and URIs.

Figure 4.17 shows an LOD cloud diagram, a map of some of the popular LOD sites.
