Example usage: A document database can be used to store the results of clicks on the web. For
each log file that is parsed, a simple XML construct containing the Page_Name, Position_Coordinates,
Clicks, Keywords, Incoming and Outgoing sites, and Date_Time provides a simple model for
querying the number of clicks, keywords, dates, and links. This kind of schema flexibility is difficult
to achieve in an RDBMS. If you later want to capture the URL data as well, the next version of the
document can simply add the field.
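The click-log model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the documents are held in memory rather than in a real document database, the `<click>` root element, the sample values, and the helper names (`parse_click`, `clicks_per_page`, `keyword_counts`) are all hypothetical, and only the field names come from the text.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical click documents. Field names follow the text; all values are invented.
RAW_DOCS = [
    """<click>
        <Page_Name>home</Page_Name>
        <Position_Coordinates>120,340</Position_Coordinates>
        <Clicks>3</Clicks>
        <Keywords>nosql,graph</Keywords>
        <Incoming>search.example</Incoming>
        <Outgoing>docs.example</Outgoing>
        <Date_Time>2013-01-05T10:15:00</Date_Time>
    </click>""",
    # A later-version document that adds a URL field; older documents are untouched.
    """<click>
        <Page_Name>home</Page_Name>
        <Position_Coordinates>80,200</Position_Coordinates>
        <Clicks>2</Clicks>
        <Keywords>nosql</Keywords>
        <Incoming>news.example</Incoming>
        <Outgoing>blog.example</Outgoing>
        <Date_Time>2013-01-06T09:00:00</Date_Time>
        <URL>http://www.example/home</URL>
    </click>""",
]


def parse_click(xml_text):
    """Turn one XML click document into a dict of field name -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def clicks_per_page(docs):
    """Sum the Clicks field per Page_Name across all documents."""
    totals = Counter()
    for doc in docs:
        totals[doc["Page_Name"]] += int(doc.get("Clicks", 0))
    return totals


def keyword_counts(docs):
    """Count in how many documents each keyword appears."""
    counts = Counter()
    for doc in docs:
        for keyword in doc.get("Keywords", "").split(","):
            if keyword.strip():
                counts[keyword.strip()] += 1
    return counts


DOCS = [parse_click(text) for text in RAW_DOCS]
print(clicks_per_page(DOCS))
print(keyword_counts(DOCS))
# Documents without the newer field are still valid; the lookup simply returns None.
print([doc.get("URL") for doc in DOCS])
```

The point of the sketch is that the second document carries an extra URL field while the first does not, and the same queries run over both without any schema change.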
Document databases are still emerging at the time of writing, and broader market adoption of the
technology is expected to follow. We will discuss the integration architecture for
this technology later in this topic.
Graph databases
Social media and the rise of Facebook, LinkedIn, and Twitter have accelerated the emergence
of the most complex type of NoSQL database, the graph database. The graph database is oriented toward
modeling and deploying data that is graph-structured by nature. For example, to represent a person and
their friends in a social network, we can either write code to convert the social graph into key-value
pairs in a store such as Dynamo or Cassandra, or simply map it to a node-edge model in a graph database,
where managing the relationship representation is much simpler.
A graph database represents each object as a node and each relationship as an edge. This means
a person is a node and a household is a node, and the relationship between them is an edge.
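The node-edge model can be illustrated with a small in-memory sketch. It does not use the API of any actual graph database; the `Graph` class, the relationship labels `FRIEND` and `MEMBER_OF`, and the sample people are hypothetical names chosen for the example, and edges are assumed to be undirected for simplicity.

```python
from collections import defaultdict


class Graph:
    """Toy node-edge store: nodes carry properties, edges carry a relationship label."""

    def __init__(self):
        self.nodes = {}                # node id -> dict of properties
        self.edges = defaultdict(set)  # node id -> {(relationship, other node id)}

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, rel, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        # Assumption: relationships are stored in both directions (undirected).
        self.edges[src].add((rel, dst))
        self.edges[dst].add((rel, src))

    def neighbors(self, node_id, rel):
        """Return the ids connected to node_id by the given relationship, sorted."""
        return sorted(dst for r, dst in self.edges.get(node_id, ()) if r == rel)


def friends_of_friends(graph, person):
    """People two FRIEND hops away who are not already direct friends."""
    direct = set(graph.neighbors(person, "FRIEND"))
    result = set()
    for friend in direct:
        result.update(graph.neighbors(friend, "FRIEND"))
    return sorted(result - direct - {person})


g = Graph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_node("carol", kind="person")
g.add_node("h1", kind="household")
g.add_edge("alice", "FRIEND", "bob")
g.add_edge("bob", "FRIEND", "carol")
g.add_edge("alice", "MEMBER_OF", "h1")
g.add_edge("bob", "MEMBER_OF", "h1")

print(g.neighbors("alice", "FRIEND"))
print(g.neighbors("h1", "MEMBER_OF"))
print(friends_of_friends(g, "alice"))
```

Here the person and the household are both nodes, and the relationship is a first-class edge, so a query such as friends-of-friends is a short traversal rather than a series of key lookups and joins written by hand.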
As with the classic ER model for an RDBMS, we need to create an attribute model for a graph database.
We can start by taking the highest level in a hierarchy as the root node (similar to an entity) and connect
each attribute to it as a subnode. To represent different levels of the hierarchy, we can add a subcategory
or subreference and create another list of attributes at that level. This creates a natural traversal model
like a tree traversal, which is similar to traversing a graph. Depending on the cyclic property of the
graph, we can have a balanced or skewed model. Some of the most evolved graph databases include
Neo4j, InfiniteGraph, GraphDB, and AllegroGraph.
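A minimal sketch of such an attribute model follows, assuming a hypothetical Person entity with an Address subcategory. The class `AttributeNode`, the helpers `walk` and `height`, and all attribute names are invented for illustration; the sketch covers only the acyclic, tree-shaped case.

```python
class AttributeNode:
    """One node of the attribute model: an entity, a subcategory, or an attribute."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def walk(node, depth=0):
    """Depth-first, pre-order traversal yielding (depth, name) pairs."""
    yield depth, node.name
    for child in node.children:
        yield from walk(child, depth + 1)


def height(node):
    """Number of levels from this node down to its deepest descendant."""
    return 1 + max((height(child) for child in node.children), default=0)


# Root node (similar to an entity), its attributes, and one subcategory level.
ROOT = AttributeNode("Person", [
    AttributeNode("name"),
    AttributeNode("birth_date"),
    AttributeNode("Address", [
        AttributeNode("street"),
        AttributeNode("city"),
    ]),
])

for depth, name in walk(ROOT):
    print("  " * depth + name)

# Comparing subtree heights gives a rough indication of how skewed the model is.
print([height(child) for child in ROOT.children])
```

If the model contains cycles, the same traversal would need to track visited nodes to terminate, which is the main practical difference between walking a tree and walking a general graph.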
NoSQL summary
In summary, NoSQL databases are quickly evolving into the platform for deploying large-scale data
stores. There are several architectures and techniques for designing and deploying a NoSQL solution, and
any solution will require periodic tuning and maintenance because the data being processed is both
high in volume and complex.
Hadoop, NoSQL, and their associated technologies are excellent platforms for processing Big Data,
but they all require some amount of MapReduce integration and are not fully architected to
be used in a self-service manner by business users in an enterprise. The next section provides a brief
overview of textual ETL, an architecture for processing Big Data that is based on text mining approaches.
Textual ETL processing
Business users have always wanted to process unstructured data by interrogating the data with many
different types of algorithms and modeling techniques, while creating the processing rules in an
English-like interface. The outputs of processing unstructured data will be similar to a key-value pair
 