Introducing Big Data Technologies - Data Warehousing in the Age of Big Data

Example usage: A document database can be used to store the results of clicks on the web. For each log file that is parsed, a simple XML construct with the Page_Name, Position_Coordinates, Clicks, Keywords, Incoming and Outgoing sites, and Date_Time fields creates a simple model for querying the number of clicks, keywords, dates, and links. This flexibility is hard to match in an RDBMS, where adding a field means changing the table schema. In the document model, if you want to expand the model to capture URL data, the next version of the document can simply add the field.
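A minimal sketch of the click-log example in Python, assuming the documents live in an in-memory list rather than a real document database. The XML tag names `Incoming_Site`, `Outgoing_Site`, and `URL`, the helper function names, and all sample values are hypothetical. The third document plays the role of the "next version" that adds a URL field: it sits alongside the older documents without any schema change.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical click documents, one per parsed log entry. The third one is a
# later "version" that adds a URL field; the older documents are untouched.
RAW_DOCS = [
    "<click><Page_Name>home</Page_Name><Position_Coordinates>120,45</Position_Coordinates>"
    "<Clicks>3</Clicks><Keywords>big data,nosql</Keywords>"
    "<Incoming_Site>search.example.com</Incoming_Site>"
    "<Outgoing_Site>blog.example.com</Outgoing_Site>"
    "<Date_Time>2013-05-01T10:15:00</Date_Time></click>",
    "<click><Page_Name>products</Page_Name><Position_Coordinates>300,80</Position_Coordinates>"
    "<Clicks>2</Clicks><Keywords>nosql</Keywords>"
    "<Incoming_Site>www.example.com</Incoming_Site>"
    "<Outgoing_Site>shop.example.com</Outgoing_Site>"
    "<Date_Time>2013-05-01T11:02:00</Date_Time></click>",
    "<click><Page_Name>home</Page_Name><Position_Coordinates>118,40</Position_Coordinates>"
    "<Clicks>1</Clicks><Keywords>big data</Keywords>"
    "<Incoming_Site>news.example.com</Incoming_Site>"
    "<Outgoing_Site>blog.example.com</Outgoing_Site>"
    "<Date_Time>2013-05-02T09:30:00</Date_Time>"
    "<URL>http://www.example.com/home</URL></click>",
]


def parse_click(xml_text):
    """Convert one XML click document into a dict, keeping whatever fields it has."""
    root = ET.fromstring(xml_text)
    doc = {child.tag: (child.text or "").strip() for child in root}
    doc["Clicks"] = int(doc.get("Clicks", "0"))
    doc["Keywords"] = [k.strip() for k in doc.get("Keywords", "").split(",") if k.strip()]
    return doc


# The "collection": a schema-less list of parsed documents.
collection = [parse_click(x) for x in RAW_DOCS]


def clicks_per_page(docs):
    totals = Counter()
    for d in docs:
        totals[d["Page_Name"]] += d["Clicks"]
    return dict(totals)


def clicks_per_day(docs):
    totals = Counter()
    for d in docs:
        totals[d["Date_Time"][:10]] += d["Clicks"]  # "YYYY-MM-DD" prefix
    return dict(totals)


def keyword_counts(docs):
    return dict(Counter(k for d in docs for k in d["Keywords"]))


def outgoing_links(docs):
    return sorted({d["Outgoing_Site"] for d in docs})


print(clicks_per_page(collection))  # {'home': 4, 'products': 2}
print(keyword_counts(collection))   # {'big data': 2, 'nosql': 2}
```

Queries written against the original fields keep working unchanged, and an expression such as `[d["URL"] for d in collection if "URL" in d]` picks up the new field only where it exists.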

The emergence of document databases is still ongoing at the time of writing this book, and broad market adoption of this technology is expected soon. We will discuss the integration architecture for this technology later in this book.

Graph databases

Social media, and the rise of Facebook, LinkedIn, and Twitter in particular, have accelerated the emergence of the most complex NoSQL database, the graph database. The graph database is oriented toward modeling and deploying data that is graph-structured by construction. For example, to represent a person and their friends in a social network, we can either write code to convert the social graph into key-value pairs on a store such as Dynamo or Cassandra, or simply convert it into a node-edge model in a graph database, where the relationships are much simpler to represent and manage.
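To make the key-value option concrete, here is a minimal sketch in Python, assuming a plain dictionary as a stand-in for a key-value store such as Dynamo or Cassandra; the key scheme `person:<name>:friends` and the helper names are hypothetical. It shows the code the application has to write itself: keeping both directions of a friendship in sync, and issuing one lookup per friend to walk two hops.

```python
# A plain dict stands in for a key-value store such as Dynamo or Cassandra.
# The key scheme "person:<name>:friends" is a hypothetical convention.
kv_store = {}


def friends_key(name):
    return f"person:{name}:friends"


def get_friends(name):
    """One key lookup returns one person's friend list."""
    return set(kv_store.get(friends_key(name), []))


def add_friendship(a, b):
    """The application must write BOTH keys to keep the relationship symmetric."""
    for src, dst in ((a, b), (b, a)):
        friends = get_friends(src)
        friends.add(dst)
        kv_store[friends_key(src)] = sorted(friends)


def friends_of_friends(name):
    """Two hops cost one lookup for the person plus one per direct friend."""
    direct = get_friends(name)
    second_hop = set()
    for friend in direct:
        second_hop |= get_friends(friend)
    return second_hop - direct - {name}


add_friendship("alice", "bob")
add_friendship("bob", "carol")
add_friendship("alice", "dave")
add_friendship("dave", "erin")
print(sorted(friends_of_friends("alice")))  # ['carol', 'erin']
```

Every relationship query becomes a sequence of key lookups stitched together in application code, which is the conversion work the graph model is meant to avoid.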

A graph database represents each object as a node and each relationship as an edge. This means that a person is a node, a household is a node, and the relationship between them is an edge.
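The node-edge model can be sketched as a tiny in-memory property graph. This is an illustration only, not the API of Neo4j or any other product: the `Graph`, `Node`, and `Edge` classes and the `LIVES_IN` and `FRIEND_OF` relationship names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                      # e.g. "Person" or "Household"
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    rel_type: str                   # hypothetical relationship name, e.g. "LIVES_IN"


class Graph:
    """Minimal in-memory property graph: nodes plus typed, directed edges."""

    def __init__(self):
        self.nodes = {}
        self.out_edges = {}         # node_id -> outgoing edges
        self.in_edges = {}          # node_id -> incoming edges

    def add_node(self, node):
        self.nodes[node.node_id] = node
        self.out_edges.setdefault(node.node_id, [])
        self.in_edges.setdefault(node.node_id, [])

    def add_edge(self, source, target, rel_type):
        edge = Edge(source, target, rel_type)
        self.out_edges[source].append(edge)
        self.in_edges[target].append(edge)
        return edge

    def outgoing(self, node_id, rel_type):
        """Nodes reached from node_id over edges of the given type."""
        return [self.nodes[e.target] for e in self.out_edges[node_id] if e.rel_type == rel_type]

    def incoming(self, node_id, rel_type):
        """Nodes that point at node_id over edges of the given type."""
        return [self.nodes[e.source] for e in self.in_edges[node_id] if e.rel_type == rel_type]


g = Graph()
g.add_node(Node("p1", "Person", {"name": "Alice"}))
g.add_node(Node("p2", "Person", {"name": "Bob"}))
g.add_node(Node("h1", "Household", {"address": "12 Oak St"}))
g.add_edge("p1", "h1", "LIVES_IN")   # person -> household
g.add_edge("p2", "h1", "LIVES_IN")
g.add_edge("p1", "p2", "FRIEND_OF")  # person -> person

members = [n.properties["name"] for n in g.incoming("h1", "LIVES_IN")]
print(members)  # ['Alice', 'Bob']
```

Because edges are indexed from both ends, each relationship is stored once and can be followed in either direction without extra application code.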

As with the classic ER model for an RDBMS, we need to create an attribute model for a graph database. We can start by taking the highest level in a hierarchy as a root node (similar to an entity) and connecting each attribute to it as a subnode. To represent different levels of the hierarchy, we can add a subcategory or subreference node and create another list of attributes at that level. This yields a natural traversal model: navigating the hierarchy is a tree traversal, which is a special case of traversing a graph. Depending on the cyclic properties of the graph, the model can be balanced or skewed. Some of the most mature graph databases include Neo4j, InfiniteGraph, GraphDB, and AllegroGraph.
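The attribute model described above can be sketched as follows, assuming a hypothetical `Customer` entity with an `Address` subcategory. Plain adjacency lists stand in for a graph database, and the traversal is an ordinary depth-first search; its visited set is what lets the same code run on a graph that contains cycles.

```python
# Hypothetical attribute model for a "Customer" entity. The root node is the
# entity, its subnodes are attributes, and "Address" is a subcategory node with
# its own attribute list one level down.
edges = {}


def connect(parent, child):
    edges.setdefault(parent, []).append(child)


connect("Customer", "customer_id")
connect("Customer", "name")
connect("Customer", "Address")   # subcategory / subreference
connect("Address", "street")
connect("Address", "city")


def traverse(root):
    """Depth-first traversal returning (depth, node) pairs in visit order.

    The visited set makes the same code terminate on graphs that contain
    cycles, where a plain tree traversal would loop forever.
    """
    visited, order, stack = set(), [], [(0, root)]
    while stack:
        depth, node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append((depth, node))
        # Push children in reverse so they are visited in insertion order.
        for child in reversed(edges.get(node, [])):
            stack.append((depth + 1, child))
    return order


for depth, node in traverse("Customer"):
    print("  " * depth + node)
```

The printed indentation mirrors the levels of the hierarchy: `Customer` at the root, its attributes one level down, and the `Address` attributes one level below that.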

NoSQL summary

In summary, NoSQL databases are quickly evolving into the platform of choice for deploying large-scale data stores. There are several architectures and techniques for designing and deploying a NoSQL solution, and any such solution will require periodic tuning and maintenance, because the data being processed is high in volume and complex.

Hadoop, NoSQL, and their associated technologies are excellent platforms for processing Big Data, but they all require some amount of MapReduce integration and are not architected for self-service use by business users in an enterprise. The next section provides a brief overview of textual ETL, an architecture for processing Big Data that is based on text mining approaches.

Textual ETL processing

Business users have always wanted to process unstructured data by interrogating the data with many different types of algorithms and modeling techniques, while creating the processing rules in an English-like interface. The outputs of processing unstructured data will be similar to a key-value pair
