Building a NoSQL-Based Web App to Collect Crowd-Sourced Data - Data Just Right: Introduction to Large-Scale Data and Analytics

> GET author:1

"Michael Manoochehri"

Some examples of open-source key-value stores are LinkedIn's Project Voldemort and Apache Cassandra (which is often classified more specifically as a wide-column store). Later in this chapter, we will take a closer look at building a scalable data solution using the open-source Redis database, one of the most popular in-memory key-value stores.

Document Store

Every day, we interact with numerous documents of various types, both physical and virtual, such as business cards, receipts, tax returns, and playlists. Some of these documents have similar characteristics, such as the time they were created or the information they might contain about a particular person. Other documents contain data completely unique to the document type; an online application form may have any number of different fields, for example. The data from this variety of documents might be difficult to express using the rigid schemas found in relational databases. Moreover, what if the schema of several of these document types needed to change? In these cases, it might be the right time to look into using a document store.

A document store is a type of database that stores data as a collection of (you guessed it) documents. These documents may be XML representations, JSON objects, or even specific binary formats (see Chapter 2 for a closer look at these formats). In contrast to a relational database, in which every record in a table must adhere to the same schema, a document store can contain a variety of records with completely different schemas. In other words, each record might have a completely different structure. Although this is also true of most key-value stores, the difference is that document stores usually allow the user to query the actual data in the database, rather than only looking up records by key.
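The contrast between key lookup and content queries can be sketched in plain Python. This is only an illustration and is not tied to any real database: the `key_value_store` dictionary, the `documents` list, the `find` helper, and the sample records are all hypothetical, and a real document store would expose its own query interface.

```python
# Key-value access: the only question we can ask is
# "what is stored under this key?"
key_value_store = {"author:1": "Michael Manoochehri"}
print(key_value_store["author:1"])

# Document-style access: records may have completely different fields,
# and queries can inspect the contents of the records themselves.
# (Made-up sample data, for illustration only.)
documents = [
    {"type": "business_card", "name": "Ada Lovelace", "email": "ada@example.com"},
    {"type": "receipt", "name": "Ada Lovelace", "total": 42.50, "created": "2013-05-01"},
    {"type": "playlist", "title": "Road Trip", "tracks": ["Song A", "Song B"]},
]


def find(docs, **criteria):
    """Return every document whose fields match all of the given criteria.

    Documents that lack a requested field simply do not match.
    """
    return [
        doc
        for doc in docs
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


matches = find(documents, name="Ada Lovelace")
print([doc["type"] for doc in matches])
```

Note that the playlist record has no `name` field at all, yet the query still works: a missing field is treated as a non-match rather than a schema violation.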

A canonical example that illustrates the differences between a document store and a relational database can be found in serving the information necessary to construct a page for a typical blog. Blog pages feature not only page content and a title but also additional content such as an author name, links to related posts, and even user comments. If this information were stored in a relational database, the queries necessary to build a single page would need to access several tables.
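To illustrate the relational side of this example, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table layout (`authors`, `posts`, `comments`, `related_posts`), the sample rows, and the `load_page` helper are assumptions made for illustration; an actual blog schema would likely differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A hypothetical normalized schema: each kind of information gets its own table.
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        body TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        commenter TEXT NOT NULL,
        body TEXT NOT NULL
    );
    CREATE TABLE related_posts (
        post_id INTEGER NOT NULL REFERENCES posts(id),
        related_id INTEGER NOT NULL REFERENCES posts(id)
    );

    INSERT INTO authors VALUES (1, 'Michael Manoochehri');
    INSERT INTO posts VALUES (1, 'Document Stores', 'Post body', 1);
    INSERT INTO posts VALUES (2, 'Key-Value Stores', 'Post body', 1);
    INSERT INTO comments VALUES (1, 1, 'reader1', 'Nice post');
    INSERT INTO related_posts VALUES (1, 2);
""")


def load_page(post_id):
    """Assemble one blog page by querying every table that holds a piece of it."""
    title, body, author = conn.execute(
        "SELECT p.title, p.body, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id WHERE p.id = ?",
        (post_id,),
    ).fetchone()
    comments = conn.execute(
        "SELECT commenter, body FROM comments WHERE post_id = ? ORDER BY id",
        (post_id,),
    ).fetchall()
    related = conn.execute(
        "SELECT p.title FROM related_posts r "
        "JOIN posts p ON p.id = r.related_id WHERE r.post_id = ?",
        (post_id,),
    ).fetchall()
    return {
        "title": title,
        "body": body,
        "author": author,
        "comments": [{"commenter": c, "body": b} for c, b in comments],
        "related": [t for (t,) in related],
    }


page = load_page(1)
print(page["author"], page["related"])
```

Even in this small sketch, rendering one page takes three queries across four tables.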

The user of a document store takes a different approach: all of the content for a single page is stored in a single, large record. These records remain independent of one another, and changing one does not affect the other blog post records. If one of the blog pages contains a completely different chunk of information (say, links to photo URLs for a slideshow), this information can be added to that document without worrying about the schema of the others. The relational database, on the other hand, would represent all of the information as relationships between existing, normalized tables. If a slideshow feature were needed, a new “slideshow” table, with a strictly defined schema, would need to be created. In addition, relationships to the rest of the content of the page would need to be defined, likely by a key relating it to a unique blog post ID.
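The document-oriented version of the blog example might look like the following sketch, which uses plain Python dictionaries as stand-ins for stored JSON documents. The `posts` collection, its field names, the example URLs, and the `render_page` helper are hypothetical and serve only to show the idea of one self-contained record per page.

```python
import json

# Each blog page is one self-contained record (hypothetical field names).
posts = {
    "post:1": {
        "title": "Document Stores",
        "body": "Post body",
        "author": {"name": "Michael Manoochehri"},
        "related": [{"id": "post:2", "title": "Key-Value Stores"}],
        "comments": [{"commenter": "reader1", "body": "Nice post"}],
    },
    "post:2": {
        "title": "Key-Value Stores",
        "body": "Post body",
        "author": {"name": "Michael Manoochehri"},
        "related": [],
        "comments": [],
    },
}

# Add a slideshow to one post only; no other document has to change,
# and no new table or schema migration is involved.
posts["post:1"]["slideshow"] = [
    "https://example.com/photos/1.jpg",
    "https://example.com/photos/2.jpg",
]


def render_page(post_id):
    """Build a page summary from a single record, treating extra fields as optional."""
    doc = posts[post_id]
    return {
        "title": doc["title"],
        "author": doc["author"]["name"],
        "comment_count": len(doc["comments"]),
        "photo_count": len(doc.get("slideshow", [])),
    }


print(json.dumps(render_page("post:1"), indent=2))
print("slideshow" in posts["post:2"])
```

The trade-off in this design is duplication: the author name is repeated in every post record, whereas the normalized relational layout stores it once and references it by key.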
