Databases Reference
read and write to key-value stores or distributed filesystems like Amazon's Simple Storage
Service (S3) or Hadoop Distributed File System (HDFS) and may not need the advanced features
of a document store or an RDBMS.
Other use cases are more demanding and need more features. Big data problems
like event log data and game data do need to store their data directly into structures
that can be queried and analyzed, so they will need different NoSQL solutions.
To be a good candidate for a general class of big data problems, NoSQL solutions
should:
- Be efficient with input and output and scale linearly with growing data size.
- Be operationally efficient. Organizations can't afford to hire many people to
  run the servers.
- Allow reports and analyses to be performed by nonprogrammers using simple
  tools. Not every business can afford a full-time Java programmer to write
  on-demand queries.
- Meet the challenges of distributed computing, including latency between
  systems and eventual node failures.
- Meet the needs of both economy-of-scale overnight batch processing and
  time-critical event processing.
An RDBMS can, with enough time and effort, be customized to solve some big data problems.
Applications can be rewritten to distribute SQL queries to many processors and
merge the results of the queries. Databases can be redesigned to remove joins
between tables that are physically located on different nodes. SQL systems can be
configured to use replication and other data synchronization processes. Yet these steps all
take considerable time and money. In the long run, it might make sense to move to a
framework that has already solved many of these problems.
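The first step described above, distributing a query to many processors and merging the results, can be sketched in a few lines. This is only an illustration under stated assumptions: the three in-memory SQLite databases stand in for separate nodes, and the `events` table, the sample rows, and the helper names `make_shard` and `scatter_gather` are hypothetical, not part of any particular product.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_shard(rows):
    """Create one in-memory database that plays the role of a node (assumption)."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE events (user TEXT, points INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    return conn


def scatter_gather(shards, sql):
    """Send the same aggregate query to every shard, then merge the partial sums."""
    def run(conn):
        return conn.execute(sql).fetchall()

    totals = {}
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for partial in pool.map(run, shards):
            for user, points in partial:
                # The merge step lives in application code, not in the database.
                totals[user] = totals.get(user, 0) + points
    return totals


shards = [
    make_shard([("ann", 5), ("bob", 3)]),
    make_shard([("ann", 2), ("cat", 7)]),
    make_shard([("bob", 4)]),
]
result = scatter_gather(
    shards, "SELECT user, SUM(points) FROM events GROUP BY user"
)
print(result)
```

Even in this toy form, the application has to know where the shards are and how to combine partial answers. Sums merge easily; averages, joins, and sorted results need more care, which is part of the time and money the paragraph above refers to.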
The original SQL systems were revolutionary because of their standardized declarative
language. By declarative, we mean that developers can “declare” what data they want
without being concerned with how it is retrieved or where it comes from. SQL developers
want and need to be isolated from the questions of how to optimize a query, how to
fetch the data, and which server the data is on. Unless your database isolates you from
these questions, you lose many of the benefits of declarative systems like SQL.
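A small contrast may make the meaning of declarative more concrete. The sketch below uses Python's built-in `sqlite3` module; the `players` table and its rows are made up for illustration. The SQL statement says only what rows are wanted, while the hand-written version spells out how to find them.

```python
import sqlite3

rows = [("ann", 90), ("bob", 40), ("cat", 75)]  # hypothetical sample data

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO players VALUES (?, ?)", rows)

# Declarative: state the result you want; the engine decides how to
# scan, filter, and sort.
declarative = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM players WHERE score > 50 ORDER BY name"
    )
]

# Imperative: the developer writes the scan, the filter, and the sort.
imperative = []
for name, score in rows:
    if score > 50:
        imperative.append(name)
imperative.sort()

print(declarative, imperative)
```

Both paths return the same names, but only the imperative one would need rewriting if the data moved to another server or an index changed the best access path.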
NoSQL systems try to isolate developers from the complexities of distributed
computing. They provide interfaces that allow users to tell a cluster how many nodes a
record must be read from or written to before a valid response is returned. The goal is
to keep the benefits of both declarative systems and horizontal scalability as you move
to distributed computing platforms.
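The idea of telling a cluster how many nodes must take part in a read or write can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the interface of any specific NoSQL product: `Replica`, `QuorumCluster`, and the parameters `n`, `w`, and `r` are hypothetical names, replicas are plain in-memory dictionaries, and a single counter stands in for real version tracking.

```python
class Replica:
    """One node holding key -> (version, value); 'up' simulates availability."""

    def __init__(self):
        self.store = {}
        self.up = True


class QuorumCluster:
    def __init__(self, n, w, r):
        self.replicas = [Replica() for _ in range(n)]
        self.w = w          # nodes that must accept a write
        self.r = r          # nodes that must answer a read
        self.version = 0    # simplistic global version counter (assumption)

    def write(self, key, value):
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < self.w:
            raise RuntimeError("write quorum not reached")
        self.version += 1
        for rep in live:
            rep.store[key] = (self.version, value)
        return len(live)

    def read(self, key):
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < self.r:
            raise RuntimeError("read quorum not reached")
        # Ask only r nodes, then keep the newest version among their replies.
        replies = [rep.store.get(key) for rep in live[: self.r]]
        found = [reply for reply in replies if reply is not None]
        if not found:
            return None
        return max(found, key=lambda reply: reply[0])[1]


cluster = QuorumCluster(n=3, w=2, r=2)
cluster.write("player:1", "level 1")

cluster.replicas[0].up = False          # one node fails
cluster.write("player:1", "level 2")    # still succeeds: 2 live nodes >= w
cluster.replicas[0].up = True           # node returns with stale data

latest = cluster.read("player:1")
print(latest)
```

Because `r + w` is greater than `n` in this example, every read set overlaps every write set in at least one node, so the read returns the newest value even though one replica is stale. Lowering `r` or `w` trades that guarantee for lower latency.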
If NoSQL systems really do have better horizontal scaling characteristics, you need
to be able to measure these characteristics. So let's take a look at how horizontal
scalability and NoSQL might be measured.