One database
• Easy to understand
• Easy to set up and configure
• Easy to administer
• Single source of truth
• Limited scalability
OR
Many databases
• Data partitioning
• Replication
• Clustering
• Query distribution
• Load balancing
• Consistency/Syncing
• Latency/Concurrency
• Clock synchronization
• Network bottlenecks/failures
• Multiple data centers
• Distributed backup
• Node failure
• Voting algorithms for error detection
• Administration of many systems
• Monitoring
• Scalable if designed correctly
Figure 6.1 One or many databases? These are some of the challenges you face when you
move from a single-processor system to a distributed computing system. Moving to a
distributed environment is a nontrivial endeavor and should be done only if the business
problem truly requires handling large data volumes in a short period of time. These
challenges are why platforms like Hadoop are complex: they need an elaborate framework
to make things easier for the application developer.
trick is to come up with a process to ensure the sample you choose is a fair
representation of the full dataset.
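One common way to draw a fair sample is reservoir sampling, which gives every record the same chance of being chosen even when you don't know the size of the dataset in advance. The following is a minimal sketch and not a method prescribed by the text; the function name reservoir_sample and its parameters are illustrative assumptions.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Pick k records uniformly at random from a stream of unknown length.

    Hypothetical helper for illustration: it reads the data once and keeps
    only k records in memory at any time.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (i + 1) by
            # overwriting a randomly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir
```

For example, reservoir_sample(open_records(), 1000) would return 1,000 records from a hypothetical open_records() iterator without loading the full dataset. A uniform sample like this is only fair if small subgroups don't matter to your analysis; if they do, you may need to sample each subgroup separately.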
You should also consider how quickly you need your data processed. Many data
analysis problems can be handled by a batch-type solution running on a single
processor; you may not need an immediate answer. The key is to understand how
time-critical your situation truly is.
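To make the batch option concrete, here is a minimal sketch of a single-processor batch job that reads its input once, line by line, and keeps only running totals in memory. The function name count_by_key, the comma-separated layout, and the file name in the usage note are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable


def count_by_key(lines: Iterable[str], field: int = 0,
                 sep: str = ",") -> Counter:
    """Count how often each value appears in one column of delimited text.

    Hypothetical helper for illustration: it streams through the input
    once, so memory use depends on the number of distinct keys rather
    than on the number of lines.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(sep)
        if field < len(parts):
            counts[parts[field]] += 1
    return counts
```

Because a file object is itself an iterable of lines, you could run this overnight against a hypothetical events.csv with `with open("events.csv") as f: totals = count_by_key(f)` and read the results in the morning.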
Now that you know that distributed databases are more complex than a single-processor
system and that there are alternatives to using a full dataset, let's look at why
organizations are moving toward these complex systems. Why is the ability to handle big
data strategically important to many organizations? Answering this question involves
understanding the external factors that are driving the big data marketplace.
Here are some typical big data use cases:
Bulk image processing: Organizations like NASA regularly receive terabytes of
incoming data from satellites or even rovers on Mars. NASA uses a large number
of servers to process these images and perform functions like image enhancement
and photo stitching. Medical imaging systems such as CAT and MRI scanners
need to convert raw image data into formats that are useful to doctors and
patients. Custom imaging hardware has been found to be more expensive than
renting a large number of processors on the cloud when they're needed. For
example, the New York Times converted 3.3 million scans of old newspaper
articles into web formats using tools like Amazon EC2 and Hadoop for a few
hundred dollars.