so forth—and generate intelligent analytics so that businesses can make better decisions and correct predictions.
Figure 1-2 summarizes the thought process.
Figure 1-2. A process for determining whether you need Big Data
The next step in evaluating an implementation of any business process is to know your existing infrastructure
and capabilities well. Traditional RDBMS solutions can still handle most requirements. For example,
Microsoft SQL Server can handle tens of terabytes, whereas Parallel Data Warehouse (PDW) solutions can scale up to
hundreds of terabytes of data.
If you have highly relational data stored in a structured way, you likely don't need Big Data. However, neither SQL
Server nor PDW appliances are good at analyzing streaming text or dealing with large numbers of attributes or
JSON. Also, typical Big Data solutions use a scale-out model (distributed computing) rather than the scale-up model
(increasing computing and hardware resources for a single server) that traditional RDBMSs such as SQL Server target.
With hardware and storage costs falling drastically, distributed computing, which spreads the workload across
large numbers of commodity systems, is rapidly becoming the preferred choice for the IT industry.
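The scale-out idea can be illustrated with a minimal Python sketch. It is not how HDInsight or any particular platform is implemented; it only shows the pattern of splitting data into shards, processing each shard independently, and merging the partial results. The function names (`partition`, `count_words`, `scale_out_word_count`) and the `n_nodes` parameter are hypothetical, and the "nodes" here are simulated in a single process.

```python
from collections import Counter
from functools import reduce


def partition(records, n_nodes):
    """Split records into n_nodes shards (round-robin), one per simulated node."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def count_words(shard):
    """Work a single node does on its own shard: count word occurrences."""
    counts = Counter()
    for line in shard:
        counts.update(line.lower().split())
    return counts


def scale_out_word_count(records, n_nodes=3):
    """Process each shard independently, then merge the partial counts."""
    shards = partition(records, n_nodes)
    # In a real cluster each shard would be handled by a separate machine;
    # here the shards are simply processed one after another.
    partials = [count_words(shard) for shard in shards]
    return reduce(lambda a, b: a + b, partials, Counter())
```

Because each shard is processed without reference to the others, adding capacity means adding more nodes rather than buying a larger single server, which is the contrast with the scale-up model described above.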
However, to determine what type of implementation you need, you must evaluate several factors related to the
three Vs mentioned earlier:
Do you want to integrate diverse, heterogeneous sources? (Variety): If your answer to
this is yes, is your data predominantly semistructured or unstructured/nonrelational?
Big Data could be an optimum solution for textual discovery, categorization, and predictive
analysis.
What quantitative and qualitative analyses does the data call for? (Volume): Is there a huge
volume of data to be referenced? Is data emitted in streams or in batches? Big Data solutions
are ideal for scenarios where massive amounts of data need to be either streamed or batch
processed.
What is the speed at which the data arrives? (Velocity): Do you need to process data that is
emitted at an extremely fast rate? Examples here include data from devices, radio-frequency
identification (RFID) devices that transmit digital data every microsecond, or other such
scenarios. Traditionally, Big Data solutions are batch-processing or stream-processing systems
best suited for such streaming of data. Big Data is also an optimum solution for processing
historic data and performing trend analyses.
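The three questions above can be written down as a rough checklist in Python. This is only an illustrative sketch: the `Workload` fields, the `big_data_signals` function, and the `rdbms_limit_tb` threshold are hypothetical names introduced here, and the default of 100 TB is an assumed placeholder loosely based on the "hundreds of terabytes" figure mentioned earlier, not a rule from the text.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    diverse_sources: bool      # Variety: integrating heterogeneous sources?
    mostly_unstructured: bool  # Variety: semistructured or unstructured data?
    volume_tb: float           # Volume: approximate data size in terabytes
    high_velocity: bool        # Velocity: data emitted at an extremely fast rate?


def big_data_signals(workload, rdbms_limit_tb=100.0):
    """Return which of the three Vs point toward a Big Data solution.

    rdbms_limit_tb is an assumed, illustrative threshold; tune it to what
    your existing infrastructure can actually handle.
    """
    signals = []
    if workload.diverse_sources and workload.mostly_unstructured:
        signals.append("variety")
    if workload.volume_tb > rdbms_limit_tb:
        signals.append("volume")
    if workload.high_velocity:
        signals.append("velocity")
    return signals
```

An empty result suggests that a traditional RDBMS probably still fits; one or more signals suggest that a Big Data platform is worth evaluating. The checklist is a conversation starter, not a substitute for knowing your existing infrastructure and capabilities.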
Finally, if you decide you need a Big Data solution, the next step is to evaluate and choose a platform. There
are several to choose from, some of which are available as cloud services and some of which you run on your own
on-premises or hosted hardware. This topic focuses on Microsoft's Big Data solution, which is the Windows Azure
HDInsight Service. This topic also covers the Windows Azure HDInsight Emulator, which provides a test bed for use
before you deploy your solution to the Azure service.