architectures of real-time analysis include (a) parallel processing clusters using
traditional relational databases, and (b) memory-based computing platforms. For
example, Greenplum from EMC and HANA from SAP are both real-time analysis
architectures.
Offline analysis is usually used for applications without high requirements on
response time, e.g., machine learning, statistical analysis, and recommendation
algorithms. Offline analysis generally imports log data into a dedicated platform
through data acquisition tools and analyzes it there. In the big data setting, many
Internet enterprises use an offline analysis architecture based on Hadoop in order
to reduce the cost of data format conversion and improve the efficiency of data
acquisition. Examples of such acquisition tools include Facebook's open source tool
Scribe, LinkedIn's open source tool Kafka, Taobao's open source tool Timetunnel,
and Hadoop's Chukwa. These tools can acquire and transmit data at rates of
hundreds of MB per second.
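To put "hundreds of MB per second" in perspective, the short sketch below converts a sustained ingest rate into a daily volume. The sample rates (100, 300, and 500 MB/s) and the function name are illustrative assumptions, not figures reported for any of the tools named above.

```python
# Rough arithmetic only: convert a sustained acquisition rate into a daily volume.
# The sample rates below are assumptions chosen for illustration.
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Return the data collected in one day, in TB (decimal: 1 TB = 10**6 MB)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    for rate in (100, 300, 500):
        print(f"{rate} MB/s sustained -> {daily_volume_tb(rate):.2f} TB/day")
```

At a sustained 100 MB/s, for instance, this works out to 8.64 TB per day, which suggests why such log streams are typically landed on a Hadoop-based platform for offline processing.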
5.3.2 Analysis at Different Levels
Big data analysis can also be classified by data volume into memory-level analysis,
Business Intelligence (BI) level analysis, and massive-level analysis, each of which
is described below.
• Memory-level: Memory-level analysis applies when the total data volume
fits within the total memory of a cluster. The memory of current server
clusters exceeds hundreds of GB, and even the TB level is common.
Therefore, in-memory database technology may be used, with hot data kept
resident in memory to improve analytical efficiency. Memory-level
analysis is extremely suitable for real-time analysis. MongoDB is a representative
memory-level analytical architecture. With the development of SSDs (solid-state
drives), the capacity and performance of memory-level data analysis have been
further improved, and the approach has been widely applied.
• BI: BI analysis applies when the data scale exceeds the memory level
but the data can still be imported into a BI analysis environment. Currently,
mainstream BI products offer data analysis solutions that support data volumes
above the TB level.
• Massive: Massive analysis applies when the data scale completely
exceeds the capacities of BI products and traditional relational databases. At
present, most massive analysis uses HDFS of Hadoop to store data and
MapReduce to analyze it. Most massive analysis belongs to the offline
analysis category.
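The three levels above amount to a decision by data volume, which can be sketched as a small function. This is only an illustration: the function name, the enum, and the capacity figures in the usage example (1 TB of cluster memory, 50 TB of BI capacity) are assumptions, not thresholds given in the text.

```python
from enum import Enum

# Decimal byte units used for the illustrative thresholds.
GB = 10**9
TB = 10**12


class AnalysisLevel(Enum):
    MEMORY = "memory-level"
    BI = "BI-level"
    MASSIVE = "massive-level"


def classify_analysis_level(
    data_bytes: int,
    cluster_memory_bytes: int,
    bi_capacity_bytes: int,
) -> AnalysisLevel:
    """Pick an analysis level from the data volume.

    cluster_memory_bytes and bi_capacity_bytes are hypothetical capacities
    supplied by the caller; the text does not fix specific values.
    """
    if data_bytes <= cluster_memory_bytes:
        # The whole data set fits in cluster memory.
        return AnalysisLevel.MEMORY
    if data_bytes <= bi_capacity_bytes:
        # Too large for memory, but still importable into a BI environment.
        return AnalysisLevel.BI
    # Beyond BI products and traditional relational databases.
    return AnalysisLevel.MASSIVE


if __name__ == "__main__":
    # Assumed capacities, for illustration only.
    cluster_memory = 1 * TB
    bi_capacity = 50 * TB
    for size in (200 * GB, 10 * TB, 500 * TB):
        level = classify_analysis_level(size, cluster_memory, bi_capacity)
        print(f"{size / TB:g} TB -> {level.value}")
```

In practice the boundaries are not sharp; the capacities depend on the cluster and on the BI product in use, so the two threshold arguments are left to the caller.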