CHAPTER 3
Big Data Processing Architectures
And pluck till time and times are done
The silver apples of the moon,
The golden apples of the sun.
—W. B. Yeats
INTRODUCTION
Data processing has been a complex subject since the early days of computing. The
underlying reason is that the complexity stems from the instrumentation of data rather than from the
movement of data. Instrumenting data requires a complete understanding of the data, the need to
maintain consistency of processing (if the data set is broken into multiple pieces), the need to integrate
multiple data sets through the processing cycles to preserve the integrity of the data, and the need to
complete the associated computations within the same processing cycle. The instrumentation of
transactional data has been a challenge given the discrete nature of the data, and the magnitude of the
problem grows with the size of the data. This problem has been handled in multiple ways within the
RDBMS-based ecosystem for online transaction processing (OLTP) and data warehousing, but those
solutions cannot be extended to the Big Data situation. How do we deal with processing Big Data?
Drawing on distributed processing, distributed storage, neural networks, multiprocessor architectures,
and object-oriented concepts, combined with Internet data processing techniques, several approaches
have been architected for processing Big Data.
Data processing revisited
Data processing can be defined as the collection, processing, and management of data, resulting in
the generation of information for end consumers. Broadly, the different cycles of activity in data
processing can be described as shown in Figure 3.1.
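To make the cycle concrete, the short sketch below models collection, processing, and management as a pipeline of plain functions. The stage names, the record layout, and the summary step are illustrative assumptions, not a transcription of Figure 3.1.

```python
# A minimal, illustrative sketch of the data processing cycle: collect raw
# records, process them into a consistent form, and manage the result so
# that information can be delivered to end consumers.

from typing import Iterable


def collect(sources: Iterable[dict]) -> list[dict]:
    """Gather raw records from one or more source systems."""
    return [record for source in sources for record in source.get("records", [])]


def process(records: list[dict]) -> list[dict]:
    """Apply predefined transformations (here, simple type cleansing)."""
    return [{**r, "amount": float(r.get("amount", 0))} for r in records]


def manage(records: list[dict]) -> dict:
    """Store and summarize the processed data for distribution."""
    return {"row_count": len(records), "total": sum(r["amount"] for r in records)}


if __name__ == "__main__":
    sources = [{"records": [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4.5"}]}]
    information = manage(process(collect(sources)))
    print(information)  # the information delivered to end consumers
```

In transactional systems this pipeline is predefined end to end, which is why, as discussed next, the structure and volume of the data are known before processing begins.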
Transactional data processing follows this life cycle: the data is first analyzed and modeled.
The data collected is structured in nature and discrete in volume, since the entire process is predefined
based on known requirements. Other areas of data management, such as quality and cleansing, are a
nonissue, as they are handled in the source systems as part of the process. Data warehouse data
processing follows a pattern similar to transactional data processing; the key difference is that the
volume of data to be processed varies depending on the source being processed. Before we move on
to Big Data processing, let us discuss the techniques and challenges in data processing.