The context for all of these framework components is tightly coupled with the key characteristics of a big data application: algorithms that take advantage of running many tasks in parallel on many computing nodes to analyze large volumes of data distributed among many storage nodes. Typically, a big data platform consists of a collection (or pool) of processing nodes; optimal performance is achieved when all the processing nodes are kept busy, which means maintaining a healthy allocation of tasks to idle nodes within the pool. Any big data application that is to be developed must map to this context, and that is where the programming model comes in. The programming model essentially describes two aspects of application execution within a parallel environment:
1. How an application is coded
2. How that code maps to the parallel environment
The MapReduce programming model combines the familiar procedural/imperative approach used by Java or C++ programmers with what is effectively a functional programming model, such as the one used in languages like Lisp and APL. The similarity lies in MapReduce's reliance on two basic operations that are applied to sets or lists of key/value pairs:
1. Map, which describes the computation or analysis applied to a set of input key/value pairs to
produce a set of intermediate key/value pairs
2. Reduce, in which the values associated with each intermediate key output by the map operation are combined to produce the results (a minimal sketch of both operations follows this list)
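To make the two operations concrete, here is a minimal, self-contained word-count sketch in plain Java (not the Hadoop API); the class and method names are illustrative only. The map step emits intermediate (word, 1) pairs, the runtime's shuffle step is simulated by grouping the pairs by key, and the reduce step sums the values for each key.

import java.util.*;
import java.util.stream.*;

// Illustrative sketch of the MapReduce model: map emits intermediate
// (word, 1) pairs, the pairs are grouped by key, and reduce sums the
// counts per key.
public class WordCountSketch {

    // Map: one input record (a line of text) -> intermediate key/value pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce: all values sharing one intermediate key -> a single result value.
    static int reduce(String key, List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> input = List.of("the map step emits pairs",
                                     "the reduce step combines pairs");

        // Shuffle/group: collect intermediate values by key, as a MapReduce
        // runtime would do between the map and reduce phases.
        Map<String, List<Integer>> grouped = input.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        grouped.forEach((word, counts) ->
                System.out.println(word + " -> " + reduce(word, counts)));
    }
}

In a real framework the grouping and the distribution of map and reduce tasks are handled by the runtime; the programmer supplies only the two functions.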
A MapReduce application is envisioned as a series of basic operations applied in sequence to small sets drawn from very large collections (millions, billions, or even more) of data items. These data items are logically organized in a way that enables the MapReduce execution model to allocate tasks that can be executed in parallel.
Combining data and computational independence means that both the data and the computations can be distributed across multiple storage and processing units and automatically parallelized. This parallelizability allows the programmer to exploit scalable, massively parallel processing resources for increased processing speed and performance.
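The following hypothetical sketch illustrates that independence in plain Java: because each map call depends only on its own input record and each reduce depends only on the values for one key, the same word count can be parallelized simply by switching to a parallel stream, with the runtime spreading the independent tasks over the available cores, much as a MapReduce framework spreads them over the nodes of a cluster. The class name and sample data are assumptions for illustration.

import java.util.*;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.*;

// Data and computational independence: each record is mapped on its own,
// and each key is reduced on its own, so the work parallelizes automatically.
public class ParallelWordCount {
    public static void main(String[] args) {
        List<String> lines = Collections.nCopies(1_000,
                "independent records can be mapped and reduced in parallel");

        ConcurrentMap<String, Long> counts = lines.parallelStream()   // independent map tasks
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .collect(Collectors.groupingByConcurrent(w -> w,      // shuffle by key
                        Collectors.counting()));                      // reduce per key

        System.out.println(counts);
    }
}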
14.10.4 SAP HANA
After the requisite background on big data and in-memory computing, we are now ready to get
acquainted with SAP HANA.
In 2006, SAP introduced the BW Accelerator (BWA), an appliance-based solution specifically targeted at improving the reporting and analytic capabilities of its SAP NetWeaver Business Warehouse (BW). The BWA solution is based on TREX (SAP's Search and Classification Engine) technology to support querying the large amounts of BW data for