21.2.4 In-Memory Computing
The idea of running databases in memory was used early on by the business intelligence (BI) product QlikView. In-memory computing allows massive quantities of data to be processed in main memory, providing immediate results from analysis and transactions. The data to be processed is ideally real-time data, or as close to real time as is technically possible. Data in main memory (RAM) can be accessed on the order of 100,000 times faster than data on a hard disk; this can dramatically reduce the time needed to retrieve data and make it available for reporting, analytics solutions, or other applications.
The medium used by a database to store data, in this case RAM, is divided into pages. In-memory databases save changed pages in savepoints, which are asynchronously written to persistent storage at regular intervals. Each committed transaction generates a log entry that is written to nonvolatile storage; this log is written synchronously. In other words, a transaction does not return before the corresponding log entry has been written to persistent storage, in order to meet the durability requirement that was described earlier, thus ensuring that in-memory databases meet (and pass) the ACID test (see Section 5.7, "Transaction Processing Monitors," for a note on ACID). After a power failure, the database pages are restored from the savepoints, and the database logs are applied to restore the changes that were not captured in the savepoints. This ensures that the database can be restored in memory to exactly the same state as before the power failure.
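
To make the savepoint-plus-log mechanism concrete, the following is a minimal Python sketch of the pattern described above: changes are applied in memory, each commit is synchronously appended to a log, savepoints snapshot the in-memory pages to persistent storage, and recovery loads the last savepoint and replays the log. The class name, file names, and JSON record format are illustrative assumptions, not the internals of any particular in-memory database product.

import json
import os

class MiniInMemoryStore:
    """Toy key-value store illustrating savepoints plus a synchronous commit log."""

    def __init__(self, savepoint_path="savepoint.json", log_path="commit.log"):
        self.savepoint_path = savepoint_path
        self.log_path = log_path
        self.pages = {}                      # all data lives in RAM
        self.log = open(log_path, "a", encoding="utf-8")

    def commit(self, key, value):
        """Apply a change in memory and synchronously append it to the log."""
        self.pages[key] = value
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())          # the transaction returns only after
                                             # the log entry is on persistent storage

    def savepoint(self):
        """Write the current in-memory pages to persistent storage.

        A real system does this asynchronously at regular intervals; here it is a
        plain method call, and the log is truncated once the snapshot is safe.
        """
        tmp = self.savepoint_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.pages, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.savepoint_path)
        self.log.truncate(0)

    def recover(self):
        """Rebuild memory: load the last savepoint, then replay the log."""
        self.pages = {}
        if os.path.exists(self.savepoint_path):
            with open(self.savepoint_path, encoding="utf-8") as f:
                self.pages = json.load(f)
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.pages[record["key"]] = record["value"]

The key design point mirrored here is that durability comes from the synchronous log write at commit time, while the savepoint is only an optimization that bounds how much of the log must be replayed after a failure.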
21.2.5 Developing Big Data Applications
For most big data applications, the ability to scale to accommodate growing data volumes is predicated on multiprocessing: distributing the computation across a collection of computing nodes in ways that are aligned with the distribution of data across the storage nodes. One of the key objectives of using a multiprocessing node environment is to speed application execution by breaking large chunks of work into much smaller ones that can be farmed out to a pool of available processing nodes. In the best of all possible worlds, the data sets to be consumed and analyzed are also distributed across a pool of storage nodes. As long as there are no dependencies forcing any one specific task to wait to begin until another specific one ends, these smaller tasks can be executed at the same time; this is task parallelism, illustrated in the sketch below. More than just scalability, it is the concept of automated scalability that has generated the present surge of interest in big data analytics (with a corresponding optimization of costs).
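
The following is a minimal Python sketch of task parallelism: a data set is partitioned into independent chunks, the chunks are farmed out to a pool of worker processes, and the partial results are merged at the end. The word-counting task, chunk size, and pool size are illustrative assumptions; a real big data platform would distribute such tasks across many computing and storage nodes rather than local processes on one machine.

from multiprocessing import Pool

def count_words(chunk):
    """Independent task: count word occurrences in one partition of the data."""
    counts = {}
    for line in chunk:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def merge(partials):
    """Combine the partial results produced by the parallel tasks."""
    total = {}
    for counts in partials:
        for word, n in counts.items():
            total[word] = total.get(word, 0) + n
    return total

if __name__ == "__main__":
    # Stand-in for a large data set spread across storage.
    lines = ["big data needs big storage", "data in memory is fast",
             "tasks run in parallel"] * 1000
    chunk_size = 500
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

    # Because no chunk depends on another, all chunks can be processed at the
    # same time by the pool of worker processes.
    with Pool(processes=4) as pool:
        partials = pool.map(count_words, chunks)

    print(merge(partials))

Because the tasks share no state and communicate only through their inputs and outputs, adding more workers (or more nodes, in a distributed setting) increases throughput without changing the application logic, which is the essence of the automated scalability discussed above.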
A good development framework will simplify the process of developing,
executing, testing, and debugging new application code, and this framework
should include