The model will not inherently provide any kind of traditional database capabilities (such as atomicity of transactions or consistency when multiple transactions are executed simultaneously); those capabilities must be provided by the application itself.
14.10.3.4 In-Memory Computing
The idea of running databases in memory was popularized by the business intelligence (BI) product QlikView. In-memory computing allows massive quantities of data to be processed in main memory, providing immediate results from analysis and transactions. The data to be processed are ideally real-time data, or as close to real time as is technically possible. Data in main memory (RAM) can be accessed 100,000 times faster than data on a hard disk; this can dramatically decrease the time needed to retrieve data and make it available for reporting, analytics solutions, or other applications.
The medium an in-memory database uses to store data, that is, RAM, is divided into pages. In-memory databases save changed pages in savepoints, which are asynchronously written to persistent storage at regular intervals. Each committed transaction generates a log entry that is written to nonvolatile storage, and this log is written synchronously; in other words, a transaction does not return before the corresponding log entry has been written to persistent storage. This satisfies the durability requirement described earlier and ensures that in-memory databases meet the ACID test. After a power failure, the database pages are restored from the savepoints, and the database logs are applied to replay the changes that were not captured in the savepoints. This ensures that the database can be restored in memory to exactly the same state as before the power failure.
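The interplay of a synchronous commit log and asynchronous savepoints can be illustrated with a small sketch. The Python class below is purely illustrative (the class name, file names, and JSON record format are assumptions, not taken from any particular in-memory product): commit() does not return until its log entry has been flushed to disk, savepoint() persists the full set of pages and truncates the log, and recovery restores the last savepoint and replays the remaining log entries.

import json
import os

class InMemoryStore:
    """Toy sketch: data lives in RAM; durability comes from a synchronous
    commit log plus periodic savepoints."""

    def __init__(self, log_path="wal.log", savepoint_path="savepoint.json"):
        self.log_path = log_path
        self.savepoint_path = savepoint_path
        self.pages = {}              # the in-memory "pages"
        self._recover()

    def commit(self, key, value):
        # Durability: the call does not return until the log entry is on disk.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.pages[key] = value

    def savepoint(self):
        # Asynchronous in a real system; here a plain call that atomically
        # persists all pages, after which the old log entries are not needed.
        tmp = self.savepoint_path + ".tmp"
        with open(tmp, "w") as sp:
            json.dump(self.pages, sp)
            sp.flush()
            os.fsync(sp.fileno())
        os.replace(tmp, self.savepoint_path)
        open(self.log_path, "w").close()     # truncate the log

    def _recover(self):
        # After a restart (e.g., following a power failure): restore pages from
        # the last savepoint, then replay log entries written since that savepoint.
        if os.path.exists(self.savepoint_path):
            with open(self.savepoint_path) as sp:
                self.pages = json.load(sp)
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    entry = json.loads(line)
                    self.pages[entry["key"]] = entry["value"]

A caller would simply create an InMemoryStore, call commit() for each change, and call savepoint() occasionally; restarting the process after a simulated crash reconstructs exactly the same in-memory state.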
14.10.3.5 Developing Big Data Applications
For most big data applications, the ability to achieve scalability to accommodate growing data volumes is predicated on multiprocessing: distributing the computation across the collection of computing nodes in ways that are aligned with the distribution of data across the storage nodes. One of the key objectives of using a multiprocessing node environment is to speed up application execution by breaking up large chunks of work into much smaller ones that can be farmed out to a pool of available processing nodes. In the best of all possible worlds, the datasets to be consumed and analyzed are also distributed across a pool of storage nodes. As long as there are no dependencies forcing any one specific task to wait to begin until another specific one ends, these smaller tasks can be executed at the same time; this is task parallelism. More than just scalability, it is the concept of automated scalability that has generated the present surge of interest in big data analytics (with corresponding optimization of costs).
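The notion of farming smaller, independent tasks out to a pool of workers can be sketched with Python's standard concurrent.futures module. The example below is a single-machine stand-in for a distributed pool of processing nodes; the word-count job, chunk size, and function names are illustrative assumptions rather than part of any specific big data framework.

from concurrent.futures import ProcessPoolExecutor

def count_words(chunk):
    """One small, independent task: count the words in a slice of the data."""
    return sum(len(line.split()) for line in chunk)

def chunked(items, size):
    """Break the large job into much smaller pieces that can be farmed out."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

if __name__ == "__main__":
    # Stand-in for a large dataset; in a real deployment the data would already
    # be distributed across storage nodes.
    lines = ["the quick brown fox jumps over the lazy dog"] * 100_000

    # No chunk depends on any other, so all tasks can run at the same time
    # (task parallelism); results are combined once the workers finish.
    with ProcessPoolExecutor() as pool:
        totals = pool.map(count_words, chunked(lines, 10_000))

    print("total words:", sum(totals))

Because no chunk depends on the result of any other, every task is eligible to run as soon as a worker is free, which is exactly the task parallelism described above.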
A good development framework will simplify the process of developing, executing, testing,
and debugging new application code, and this framework should include
1. A programming model and development tools
2. Facilities for program loading, execution, and process and thread scheduling
3. System configuration and management tools
 