specific algorithmic demands. For example, some algorithms expect massive amounts of data to be immediately available, necessitating large amounts of core memory. Other applications may need numerous iterative exchanges of data between different computing nodes, which would require high-speed networks.
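To see why network speed can dominate, consider distributed k-means clustering (an illustration added here, not drawn from the text): in every iteration, each node computes partial centroid sums over only its local shard of the data, but no node can start the next iteration until those partials have been exchanged and combined across all nodes. The sketch below simulates that per-iteration exchange pattern on a single machine, with plain Python lists standing in for the nodes and a local reduction standing in for what would be an all-reduce over the network.

```python
import random

# Toy stand-in for a cluster: each "node" holds one shard of a 1-D data set.
NUM_NODES, K, ITERATIONS = 4, 3, 10
shards = [[random.uniform(0, 100) for _ in range(1000)] for _ in range(NUM_NODES)]
centroids = [random.uniform(0, 100) for _ in range(K)]

for _ in range(ITERATIONS):
    # Local phase: each node assigns its points to the nearest centroid and
    # accumulates partial sums/counts using only its own memory.
    partials = []
    for shard in shards:
        sums, counts = [0.0] * K, [0] * K
        for x in shard:
            j = min(range(K), key=lambda c: abs(x - centroids[c]))
            sums[j] += x
            counts[j] += 1
        partials.append((sums, counts))

    # Exchange phase: in a real deployment this is an all-reduce over the
    # network, repeated every iteration; here it is a local reduction.
    total_sums = [sum(p[0][j] for p in partials) for j in range(K)]
    total_counts = [sum(p[1][j] for p in partials) for j in range(K)]
    centroids = [total_sums[j] / total_counts[j] if total_counts[j] else centroids[j]
                 for j in range(K)]

print(sorted(round(c, 2) for c in centroids))
```

Because the exchange happens once per iteration, its latency is paid dozens or hundreds of times per run, which is what makes a high-speed interconnect a hardware requirement rather than a luxury for this class of algorithm.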
The big data technology ecosystem stack may include the following:
1. Scalable storage systems that are used for capturing, manipulating,
and analyzing massive data sets.
2. A computing platform, sometimes configured specifically for large-
scale analytics, often composed of multiple (typically multicore) pro-
cessing nodes connected via a high-speed network to memory and
disk storage subsystems. These are often referred to as appliances.
3. A data management environment, whose configurations may range from a traditional database management system scaled to massive parallelism, to databases configured with alternative distributions and layouts, to newer graph-based or other NoSQL data management schemes.
4. An application development framework to simplify the process of developing, executing, testing, and debugging new application code. This framework should include programming models, development tools, program execution and scheduling, and system configuration and management capabilities (a sketch of one such programming model appears after this list).
5. Methods of scalable analytics (including statistical and data mining models) that analysts and other business consumers can configure to improve the design and construction of analytical and predictive models.
6. Management processes and tools that are necessary to ensure align-
ment with the enterprise analytics infrastructure and collaboration
among the developers, analysts, and other business users.
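To make the programming-model point in item 4 concrete, many such frameworks (Hadoop MapReduce being the best-known example) expose a model in which the developer writes only the map and reduce logic, while the framework takes care of grouping, scheduling, and data movement. The following word-count sketch collapses that division of labor into a few lines of local Python; the function names and toy documents are purely illustrative.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Developer-supplied map: emit one (word, 1) pair per word."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Framework's job: group emitted values by key between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Developer-supplied reduce: combine all values for one key."""
    return key, sum(values)

documents = ["big data needs scalable storage",
             "scalable analytics needs scalable platforms"]

mapped = chain.from_iterable(map_phase(d) for d in documents)
result = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(result)  # {'big': 1, 'data': 1, 'needs': 2, 'scalable': 3, ...}
```

In a real framework the map and reduce functions are the only pieces the application developer writes; execution, scheduling, fault handling, and system configuration are exactly the capabilities item 4 asks the framework to provide.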
21.2 Tools, Techniques, and Technologies of Big Data
21.2.1 Big Data Architecture
Analytical environments are deployed in different architectural models.
Even on parallel platforms, many databases are built on a shared-everything approach, in which the persistent storage and memory components are all shared by the different processing units.
Parallel architectures are classified by which shared resources each processor can directly access. One typically distinguishes shared-memory, shared-disk, and shared-nothing architectures (as depicted in Figure 21.1).
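As a rough illustration of the shared-nothing case (the partitioning scheme below is an assumption, not something the figure specifies), each processing unit owns its own memory and disk, so the system must decide up front which node owns which rows. A common choice is to hash-partition on a key, which lets any request be routed to the owning node without consulting any shared state.

```python
import hashlib

NUM_NODES = 4

def owner_node(key):
    """Deterministically map a row key to the node that stores it."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

# Each node's private storage: a plain dict standing in for its local disk.
local_stores = [dict() for _ in range(NUM_NODES)]

def insert(key, row):
    local_stores[owner_node(key)][key] = row

def lookup(key):
    # The request is routed straight to the owning node; no other node's
    # memory or disk is touched, which is the defining shared-nothing trait.
    return local_stores[owner_node(key)].get(key)

insert("cust-1001", {"name": "Alice", "region": "EU"})
insert("cust-2002", {"name": "Bob", "region": "US"})
print(lookup("cust-1001"), "is stored on node", owner_node("cust-1001"))
```

The trade-off is typical of shared-nothing designs: adding nodes scales storage and throughput almost linearly, but operations that cross partitions (for example, joins on non-key columns) require explicit data movement between nodes.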