specific algorithmic demands. For example, some algorithms expect massive amounts of data to be immediately available, necessitating large amounts of core memory. Other applications may need numerous iterative exchanges of data between different computing nodes, which would require high-speed networks.
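To see why network speed can dominate, consider distributed k-means clustering (an illustration added here, not drawn from the text): in every iteration, each node computes partial centroid sums over only its local shard of the data, but no node can start the next iteration until those partials have been exchanged and combined across all nodes. The sketch below simulates that per-iteration exchange pattern on a single machine, with plain Python lists standing in for the nodes and a local reduction standing in for what would be an all-reduce over the network.

```python
import random

# Toy stand-in for a cluster: each "node" holds one shard of a 1-D data set.
NUM_NODES, K, ITERATIONS = 4, 3, 10
shards = [[random.uniform(0, 100) for _ in range(1000)] for _ in range(NUM_NODES)]
centroids = [random.uniform(0, 100) for _ in range(K)]

for _ in range(ITERATIONS):
    # Local phase: each node assigns its points to the nearest centroid and
    # accumulates partial sums/counts using only its own memory.
    partials = []
    for shard in shards:
        sums, counts = [0.0] * K, [0] * K
        for x in shard:
            j = min(range(K), key=lambda c: abs(x - centroids[c]))
            sums[j] += x
            counts[j] += 1
        partials.append((sums, counts))

    # Exchange phase: in a real deployment this is an all-reduce over the
    # network, repeated every iteration; here it is a local reduction.
    total_sums = [sum(p[0][j] for p in partials) for j in range(K)]
    total_counts = [sum(p[1][j] for p in partials) for j in range(K)]
    centroids = [total_sums[j] / total_counts[j] if total_counts[j] else centroids[j]
                 for j in range(K)]

print(sorted(round(c, 2) for c in centroids))
```

Because the exchange happens once per iteration, its latency is paid dozens or hundreds of times per run, which is what makes a high-speed interconnect a hardware requirement rather than a luxury for this class of algorithm.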
The big data technology ecosystem stack may include the following:
1. Scalable storage systems that are used for capturing, manipulating,
and analyzing massive data sets.
2. A computing platform, sometimes configured specifically for large-
scale analytics, often composed of multiple (typically multicore) pro-
cessing nodes connected via a high-speed network to memory and
disk storage subsystems. These are often referred to as appliances.
3. A data management environment, whose configurations may range from a traditional database management system scaled to massive parallelism, to databases configured with alternative distributions and layouts, to newer graph-based or other NoSQL data management schemes.
4. An application development framework to simplify the process of developing, executing, testing, and debugging new application code. This framework should include programming models, development tools, program execution and scheduling, and system configuration and management capabilities (a sketch of one such programming model appears after this list).
5. Methods of scalable analytics (including statistical and data mining models) that analysts and other business consumers can configure to improve the design and construction of analytical and predictive models.
6. Management processes and tools that are necessary to ensure align-
ment with the enterprise analytics infrastructure and collaboration
among the developers, analysts, and other business users.
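To make the programming-model point in item 4 concrete, many such frameworks (Hadoop MapReduce being the best-known example) expose a model in which the developer writes only the map and reduce logic, while the framework takes care of grouping, scheduling, and data movement. The following word-count sketch collapses that division of labor into a few lines of local Python; the function names and toy documents are purely illustrative.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Developer-supplied map: emit one (word, 1) pair per word."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Framework's job: group emitted values by key between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Developer-supplied reduce: combine all values for one key."""
    return key, sum(values)

documents = ["big data needs scalable storage",
             "scalable analytics needs scalable platforms"]

mapped = chain.from_iterable(map_phase(d) for d in documents)
result = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(result)  # {'big': 1, 'data': 1, 'needs': 2, 'scalable': 3, ...}
```

In a real framework the map and reduce functions are the only pieces the application developer writes; execution, scheduling, fault handling, and system configuration are exactly the capabilities item 4 asks the framework to provide.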
21.2 Tools, Techniques, and Technologies of Big Data
21.2.1 Big Data Architecture
Analytical environments are deployed in different architectural models.
Even on parallel platforms, many databases are built on a shared-everything approach, in which the persistent storage and memory components are all shared by the different processing units.
Parallel architectures are classified by which shared resources each processor can directly access. One typically distinguishes shared-memory, shared-disk, and shared-nothing architectures (as depicted in Figure 21.1).
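As a rough illustration of the shared-nothing case (the partitioning scheme below is an assumption, not something the figure specifies), each processing unit owns its own memory and disk, so the system must decide up front which node owns which rows. A common choice is to hash-partition on a key, which lets any request be routed to the owning node without consulting any shared state.

```python
import hashlib

NUM_NODES = 4

def owner_node(key):
    """Deterministically map a row key to the node that stores it."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

# Each node's private storage: a plain dict standing in for its local disk.
local_stores = [dict() for _ in range(NUM_NODES)]

def insert(key, row):
    local_stores[owner_node(key)][key] = row

def lookup(key):
    # The request is routed straight to the owning node; no other node's
    # memory or disk is touched, which is the defining shared-nothing trait.
    return local_stores[owner_node(key)].get(key)

insert("cust-1001", {"name": "Alice", "region": "EU"})
insert("cust-2002", {"name": "Bob", "region": "US"})
print(lookup("cust-1001"), "is stored on node", owner_node("cust-1001"))
```

The trade-off is typical of shared-nothing designs: adding nodes scales storage and throughput almost linearly, but operations that cross partitions (for example, joins on non-key columns) require explicit data movement between nodes.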