ecosystem of components that must be combined in a variety of ways to address each applica-
tion's requirements, which can range from general information technology (IT) performance
scalability to detailed performance improvement objectives associated with specific algorithmic
demands. For example, some algorithms expect massive amounts of data to be immediately
available, necessitating large amounts of core memory. Other applications may need
numerous iterative exchanges of data between different computing nodes, which would require
high-speed networks.
The big data technology ecosystem stack may include the following:
1. Scalable storage systems that are used for capturing, manipulating, and analyzing massive
datasets.
2. A computing platform, sometimes configured specifically for large-scale analytics, often
composed of multiple (typically multicore) processing nodes connected via a high-speed
network to memory and disk storage subsystems. These are often referred to as appliances.
3. A data management environment, whose configurations may range from a traditional data-
base management system scaled to massive parallelism to databases configured with alter-
native distributions and layouts to newer graph-based or other NoSQL data management
schemes.
4. An application development framework to simplify the process of developing, executing,
testing, and debugging new application code. This framework should include programming
models (one such model is sketched just after this list), development tools, program
execution and scheduling, and system configuration and management capabilities.
5. Methods of scalable analytics (including statistical and data mining models) that can be
configured by analysts and other business consumers to design and build analytical and
predictive models.
6. Management processes and tools that are necessary to ensure alignment with the enterprise
analytics infrastructure and collaboration among the developers, analysts, and other busi-
ness users.
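
As an example of the programming models mentioned in item 4, the following toy sketch uses the MapReduce style, a widely used model for large-scale analytics. The text names no specific framework, so the choice of MapReduce and every name in the code are illustrative assumptions, not a description of any particular product:

```python
# Toy illustration of a big-data programming model. MapReduce is used
# here as one well-known example; the surrounding text names no
# specific framework, so this is an assumption for illustration only.

from collections import defaultdict
from itertools import chain

def map_phase(doc: str) -> list[tuple[str, int]]:
    """Emit (word, 1) pairs; runs independently on each data block."""
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs) -> dict[str, int]:
    """Sum the counts for each key; distinct keys can be reduced in parallel."""
    counts: dict[str, int] = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data needs scale", "data needs data"]
print(reduce_phase(chain.from_iterable(map_phase(d) for d in docs)))
# {'big': 1, 'data': 3, 'needs': 2, 'scale': 1}
```

The appeal of such a model is that the map phase requires no coordination between nodes, so the framework, rather than the application developer, handles distribution, scheduling, and fault recovery.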
14.10.3 Tools, Techniques, and Technologies of Big Data
14.10.3.1 Big Data Architecture
Analytical environments are deployed in different architectural models. Even on parallel plat-
forms, many databases are built on a shared-everything approach, in which the persistent storage
and memory components are all shared by the different processing units.
A shared-disk approach may have isolated processors, each with its own memory, but the per-
sistent storage on disk is still shared across the system. These types of architectures are typically
layered on top of symmetric multiprocessing (SMP) machines. While there may be applications
that are suited to this approach, the sharing itself creates bottlenecks, because all I/O and memory
requests are transferred (and satisfied) over the same bus. As more processors are added, the
synchronization and communication demands increase rapidly, and the bus becomes less able to
handle the increased need for bandwidth. Unless that bandwidth demand is satisfied, there are
limits to the degree of scalability.
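
To make that ceiling concrete, here is a minimal back-of-envelope sketch in Python. The bus capacity and per-processor demand figures are illustrative assumptions, not measurements from any real system:

```python
# Back-of-envelope model of the shared-bus bottleneck described above.
# All numbers are illustrative assumptions, not measurements.

BUS_BANDWIDTH_GBPS = 12.0   # fixed capacity of the shared bus (assumed)
DEMAND_PER_CPU_GBPS = 2.0   # bandwidth each processor wants (assumed)

def effective_speedup(num_processors: int) -> float:
    """Speedup relative to one processor, capped by the shared bus."""
    demanded = num_processors * DEMAND_PER_CPU_GBPS
    satisfied = min(demanded, BUS_BANDWIDTH_GBPS)
    return satisfied / DEMAND_PER_CPU_GBPS

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} processors -> speedup {effective_speedup(n):.1f}x")
# The speedup plateaus at 6.0x: beyond six processors the bus, not the
# processor count, limits throughput -- the scalability ceiling noted above.
```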
In contrast, in a shared-nothing approach, each processor has its own dedicated disk storage.
This approach, which maps nicely to a massively parallel processing (MPP) architecture, is not
only more suitable to discrete partitioning of the data across the processors, but also scales more
predictably, since each added node contributes its own processing, memory, and I/O capacity
rather than competing for a shared resource.
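
The following minimal sketch illustrates the shared-nothing idea: records are hash-partitioned so that each node owns, and scans, only its local data, and an aggregate query is answered by merging per-partition results (scatter-gather). The node count, record layout, and query are hypothetical, chosen only for illustration:

```python
# Minimal sketch of shared-nothing data placement: records are
# hash-partitioned so each node owns (and scans) only its own data.
# Node count and record layout are illustrative assumptions.

from collections import defaultdict

NUM_NODES = 4

def node_for(key: str) -> int:
    """Route a record to a node by hashing its key.

    Real systems use a stable hash; Python's built-in hash() is
    consistent only within a single process run.
    """
    return hash(key) % NUM_NODES

# Distribute records: no node ever reads another node's storage.
partitions: dict[int, list[dict]] = defaultdict(list)
for record in [{"id": "a1", "amount": 10}, {"id": "b2", "amount": 25},
               {"id": "c3", "amount": 7}]:
    partitions[node_for(record["id"])].append(record)

# A query runs locally on every partition (in parallel, conceptually),
# and the partial results are then merged -- scatter-gather.
partials = [sum(r["amount"] for r in recs) for recs in partitions.values()]
print("total:", sum(partials))  # 42
```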