much time on this part of the environment except to describe the
mechanisms for writing directly to the data-flow component.
Data Flow
Collection, analysis, and reporting systems, with few exceptions, scale and
grow at different rates within an organization. For example, if incoming
traffic remains stable, but depth of analysis grows, then the analysis
infrastructure needs more resources despite the fact that the amount of data
collected stays the same. To allow for this, the infrastructure is separated
into tiers of collection, processing, and so on. Often, communication
between these tiers is ad hoc, with each application in the environment
using its own communication method to integrate with its other tiers.
One of the aims of a real-time architecture is to unify the environment,
at least to some extent, to allow for the more modular construction of
applications and their analysis. A key part of this is the data-flow system
(also called a data motion system in this topic).
These systems replace the ad hoc, application-specific communication
frameworks with a single, unified software system. The replacement software
systems are usually distributed systems, allowing them to expand and
handle complicated situations such as multi-datacenter deployment, but
they expose a common interface to both producers and consumers of the
data.
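The common interface such a system exposes can be sketched as follows. This is a minimal, in-memory illustration, not the API of any particular data-motion system (real ones, such as Kafka or Flume, are distributed); all names here are hypothetical:

```python
from collections import deque


class DataFlow:
    """Toy stand-in for a data-motion system.

    Producers and consumers both talk to named topics through one
    uniform interface, so the tiers behind a topic can scale
    independently of the applications feeding it.
    """

    def __init__(self):
        self.topics = {}

    def send(self, topic, message):
        # Producer side: append a message to a named topic.
        self.topics.setdefault(topic, deque()).append(message)

    def poll(self, topic):
        # Consumer side: take the oldest message, or None if empty.
        queue = self.topics.get(topic)
        return queue.popleft() if queue else None


flow = DataFlow()
flow.send("clickstream", {"user": 42, "page": "/home"})
event = flow.poll("clickstream")  # the producer's message, in order
```

The point is not the queue itself but the uniformity: every application in the environment produces and consumes through the same two calls, rather than through a bespoke integration per tier.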
The systems discussed in this topic are primarily what might be considered
third-generation systems. The “zero-th generation” systems are the closely
coupled ad hoc communication systems used to separate applications into
application-specific tiers.
The first-generation systems break this coupling, usually using some sort
of log-file system to collect application-specific data into files. These files
are then generically collected to a central processing location. Custom
processors then consume these files to implement the other tiers. This has
been, by far, the most popular system because it can be made reliable by
implementing “at least once” delivery semantics and because it's fast enough
for batch processing applications. The original Hadoop environments were
essentially optimized for this use case.
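The at-least-once property of the log-file approach comes from committing the read position only after records have been processed. A minimal sketch of that pattern, with hypothetical file names and a trivial stand-in for processing:

```python
import os

LOG = "events.log"          # application-specific log file (hypothetical name)
CHECKPOINT = "events.offset"  # last committed byte offset


def append_event(record):
    # First-generation pattern: the application appends records to a
    # local log file; a generic collector ships the file downstream.
    with open(LOG, "a") as f:
        f.write(record + "\n")


def consume_once():
    """Process records starting from the last committed offset.

    The offset is written only *after* processing succeeds, so a crash
    between processing and commit replays those records on restart:
    at-least-once delivery, at the cost of possible duplicates.
    """
    offset = 0
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            offset = int(f.read())
    processed = []
    with open(LOG) as f:
        f.seek(offset)
        for line in f:
            processed.append(line.strip())  # "process" the record
        offset = f.tell()
    with open(CHECKPOINT, "w") as f:
        f.write(str(offset))  # commit only after processing
    return processed
```

Batch processors (including the early Hadoop workflows mentioned above) tolerate the duplicate records this can produce, which is part of why the design remained adequate for so long.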