chronologically and provide some background on the origin of streaming
data infrastructures. This background is of more than historical interest:
many of the tools and frameworks presented were developed to solve problems
in these spaces, and their designs reflect challenges unique to the space
in which they were born. Kafka, a data motion tool covered in Chapter 4,
“Flow Management for Streaming Analysis,” for example, was developed as
a web applications tool, whereas Storm, a processing framework covered in
Chapter 5, “Processing Streaming Data,” was developed primarily at Twitter
for handling social media data.
The second section, “Why Streaming Data is Different,” covers three of
the important aspects of streaming data: continuous data delivery, loosely
structured data, and high-cardinality datasets. The first, of course, is
what makes a system a real-time streaming data environment in the first
place. The other two, though not exclusive to streaming data, present
particular challenges to the designer of a streaming data application. All
three combine to form the essential streaming data environment.
The third section, “Infrastructures and Algorithms,” briefly touches on how
infrastructures and algorithms are used with streaming data and why that
matters.
Sources of Streaming Data
Streaming data comes from a variety of sources. This section introduces
some of the major categories. New data sources are constantly being made
available, and many others are proprietary, but the categories discussed in
this section are some of the application areas that have made streaming
data interesting. The ordering of the
application areas is primarily chronological, and much of the software
discussed in this topic derives from solving problems in each of these
specific application areas.
The data motion systems presented in this topic got their start handling
data for website analytics and online advertising at places like LinkedIn,
Yahoo!, and Facebook. The processing systems were designed to meet the
challenges of processing social media data from Twitter and social networks
like LinkedIn.
Google, whose business is largely related to online advertising, makes heavy
use of advanced algorithmic approaches similar to those presented in