Terminology: Big Data
We've been throwing around “Big Data” quite a lot already and are
guilty of barely defining it beyond raising some big questions in the
previous chapter.
A few ways to think about Big Data:
“Big” is a moving target. Setting a fixed threshold for Big Data, such
as 1 petabyte, is meaningless because it makes the term sound absolute.
Only when the size becomes a challenge is it worth referring to it as
“Big.” So it's a relative term: data is Big when its size outstrips the
current state-of-the-art computational solutions (in terms of memory,
storage, complexity, and processing speed) available to handle it. In
the 1970s, then, this meant something different than it does today.
“Big” is when you can't fit it on one machine. Different individuals
and companies have different computational resources available to
them, so for a single scientist, data is big once it no longer fits on
one machine, because at that point she has to learn a whole new host
of tools and methods.
Big Data is a cultural phenomenon. It describes how much data is
part of our lives, precipitated by accelerated advances in technology.
The 4 Vs: Volume, variety, velocity, and value. Many people are
circulating this as a way to characterize Big Data. Take from it what
you will.
Big Data Can Mean Big Assumptions
In Chapter 1, we mentioned the Cukier and Mayer-Schoenberger
article “The Rise of Big Data.” In it, they argue that the Big Data
revolution consists of three things:
• Collecting and using a lot of data rather than small samples
• Accepting messiness in your data
• Giving up on knowing the causes
They describe these steps in rather grand fashion, claiming that Big
Data doesn't need to understand cause, given that the data is so
enormous. It doesn't need to worry about sampling error because it is