The 2012 presidential election in Mexico turned into a Twitter veracity example
with fake accounts, which polluted political discussion, introduced
derogatory hashtags, and more. Spam is nothing new to folks in IT, but you
need to be aware that in the Big Data world, there is also Big Spam
potential, and you need a way to sift through it and figure out what data can and
can't be trusted. Of course, there are also words that need to be understood in
context, jargon, and more (we cover this in Chapter 8).
As previously noted, embedded within all of this noise are useful signals:
the person who professes a profound disdain for her current smartphone
manufacturer and starts a soliloquy about the need for a new one is
expressing monetizable intent. Big Data is so vast that quality issues are a reality, and
veracity is what we generally use to refer to this problem domain. The fact
that one in three business leaders don't trust the information that they use to
make decisions is a strong indicator that a good Big Data platform needs to
address veracity.
What About My Data Warehouse in a Big Data World?
There are pundits who insist that the traditional method of doing analytics is
over. Sometimes these NoSQL (which really means Not Only SQL) pundits
suggest that all warehouses will go the way of the dinosaur, which is ironic,
considering that much of the focus around NoSQL databases is on bringing SQL
interfaces to the runtime. Nothing could be further from the truth. We see a number
of purpose-built engines and programming models that are well suited for
certain kinds of analytics. For example, Hadoop's MapReduce programming
model is better suited for some kinds of data than traditional warehouses.
For this reason, as you will learn in Chapter 3, the IBM Big Data platform
includes a Hadoop engine (and support for other Hadoop engines as well,
such as Cloudera). What's more, IBM recognizes the flexibility of the
programming model, so the IBM PureData System for Analytics (formerly
known as Netezza) can execute MapReduce programs within a database. It's
really important in the Big Data era that you choose a platform that provides
the flexibility of a purpose-built engine that's well suited for the task at hand
(the kind of analytics you are doing, the type of data you are doing it on, and
so on). This platform must also allow you to seamlessly move programming
 