skills, APIs, and assets across the platform so that the analytics can be applied to an engine that's optimized for the data at hand. For example, the IBM Big Data platform lets you take text analytics that are built via its Annotation Query Language (AQL) and seamlessly deploy them from an at-rest Hadoop engine into its Streams Big Data velocity engine. Most of the MapReduce programs that you code in Hadoop can be run in the IBM PureData System for Analytics; the SQL reports that you generate on the IBM PureData System for Operational Analytics (formerly known as the IBM Smart Analytics System) can pretty much be deployed without change on DB2 for z/OS.
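
To make that programming model concrete, here's a minimal word count written in the MapReduce style, using Python. This is a single-process sketch for illustration only: the sample input lines are invented, and on a real Hadoop cluster the mapper and reducer would run as separate distributed tasks (for example, via Hadoop Streaming).

    from collections import defaultdict

    def mapper(line):
        # Map phase: emit a (word, 1) pair for every word in the input line
        for word in line.split():
            yield word.lower(), 1

    def reducer(word, counts):
        # Reduce phase: sum all the counts that were grouped under one word
        return word, sum(counts)

    lines = ["Big Data at rest", "big data in motion"]  # invented sample input

    # Shuffle phase: group the mapper output by key (word)
    groups = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            groups[word].append(count)

    print(sorted(reducer(word, counts) for word, counts in groups.items()))
    # [('at', 1), ('big', 2), ('data', 2), ('in', 1), ('motion', 1), ('rest', 1)]
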
When you consider where data should be stored, it's best to first understand how data is stored today and what features characterize your persistence options. Data that's stored in a traditional data warehouse goes through a lot of processing before making it into the warehouse. The data is expected to be of high quality once it lands in the warehouse, and so it's cleaned up through enrichment, matching, glossaries, metadata, master data management, modeling, and other quality services that are attached to the data before it's ready for analysis. Obviously, this can be an expensive process, and the data that lands in a warehouse is viewed as having both high value and broad purpose: it's going places and will appear in reports and dashboards where accuracy is key.
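
To give you a feel for the kind of quality processing we're describing, here's a rough Python sketch that standardizes records against a glossary and collapses duplicates with a matching key. The glossary entries, field names, and matching rule are hypothetical stand-ins for real enrichment and master data management services.

    import re

    # Hypothetical glossary for standardizing country values
    COUNTRY_GLOSSARY = {"uk": "United Kingdom", "u.s.": "United States",
                        "usa": "United States"}

    def cleanse(record):
        # Standardize whitespace and casing, then enrich via the glossary
        name = re.sub(r"\s+", " ", record.get("name", "")).strip().title()
        raw_country = record.get("country", "").strip()
        country = COUNTRY_GLOSSARY.get(raw_country.lower(), raw_country)
        return {"name": name, "country": country}

    def match_key(record):
        # A crude matching key for de-duplicating records across sources
        return (record["name"].lower(), record["country"].lower())

    raw = [{"name": "  ada   LOVELACE ", "country": "UK"},
           {"name": "Ada Lovelace", "country": "United Kingdom"}]

    # Records that match after cleansing collapse into one consolidated row
    deduped = {match_key(r): r for r in map(cleanse, raw)}
    print(list(deduped.values()))  # one record: Ada Lovelace / United Kingdom
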
In contrast, data in some of the newer Big Data repositories rarely undergoes (at least initially) such rigorous preprocessing because that would be cost-prohibitive, and the work in these repositories is oriented more toward discovery than toward known value. What's more, each repository has different characteristics with different tradeoffs. One might prioritize strict adherence to the ACID (atomicity, consistency, isolation, and durability) properties, and another might operate in a relaxed consistency state where the BASE properties (basically available, soft state, and eventually consistent) can be tolerated.
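
To make the ACID side of that tradeoff concrete, here's a small sketch that uses Python's built-in sqlite3 module (chosen purely for illustration; the table and values are invented). The transaction applies both halves of a transfer or neither half; a BASE-style store would instead acknowledge the write right away and reconcile replicas later.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
    conn.commit()

    try:
        with conn:  # one transaction: commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - 75 WHERE id = 1")
            raise RuntimeError("simulated failure before the matching credit")
            conn.execute("UPDATE accounts SET balance = balance + 75 WHERE id = 2")
    except RuntimeError:
        pass  # the failure aborts the whole transaction

    # Atomicity: the debit was rolled back, so the books still balance
    print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
    # [(1, 100.0), (2, 50.0)]
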
We like to use a gold mining analogy to articulate the opportunity of Big
Data. In the “olden days” (which, for some reason, our kids think is a time
when we were their age), miners could easily spot nuggets or veins of gold,
as they were visible to the naked eye. Let's consider that gold to be “high
value-per-byte data”; you can see its value and therefore you invest resources
to extract it. But there is more gold out there, perhaps in the hills nearby or
miles away; it just isn't visible to the naked eye, and trying to find this hidden