skills, APIs, and assets across the platform so that the analytics can be applied to an engine that's optimized for the data at hand. For example, the IBM Big Data platform lets you take text analytics that are built via its Annotation Query Language (AQL) and seamlessly deploy them from an at-rest Hadoop engine into its Streams Big Data velocity engine. Most of the MapReduce programs that you code in Hadoop can be run in the IBM PureData System for Analytics; the SQL reports that you generate on the IBM PureData System for Operational Analytics (formerly known as the IBM Smart Analytics System) can pretty much be deployed without change on DB2 for z/OS.
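
To make that programming model concrete, here's a minimal word count written in the MapReduce style, using Python. This is a single-process sketch for illustration only: the sample input lines are invented, and on a real Hadoop cluster the mapper and reducer would run as separate distributed tasks (for example, via Hadoop Streaming).

    from collections import defaultdict

    def mapper(line):
        # Map phase: emit a (word, 1) pair for every word in the input line
        for word in line.split():
            yield word.lower(), 1

    def reducer(word, counts):
        # Reduce phase: sum all the counts that were grouped under one word
        return word, sum(counts)

    lines = ["Big Data at rest", "big data in motion"]  # invented sample input

    # Shuffle phase: group the mapper output by key (word)
    groups = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            groups[word].append(count)

    print(sorted(reducer(word, counts) for word, counts in groups.items()))
    # [('at', 1), ('big', 2), ('data', 2), ('in', 1), ('motion', 1), ('rest', 1)]
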
When you consider where data should be stored, it's best to first understand how data is stored today and what features characterize your persistence options. Data that's stored in a traditional data warehouse goes through a lot of processing before making it into the warehouse. The data is expected to be of high quality once it lands in the warehouse, and so it's cleaned up through enrichment, matching, glossaries, metadata, master data management, modeling, and other quality services that are attached to the data before it's ready for analysis. Obviously, this can be an expensive process, and the data that lands in a warehouse is viewed as having both high value and broad purpose: it's going places and will appear in reports and dashboards where accuracy is key.
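
To give you a feel for the kind of quality processing we're describing, here's a rough Python sketch that standardizes records against a glossary and collapses duplicates with a matching key. The glossary entries, field names, and matching rule are hypothetical stand-ins for real enrichment and master data management services.

    import re

    # Hypothetical glossary for standardizing country values
    COUNTRY_GLOSSARY = {"uk": "United Kingdom", "u.s.": "United States",
                        "usa": "United States"}

    def cleanse(record):
        # Standardize whitespace and casing, then enrich via the glossary
        name = re.sub(r"\s+", " ", record.get("name", "")).strip().title()
        raw_country = record.get("country", "").strip()
        country = COUNTRY_GLOSSARY.get(raw_country.lower(), raw_country)
        return {"name": name, "country": country}

    def match_key(record):
        # A crude matching key for de-duplicating records across sources
        return (record["name"].lower(), record["country"].lower())

    raw = [{"name": "  ada   LOVELACE ", "country": "UK"},
           {"name": "Ada Lovelace", "country": "United Kingdom"}]

    # Records that match after cleansing collapse into one consolidated row
    deduped = {match_key(r): r for r in map(cleanse, raw)}
    print(list(deduped.values()))  # one record: Ada Lovelace / United Kingdom
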
In contrast, data in some of the newer Big Data repositories rarely undergoes (at least initially) such rigorous preprocessing because that would be cost-prohibitive, and the work in these repositories is oriented more toward discovery than toward known value. What's more, each repository has different characteristics with different tradeoffs. One might prioritize strict adherence to the ACID (atomicity, consistency, isolation, and durability) properties, and another might operate in a relaxed consistency state where the BASE properties (basically available, soft state, and eventually consistent) can be tolerated.
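
To make the ACID side of that tradeoff concrete, here's a small sketch that uses Python's built-in sqlite3 module (chosen purely for illustration; the table and values are invented). The transaction applies both halves of a transfer or neither half; a BASE-style store would instead acknowledge the write right away and reconcile replicas later.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
    conn.commit()

    try:
        with conn:  # one transaction: commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - 75 WHERE id = 1")
            raise RuntimeError("simulated failure before the matching credit")
            conn.execute("UPDATE accounts SET balance = balance + 75 WHERE id = 2")
    except RuntimeError:
        pass  # the failure aborts the whole transaction

    # Atomicity: the debit was rolled back, so the books still balance
    print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
    # [(1, 100.0), (2, 50.0)]
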
We like to use a gold mining analogy to articulate the opportunity of Big
Data. In the “olden days” (which, for some reason, our kids think is a time
when we were their age), miners could easily spot nuggets or veins of gold,
as they were visible to the naked eye. Let's consider that gold to be “high
value-per-byte data”; you can see its value and therefore you invest resources
to extract it. But there is more gold out there, perhaps in the hills nearby or
miles away; it just isn't visible to the naked eye, and trying to find this hidden