Big Data, Analytics, and Data Science Life Cycle - Getting Started with Greenplum for Big Data Analytics


videos, mails, tweets, and so on). These formats are not supported by any of

today's traditional data marts, data stores, or data mining applications.

Note

Noisy data is data with a low degree of relevance in its context. It is

meaningless data that only increases storage requirements and can

adversely affect the results of data analysis. More noise in data means

more unnecessary, redundant, or uninterpretable data.

• Traditionally, business/enterprise data was consumed in batches, within

specific processing windows. With recent innovations in advanced devices

and the spread of interconnectivity, data is now available in real time,

and deriving insights from it in real time has become a prime

expectation.

• All of the above creates a need for processing efficiency. Processing

windows are shorter than ever. Simple parallel processing frameworks

such as MapReduce attempt to address this need.

Note

In Big Data, handling volume isn't the critical problem to solve; the

critical problem is the complexity of dealing with heterogeneous data

that includes a high degree of noise.

So, what is Big Data?

With all that we have covered so far, let's now define Big Data.

Big Data can be defined as an environment comprising tools, processes, and

procedures that fosters discovery with data at its center. This discovery process

refers to our ability to derive business value from data and includes collecting,

manipulating, analyzing, and managing data.
