Database Reference
In-Depth Information
campaigns aim at preventing valuable customers from terminating their relationship with
the organization.
Although potentially effective, this can also lead to a huge waste of resources and to
bombarding and annoying customers with unsolicited communications. Data mining and
classification (propensity) models in particular can support the development of targeted
marketing campaigns. They analyze customer characteristics and recognize the profiles or
extended-profiles of the target customers.
c. Market Basket Analysis : Data mining and association models in particular can be used
to identify related products typically purchased together. These models can be used for
market basket analysis and for revealing bundles of products or services that can be sold
together.
14.10 Big Data Analytics
Along with the data came the problem of how to compute all this volume and variety and how
to handle the volume of the data. This is where Google, Facebook, and Yahoo clearly showed the
way; the former created a new computing model based on a file system and a programming lan-
guage called MapReduce that scaled up the search engine and was able to process multiple queries
simultaneously. In 2002, architects Doug Cutting and Mike Cafarella were working on an open-
source search engine project Nutch, which led to them modeling the underlying architecture based
on the Google model. This led to the development of the Nutch project as a top Apache project
under open source, which was adopted by Yahoo in 2006 and called Hadoop. Hadoop has in the
last few years created a whole slew of companies that are both commercial solutions and commit
back features to the base open-source project, a true collaboration-based software and framework
development.
The other technology that has evolved into a powerful platform is the not only SQL (NoSQL)
movement. The underpinnings of this platform are based on a theorem proposed by Eric Brewer
in 2002 called the CAP theorem. According to the CAP theorem, a database cannot meet all the
rules of ACID compliance at any point in time and yet be scalable and flexible. However, in the
three basic properties of consistency, availability, and partition tolerance, a database can meet two
of the three, thereby creating a scalable and distributed architecture that can evolve into meeting
scalability requirements in a horizontal scaling and provide higher throughput, as the compute
in this environment is very close to the storage and is a distributed architecture that can allow for
multiple consistency levels.
Facebook was one of the earliest evangelists of the NoSQL architecture, as they needed to solve
the scalability and usability demands of a user population that was only third behind China and
India in terms of number of people. The popular NoSQL database Cassandra was developed and
used at Facebook for a long time (now it has been abandoned by Facebook due to greater scal-
ability needs) and is used across many other companies in conjunction with Hadoop and other
traditional RDBMS solutions. It remains a top-level Apache project and is evolving with more
features being added.
The key thing to understand here is the data part of Big Data was always present and used
in a manual fashion, with a lot of human processing and analytic refinement, eventually being
used in a decision-making process. What has changed and created the buzz with Big Data is
the automated data processing capability that is extremely fast and scalable and has flexible
Search WWH ::




Custom Search