Business challenges
EMS has built its business on the effective collection, analysis, and use of data. As Jeff Hassemer,
vice president of product strategy for EMS, explained,
Experian has handled large amounts of data for a very long time: who consumers are, how they're
connected, how they interact. We've done this over billions and quadrillions of records over time.
But with the proliferation of channels and information that are now flowing into client organizations—social media likes, web interactions, email responses—that data has gotten so large that
it's maxed the capacity of older systems. We needed to leap forward in our processing ability. We
wanted to process data orders of magnitude faster so we could react to tomorrow's consumer.
In the past, it was normal to send customer database updates to clients once monthly for campaign
adjustments, allowing Experian to process large volumes of data through a number of diverse platforms,
mostly mainframe-based. “We weren't required to provide data in real time. We weren't required to provide
the level of volume in terms of the growth rates we've seen from our storage and our data. It's been a total
paradigm shift that compelled us to look at other solutions,” explained Emad Georgy, CTO for EMS.
Today's consumers leave a digital trail of behaviors and preferences for marketers to leverage so
they can enhance the customer experience. Experian's clients have started asking for more frequent
updates on consumers' latest purchasing behaviors, online browsing patterns, and social media activity so they can respond in real time. “We serve many of the top retail companies in the world, and
they're increasingly looking for a single, integrated view of their customer,” noted Georgy. “If a customer is walking into a store in Burlington, MA, is that same customer now liking the company on
Facebook? Are they tweeting? We're looking for an integrated view of who that person is so we can
determine how to message them in the right way.”
But the data exhaust from these digital channels is massive and requires a technological infrastructure
that can accommodate rapid processing, large-scale storage, and flexible analysis of multistructured data.
Experian's mainframes were hitting the tipping point in terms of performance, flexibility, and scalability.
Given clients' need for immediate information and real-time data customization, EMS
set an internal goal to process more than 100 million records of data per hour, which translates to roughly 28,000
records per second.
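The per-second figure follows from dividing the hourly target by the 3,600 seconds in an hour. The minimal Python sketch below checks that arithmetic. The function name records_per_second is purely illustrative and is not part of any EMS or Cloudera system.

```python
SECONDS_PER_HOUR = 60 * 60  # 3,600 seconds in one hour


def records_per_second(records_per_hour: int) -> float:
    """Convert an hourly record-processing target into a per-second rate.

    Illustrative helper only; not taken from any EMS system.
    """
    return records_per_hour / SECONDS_PER_HOUR


# EMS's stated internal goal: more than 100 million records per hour.
target_per_hour = 100_000_000
rate = records_per_second(target_per_hour)
print(f"{rate:,.0f} records/second")  # 27,778, which the text rounds to about 28,000
```

The exact quotient is about 27,778 records per second, so "28,000" is a rounded figure, and exceeding 100 million records per hour means sustaining a rate above that.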
“Instead of trying to fit a square peg in a round hole, we went out and decided to look for new
architectures that can handle the new volumes of data that we manage,” said Joe McCullough, IT
business analyst at EMS. The team identified about 30 criteria for the new platform, ranging from
depth and breadth of offering, to support capabilities, to price, to unique distribution features. They
prioritized two criteria above the rest:
Both batch and real-time data processing capabilities.
Scalability to accommodate large and growing data volumes.
“We compared Hadoop as well as HBase to a number of other options in the industry,” said Georgy.
“The North America Experian Marketing Services group has organically led the evaluation of NoSQL
technologies within Experian.” Hadoop and HBase quickly surfaced as a natural fit for Experian's needs.
EMS engineers downloaded the raw Apache Hadoop distribution but quickly saw gaps that a commercial distribution could fill.
EMS evaluated several distributions and “found that, by and far, Cloudera was in the lead. We
went with Cloudera for a number of reasons, primarily being the strength of the distribution and the
features that CDH gives us,” noted Emad Georgy.