Nokia's technology ecosystem has Hadoop at its core. Nokia has over 100 terabytes (TB) of structured data on Teradata and petabytes (PB) of multistructured data on HDFS. The centralized Hadoop cluster that lies at the heart of Nokia's infrastructure contains 0.5 PB of data. Nokia's data warehouses and data marts continuously stream multistructured data into a multitenant Hadoop environment, allowing the company's 60,000+ employees to access the data. Nokia runs hundreds of thousands of Scribe processes each day to efficiently move data from, for example, servers in Singapore to a Hadoop cluster in the U.K. data center. The company uses Sqoop to move data from HDFS to Oracle and/or Teradata, and it serves data out of Hadoop through HBase.
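The case study does not show how HBase serves this data, but a low-latency point read with the standard HBase Java client might look like the sketch below. The table name (user_metrics), the column family and qualifier (d:app_sessions), and the row-key scheme are all hypothetical, chosen only for illustration; the client API calls themselves are standard.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MetricsLookup {
    public static void main(String[] args) throws Exception {
        // Standard client setup; reads hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             // "user_metrics" is a hypothetical table name.
             Table table = connection.getTable(TableName.valueOf("user_metrics"))) {
            // Hypothetical row-key scheme: one row per device per day.
            Get get = new Get(Bytes.toBytes("device123#2012-06-01"));
            get.addColumn(Bytes.toBytes("d"), Bytes.toBytes("app_sessions"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("d"),
                                           Bytes.toBytes("app_sessions"));
            if (value != null) {
                System.out.println("app_sessions = " + Bytes.toString(value));
            }
        }
    }
}
```

A single Get like this returns in milliseconds regardless of the size of the underlying table, which is what makes HBase suitable as a serving layer on top of a petabyte-scale cluster.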
Business challenges
Prior to deploying Hadoop, numerous groups within Nokia were building application silos to accommodate their individual needs. It didn't take long before the company realized it could derive greater
value from its collective data sets if these application silos could be integrated, enabling all globally
captured data to be cross-referenced for a single, comprehensive version of truth. “We were invento-
rying all of our applications and data sets,” O'Connor noted. “Our goal was to end up with a single
data asset.”
Nokia wanted to understand at a holistic level how people around the world interact with different applications, which required an infrastructure that could support daily, terabyte-scale streams of unstructured data from phones in use, services, log files, and other sources. Making this data consumable and useful for a variety of purposes, such as gleaning market insights or understanding the collective behavior of groups, also requires complex processing and computation; some aggregations of the data must also be easy to migrate to more structured environments in order to apply specific analytic tools.
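None of Nokia's actual jobs are described, so the following is only a minimal sketch of the kind of parallel aggregation Hadoop enables: a MapReduce job that counts application events across raw log files. The tab-delimited log layout, the field position of the event name, and the class names are assumptions; the MapReduce API is standard.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AppEventCounts {

    // Emits (event name, 1) for each log line; assumes a tab-delimited
    // layout with the event name in the second field (an assumption).
    public static class EventMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text event = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 1) {
                event.set(fields[1]);
                context.write(event, ONE);
            }
        }
    }

    // Sums the per-event counts across all log files.
    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "app-event-counts");
        job.setJarByClass(AppEventCounts.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // raw logs on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // aggregate output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

An aggregate like this is small enough to export with Sqoop into Teradata or Oracle, matching the migration pattern the case study describes.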
However, capturing petabyte-scale data with a relational database was cost-prohibitive and would limit the data types that could be ingested. “We knew we'd break the bank trying to capture all this unstructured data in a structured environment,” O'Connor said. Because Hadoop uses industry-standard hardware, the cost per terabyte of storage is, on average, one-tenth that of a traditional relational data warehouse system. Additionally, in a relational warehouse, unstructured data must be reformatted to fit a relational schema before it can be loaded. This extra data processing step slows ingestion, creates latency, and eliminates elements of the data that could become important down the road.
Various groups of engineers at Nokia had already begun experimenting with Apache Hadoop, and a few were using Cloudera's Distribution Including Apache Hadoop (CDH). The benefits of Hadoop were clear: it offers reliable, cost-effective data storage and high-performance parallel processing of multistructured data at petabyte scale. However, the rapidly evolving platform, and the tools designed to support and enable it, are complex and can be difficult to deploy in production. CDH simplifies this process, bundling the most popular open-source projects in the Apache Hadoop stack into a single, integrated package with steady and reliable releases.
After experimenting with CDH for several months, the company decided to standardize on the Hadoop platform as the cornerstone of its technology ecosystem. With limited Hadoop expertise in-house, Nokia turned to Cloudera to augment its internal engineering team with strategic technical support and global training services, giving it the confidence and expertise necessary to deploy a very large production Hadoop environment in a short timeframe.