just anyone in your company should be able to peruse the private financial or medical
records of customers. What it does mean is that the people in your organization are
given access to tools that help them find out the answers to questions quickly and are
expected to back up their ideas with metrics when appropriate. It also means sharing
organizational data freely to help inform, inspire, and empower employees to come up
with their own innovative ideas for solving problems.
Invest in Technology That Bridges Data Silos
Embracing the concept that data silos are actually beneficial allows system architects to
rethink their approaches. When the data warehouse is seen less as a be-all and end-all
solution to data analysis, and distributed computing tools such as Hadoop are seen as
useful for processing tasks, administrators can focus on investing in technologies
that bridge the gaps between these systems.
Visualization tools, such as Tableau and QlikView, are beginning to provide access
not only to traditional relational database systems through ODBC drivers
but also to new data tools such as Google's BigQuery and Cloudera's Impala. Similarly,
users of business productivity tools, such as Microsoft Excel, should be able to run
queries by using connectors to underlying data warehousing software.
Convergence: The End of the Data Silo
Much of the technology and jargon of traditional data warehousing was developed
before the wide-scale adoption of the Internet. Large, single-machine data warehouse
appliances are common in the enterprise market and often bring with them large price
tags and expensive support contracts. Building a distributed data processing system on
a cluster of machines using open-source technologies such as Hadoop provides a different
type of challenge, requiring expertise and infrastructure maintenance. Essentially,
either of these system designs demands specialized training and comes with its own
trade-offs.
In practice, data warehousing and distributed computing technologies such as
Hadoop have overlapping use cases. For example, creating a MapReduce workflow
can often be a more performant way to solve a complicated ETL transformation step
when moving data from a customer database to a data warehouse. There's been a
gradual movement, from both commercial and open-source projects, towards combining
aspects of popular distributed data projects with features found in data warehouses and
analytical databases. For example, the Spark project, an open-source distributed
computing system, is designed to be a very fast in-memory analytics platform. One of the
most interesting projects built with Spark is Shark, a data warehouse application that
is compatible with Hadoop's Hive. As a result of this combination, Spark and Shark
together provide both a warehousing capability and fast analytics capabilities.
Traditional data warehousing products are also getting into the act, with industry stalwarts
such as Oracle and SAP incorporating Hadoop into their offerings.
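
To make the ETL example above more concrete, here is a minimal sketch of the map, shuffle, and reduce shape that such a transformation step might take. It is plain Python with no Hadoop dependency, and it runs on a single machine. The row layout, the sample data, and the function names (map_order, shuffle, reduce_totals, run_etl) are all assumptions made for illustration; they do not come from any particular framework.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation

# Hypothetical rows exported from a customer database: (customer_id, amount_text).
RAW_ORDERS = [
    ("c1", "19.99"),
    ("c2", "5.00"),
    ("c1", "0.01"),
    ("c3", "not-a-number"),  # malformed row; the map step drops it
]


def map_order(row):
    """Map step: clean one row and emit (customer_id, amount_in_cents) pairs."""
    customer_id, amount_text = row
    try:
        cents = int(Decimal(amount_text) * 100)
    except InvalidOperation:
        return  # emit nothing for rows that cannot be parsed
    yield customer_id, cents


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_totals(customer_id, cents_list):
    """Reduce step: collapse each customer's values into one total."""
    return customer_id, sum(cents_list)


def run_etl(rows):
    """Run map, shuffle, and reduce in sequence and return warehouse-ready totals."""
    pairs = (pair for row in rows for pair in map_order(row))
    grouped = shuffle(pairs)
    return dict(reduce_totals(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_etl(RAW_ORDERS))
```

In a real cluster the map and reduce functions would run in parallel across many machines and the framework would handle the shuffle; the sketch only shows how a cleaning-and-aggregating ETL step decomposes into those phases.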
 
 