Summary
Even in small organizations, important data can be found in various formats, managed
by a variety of applications, and stored on different machines or in the cloud. The
result, a siloing of data, makes it difficult to answer questions that draw on several
data sources at once.
An organization's data often ends up stored in many disparate places for practical
reasons. Operational data stores may be optimized for the constant transactional duties
of customer-facing applications. Meanwhile, tasks involving aggregate queries and data
interoperability are best suited to specialized analytics systems. Analysts may be most
productive using tools such as spreadsheets. Compliance, security, and user privacy are
all valid reasons for keeping data in separate locations, accessible only to specific
people. Much of an organization's useful data also lives in unstructured formats such as
email, user comments, or social media posts.
There are many conceptual approaches to solving the data challenges created by
silos. Data warehousing refers to the strategy of structuring data generated by
operational databases and storing it in a central repository. Since data from a variety
of sources may not have the same structure, data warehousing often requires a process
for data extraction, transformation, and loading (ETL).
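As a rough illustration of the ETL idea described above, the following sketch pulls
records from two differently structured sources, normalizes them to one shape, and loads
them into a single table. Everything in it is an assumption made for the example: the
sample data, the field names, the `sales` table, and the use of an in-memory SQLite
database as a stand-in for a central repository.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export with amounts in dollars.
ORDERS_CSV = "customer,amount\nAlice,10.50\nbob ,4.25\n"

# Hypothetical source 2: records from another application that uses
# different field names and stores amounts in cents.
CRM_ROWS = [{"name": "ALICE", "total_cents": 300}]


def extract():
    """Read raw records from each source without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(ORDERS_CSV)))
    return csv_rows, CRM_ROWS


def transform(csv_rows, crm_rows):
    """Convert both sources to one structure: (customer, amount_cents)."""
    unified = []
    for row in csv_rows:
        name = row["customer"].strip().lower()
        unified.append((name, round(float(row["amount"]) * 100)))
    for row in crm_rows:
        name = row["name"].strip().lower()
        unified.append((name, int(row["total_cents"])))
    return unified


def load(rows, conn):
    """Write the unified rows into the central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


# In-memory database standing in for a data warehouse.
conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)

# Once loaded, one query can answer a question that spans both sources.
totals = dict(
    conn.execute("SELECT customer, SUM(amount_cents) FROM sales GROUP BY customer")
)
```

The transform step is where the structural mismatch between sources is resolved; in a
real pipeline it is usually the largest and most fragile part of the process.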
The Internet, mobile applications, social media, email, and other communication
technologies have all led to an increase in the number of data sources that an
organization may want to collect and analyze. Rather than centralizing this data in a
warehouse, another approach is to ask questions about disparate datasets using
technology that is designed for distributed processing, such as MapReduce.
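The MapReduce model mentioned above can be sketched in a few lines on a single machine.
This is only a toy illustration of the map, shuffle, and reduce phases, not a
distributed implementation; the two datasets and all function names are assumptions
invented for the example.

```python
from collections import defaultdict

# Hypothetical disparate datasets, each kept in its own "silo".
DATASETS = {
    "email": ["refund request refund"],
    "comments": ["great refund policy"],
}


def map_phase(text):
    """Emit a (word, 1) pair for every word in one record."""
    return [(word, 1) for word in text.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Map every record from every source. In a real cluster each source
# could be processed where it lives, in parallel.
mapped = []
for records in DATASETS.values():
    for record in records:
        mapped.extend(map_phase(record))

word_counts = dict(
    reduce_phase(key, values) for key, values in shuffle(mapped).items()
)
```

Because each map call touches only one record and each reduce call touches only one
key, the work can be spread across many machines without first moving the data into a
central repository.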
Although these contrasting philosophies have strengths and weaknesses, the real
solutions to data silo problems are organizational rather than technological.
Understanding the types of data challenges you face before looking at technology is a
key first step. In some cases, data can be queried without a data warehousing solution
at all, enabling users to skip a traditional ETL process. When investing in
technology, it often makes sense for organizations to concentrate on technology that
bridges different sources of data rather than technology that merges and moves data
into a central repository.
Many projects combine the large-scale processing of distributed systems such as
Hadoop with more traditional analytic database or data warehousing technology, using
the former to improve data processing and the latter to speed up queries. With so much
data already “in the cloud,” there are unique advantages to using data applications
as a service. Visualization and reporting tools are also being built to connect
directly to data sources other than the traditional data warehouse.
 
 