configuration can give administrators tools and tips for zoning the infrastructure so that each data set is
kept in its own area, minimizing both risk and performance impact.
Data exploration and mining are common activities that drive Big Data acquisition across
organizations, and they also produce large data sets as processing output. These data sets must be
maintained in the Big Data system by periodically sweeping out and deleting intermediate data sets.
Organizations often neglect this housekeeping, and over time it can become a performance drain.
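As an illustration of such periodic sweeping, the following Python sketch assumes a hypothetical layout in which each intermediate data set lives in its own subdirectory of a scratch area, and a hypothetical seven-day retention window. It is not tied to any particular Big Data platform; on HDFS or object storage the same logic would sit on top of that system's own listing and delete operations.

```python
import shutil
import time
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical retention window: intermediate data sets not modified for
# longer than this are treated as stale and eligible for deletion.
RETENTION_SECONDS = 7 * 24 * 3600


def find_stale(last_modified: Dict[str, float], now: float,
               retention_seconds: float = RETENTION_SECONDS) -> List[str]:
    """Return names of data sets whose last modification is older than the window."""
    return sorted(name for name, ts in last_modified.items()
                  if now - ts > retention_seconds)


def sweep_intermediate(scratch_root: Path, now: Optional[float] = None,
                       retention_seconds: float = RETENTION_SECONDS,
                       dry_run: bool = True) -> List[str]:
    """Find (and, unless dry_run, delete) stale data sets under scratch_root.

    Each immediate subdirectory is treated as one intermediate data set; its age
    is the newest modification time of any file inside it (or of the directory
    itself if it is empty).
    """
    now = time.time() if now is None else now
    last_modified = {}
    for dataset in scratch_root.iterdir():
        if not dataset.is_dir():
            continue
        mtimes = [f.stat().st_mtime for f in dataset.rglob("*") if f.is_file()]
        last_modified[dataset.name] = max(mtimes, default=dataset.stat().st_mtime)
    stale = find_stale(last_modified, now, retention_seconds)
    if not dry_run:
        for name in stale:
            shutil.rmtree(scratch_root / name)
    return stale
```

Running with `dry_run=True` first only lists what would be removed, which is a prudent default for a job that deletes data.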
Storage performance
Disk performance is an important consideration when building Big Data systems, and the
appliance model brings a sharper focus to the storage class and tiering architecture. This
provides a starting point for longer-term planning and growth management of the storage
infrastructure.
If a combination of in-memory, SSD, and traditional storage is planned for Big Data
processing, persisting and exchanging data across the different layers can consume both
processing time and CPU cycles. This area needs careful attention, and the appliance architecture
provides a reference for such complex storage requirements.
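A minimal sketch of a tiering policy follows, assuming hypothetical tier names, capacities, and access-frequency thresholds. Data sets are placed greedily, hottest first, on the fastest tier they qualify for, and `plan_migrations` lists the moves needed to reach that placement. Each move is the kind of cross-layer data exchange that consumes processing time and cycles, so a real policy would weigh the benefit of a move against its cost.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DataSet:
    name: str
    size_gb: float
    reads_per_day: float


# Hypothetical tiers, fastest first: (tier name, capacity in GB, minimum reads/day).
TIERS: List[Tuple[str, float, float]] = [
    ("memory", 256.0, 1000.0),
    ("ssd", 4096.0, 50.0),
    ("disk", float("inf"), 0.0),
]


def place(datasets: List[DataSet]) -> Dict[str, str]:
    """Greedy placement: hottest data sets first, each on the fastest tier
    whose access threshold it meets and whose remaining capacity can hold it."""
    remaining = {tier: capacity for tier, capacity, _ in TIERS}
    placement: Dict[str, str] = {}
    for ds in sorted(datasets, key=lambda d: d.reads_per_day, reverse=True):
        for tier, _, min_reads in TIERS:
            if ds.reads_per_day >= min_reads and ds.size_gb <= remaining[tier]:
                remaining[tier] -= ds.size_gb
                placement[ds.name] = tier
                break
    return placement


def plan_migrations(current: Dict[str, str],
                    target: Dict[str, str]) -> List[Tuple[str, Optional[str], str]]:
    """List (data set, from tier, to tier) moves; each move costs I/O and CPU."""
    return [(name, current.get(name), tier)
            for name, tier in sorted(target.items())
            if current.get(name) != tier]
```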
Operational costs
Calculating the operational cost of a data warehouse and its Big Data platform is a complex task. It
includes the initial acquisition costs for infrastructure, the labor costs of implementing the architecture,
and the infrastructure and labor costs of ongoing maintenance, including external help commissioned
from consultants and experts.
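The components above can be combined into a simple total-cost estimate. The sketch below assumes a planning horizon in whole years, treats acquisition and implementation as one-time costs and the maintenance and external-help items as flat annual costs, and ignores discounting and inflation; all field names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    # All figures are hypothetical inputs in a single currency unit.
    infrastructure_acquisition: float         # one-time hardware/software purchase
    implementation_labor: float               # one-time labor to build the architecture
    annual_infrastructure_maintenance: float  # recurring infrastructure upkeep
    annual_labor_maintenance: float           # recurring in-house labor
    annual_external_services: float           # recurring consultants and experts

    def total(self, years: int) -> float:
        """One-time costs plus recurring costs over the planning horizon."""
        one_time = self.infrastructure_acquisition + self.implementation_labor
        recurring = (self.annual_infrastructure_maintenance
                     + self.annual_labor_maintenance
                     + self.annual_external_services)
        return one_time + recurring * years
```

For example, `CostModel(500_000, 200_000, 50_000, 120_000, 30_000).total(3)` combines 700,000 in one-time costs with three years of 200,000 in recurring costs, for 1,300,000.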
External data integration
Figure 10.6 shows the external data integration approach to creating the next-generation data
warehouse. In this approach, the existing data processing and data warehouse platforms are retained, and a
new platform for processing Big Data is built on a new technology architecture. A data bus is developed
using metadata and semantic technologies to create a data integration environment for
data exploration and processing.
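The data bus can be pictured as a metadata catalog that maps business terms to physical sources on both platforms. The sketch below is one hypothetical way to model that idea; the class names, platform labels, and locations are illustrative only, and a production data bus would also carry lineage, schema, and access metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceEntry:
    platform: str     # hypothetical label: "warehouse" (existing RDBMS) or "bigdata"
    location: str     # hypothetical physical identifier (table name, storage path)
    terms: List[str]  # semantic tags linking the source to business terms


@dataclass
class DataBus:
    """Hypothetical metadata catalog: resolves a business term to every
    registered source, on either platform, that is tagged with it."""
    catalog: Dict[str, SourceEntry] = field(default_factory=dict)

    def register(self, name: str, entry: SourceEntry) -> None:
        self.catalog[name] = entry

    def resolve(self, term: str) -> List[SourceEntry]:
        # Sorted by registration name so results are deterministic.
        return [e for _, e in sorted(self.catalog.items()) if term in e.terms]
```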
This architecture clearly divides workload processing: Big Data is processed on its own infrastructure,
and the current-state data warehouse on its own infrastructure. Separating the workloads helps maintain
performance and data quality, but complexity increases in the data bus architecture, which can range from
a simple layer to an overwhelmingly complex layer of processing. The data bus is a custom-built solution for
each system integrated into the data warehouse architecture, and it requires substantial data architecture
skills and maintenance. Because Big Data processing runs outside the RDBMS platform, this approach
offers the opportunity for near-unlimited scalability at a lower price point.
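The division of workload can be expressed as a routing rule. In this sketch the rule itself (send unstructured data, or inputs above a hypothetical 500 GB threshold, to the Big Data platform and everything else to the existing warehouse) is an assumption for illustration, not a rule stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical size above which a job is treated as Big Data workload.
BIG_DATA_THRESHOLD_GB = 500.0


@dataclass
class Job:
    name: str
    input_gb: float
    structured: bool  # True if the input fits the warehouse's relational schema


def route(job: Job) -> str:
    """Pick the infrastructure that should run the job."""
    if not job.structured or job.input_gb > BIG_DATA_THRESHOLD_GB:
        return "bigdata"    # processed outside the RDBMS platform
    return "warehouse"      # stays on the current-state warehouse


def split_workload(jobs: List[Job]) -> Dict[str, List[str]]:
    """Group job names by the platform each one is routed to."""
    plan: Dict[str, List[str]] = {"warehouse": [], "bigdata": []}
    for job in jobs:
        plan[route(job)].append(job.name)
    return plan
```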
Pros:
Scalable design for RDBMS and Big Data processing.
Reduced processing overload, since each platform handles only its own workload.
Complexity of processing can be isolated across data acquisition, data cleansing, data discovery, and data integration.
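To illustrate the last point, the sketch below keeps acquisition, cleansing, discovery, and integration as separate functions with plain data passed between them, so each stage can be changed or tested without touching the others. The record format and the customer lookup used for integration are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]


def acquire(raw_lines: List[str]) -> List[Record]:
    """Acquisition: parse raw 'id,email' lines; no cleanup happens here."""
    records = []
    for line in raw_lines:
        parts = line.split(",")
        records.append({"id": parts[0].strip(),
                        "email": parts[1] if len(parts) > 1 else ""})
    return records


def cleanse(records: List[Record]) -> List[Record]:
    """Cleansing: normalize emails and drop records without a usable one."""
    cleaned = []
    for r in records:
        email = r["email"].strip().lower()
        if "@" in email:
            cleaned.append({"id": r["id"], "email": email})
    return cleaned


def discover(records: List[Record]) -> Counter:
    """Discovery: profile the data, here by counting email domains."""
    return Counter(r["email"].split("@", 1)[1] for r in records)


def integrate(records: List[Record], customers: Dict[str, str]) -> List[Record]:
    """Integration: attach the warehouse customer name by id (hypothetical lookup)."""
    return [dict(r, customer=customers[r["id"]])
            for r in records if r["id"] in customers]
```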