Each of these steps brings its own workload characteristic:
Discovery requires users to interrogate the data. The data must be processed where it resides
rather than moved across the network, because of its size and complexity. This requirement is a
design goal for Big Data architecture: compute and process data at the storage layer.
Analysis requires parsing the data with data visualization tools. This calls for minimal
transformation and movement of data across the network.
Analytics requires converting the data to a structured format and extracting it for processing
in the data warehouse or analytical engines.
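The discovery requirement above, processing data where it is stored, can be illustrated with a minimal sketch. It assumes each storage node holds one partition of numeric records and that only a small per-node summary travels to a coordinator. The names `local_summary`, `combine`, and the sample `partitions` are hypothetical and serve only as illustration; they are not taken from any particular framework.

```python
def local_summary(partition):
    """Runs where the partition is stored; returns a small (count, total) pair."""
    return len(partition), sum(partition)


def combine(summaries):
    """Runs on the coordinator; merges the per-node summaries into a mean."""
    count = sum(c for c, _ in summaries)
    total = sum(t for _, t in summaries)
    return total / count if count else 0.0


# Hypothetical data: each inner list stands in for the records held on one storage node.
partitions = [[4, 8, 6], [10, 2], [5, 5, 5, 5]]

# Only these small tuples would cross the network, not the raw records.
summaries = [local_summary(p) for p in partitions]
mean = combine(summaries)
```

In this sketch the coordinator receives one pair per node regardless of how many records each node holds, which is the point of computing at the storage layer.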
Big Data workloads are drastically different from traditional workloads because no database is
involved in the processing of Big Data. This removes a large scalability constraint but adds the
complexity of maintaining consistency at the file system level. Another key factor to remember is
that Big Data involves data processing rather than transaction processing.
These factors are the design considerations when building a Big Data system, which we will
discuss in Chapters 10 and 11.
From an analytical perspective, Big Data workloads will be very similar to adding new data to the
data warehouse. The key difference is that the tables added are of the narrow/narrow type, but
their impact on the analytical model can be that of a wide/narrow table that becomes
wide/wide.
Big Data query workloads consist largely of executing MapReduce programs, which is the opposite
of executing SQL and optimizing for SQL performance.
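To make the contrast with SQL concrete, here is a minimal single-process sketch of the MapReduce programming style, using a word count as the example. It is an illustration only, not the API of Hadoop or any other framework: the function names are hypothetical, and the shuffle step that a real framework performs across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (key, value) pairs: one (word, 1) pair per word in each record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Chain the three phases over an iterable of text records."""
    return reduce_phase(shuffle_phase(map_phase(records)))
```

A SQL user would express the same result declaratively, roughly as a `GROUP BY` with `COUNT(*)`, and leave the execution plan to the optimizer. In the MapReduce style the programmer writes the map and reduce steps explicitly, so tuning means changing the program rather than the query.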
The major difference in Big Data workload management is that tuning the data processing
bottlenecks results in linear scalability and instant outcomes, unlike in the traditional
RDBMS world of data management. This is due to the file-based processing of data, the
self-contained nature of the data, and the maturity of the algorithms on the infrastructure itself.
Technology choices
As we look back and think about how to design the next generation of data warehouses around a
workload-driven architecture, several technologies have emerged in the last decade that are
critical to consider for the new architecture. A key point to remember is that the concept of
data warehousing is not changing; rather, the deployment and architecture of the data warehouse
will evolve from being tightly coupled to the database and its infrastructure to being
distributed across different layers of infrastructure and data architecture.
The goal of building the workload-driven architecture is to channel all of these technology
improvements into the flexibility and scalability of the data warehouse and Big Data processing,
thereby creating a coexistence platform that leverages all current-state and future-state
investments for better ROI. Another viewpoint is that Big Data processing is by design built
around procedural processing (more akin to programming language-driven processing), which can
take full advantage of multicore CPU and SSD or DRAM technologies, as opposed to the
RDBMS architecture, where large cycles of processing and memory are left underutilized.
 