Supporting and Enhancing SAP CRM - Implementing SAP CRM


allocation and distribution of the data; it enables more effective parallelization and consequently does not introduce the same kind of bus bottlenecks from which the SMP/shared-memory and shared-disk approaches suffer. Most big data appliances use a collection of computing resources, typically a combination of processing nodes and storage nodes.
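As a minimal sketch of why distributing the data helps parallelization, the Python below simulates a shared-nothing arrangement inside a single process. It assumes hash partitioning on a grouping key and uses a hypothetical `Node` class in place of a real processing node with its own storage. Each node aggregates only its own partition, and a coordinator merges the small partial results, so no node ever reads another node's data.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical shared-nothing node: it owns its rows and never reads another node's."""
    rows: list = field(default_factory=list)

    def local_sum_by_key(self):
        # Partial aggregate computed entirely from this node's local data.
        partial = defaultdict(float)
        for key, amount in self.rows:
            partial[key] += amount
        return dict(partial)


def distribute(rows, num_nodes):
    # Hash partitioning: within one run, every row with the same key lands on the same node.
    nodes = [Node() for _ in range(num_nodes)]
    for key, amount in rows:
        nodes[hash(key) % num_nodes].rows.append((key, amount))
    return nodes


def parallel_sum_by_key(nodes):
    # The coordinator merges only the small per-node results. In a real system
    # the local_sum_by_key calls would run concurrently on separate machines.
    result = {}
    for node in nodes:
        for key, total in node.local_sum_by_key().items():
            result[key] = result.get(key, 0.0) + total
    return result


sales = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 7.0), ("west", 1.0)]
nodes = distribute(sales, num_nodes=3)
print(sorted(parallel_sum_by_key(nodes).items()))  # [('east', 12.5), ('north', 7.0), ('west', 6.0)]
```

The nodes are processed one after another here only to keep the example self-contained; the point is that each partial aggregation needs nothing beyond its own partition.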

14.10.3.2 Row- versus Column-Oriented Data Layouts

Most traditional database systems employ a row-oriented layout, in which all the values associated with a specific row are laid out consecutively in memory. That layout may work well for transaction processing applications that focus on updating specific records associated with a limited number of transactions (or transaction steps) at a time. Analytical queries, however, are manifested as algorithmic scans performed using multiway joins; for these, accessing whole rows at a time when only the values of a smaller set of columns are needed may flood the network with extraneous data that are not immediately needed and will ultimately increase the execution time.

Big data analytics applications scan, aggregate, and summarize massive datasets. Analytical applications and queries typically need to access only the columns involved in their join conditions, filters, and aggregations. With row-oriented layouts, the entire record must be read to access the required attributes, so significantly more data is read than the request needs. The row-oriented layout is also often misaligned with the characteristics of the different types of memory systems (core, cache, disk, etc.), which increases access latencies. Consequently, row-oriented data layouts will not allow the joins and aggregations typical of analytic queries to execute with the anticipated level of performance.
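The extra data read by a row layout can be made concrete with a small model. The sketch below assumes a hypothetical four-column orders table and stores the same 1,000 rows twice: once packed record by record with `struct`, and once as one `array` per column. Summing a single column must decode every 28-byte record in the row store but touches only the 8-byte `amount` values in the column store. The byte counts are a simple model of data scanned, not a measurement of real I/O or cache behavior.

```python
import struct
from array import array

# Hypothetical orders table: order_id (int64), customer_id (int64),
# amount (float64), qty (int32).
ROW_FORMAT = "<qqdi"                    # standard sizes, no padding
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 8 + 8 + 8 + 4 = 28 bytes

rows = [(i, i % 100, i * 1.5, i % 7) for i in range(1000)]

# Row-oriented layout: the fields of each row are stored contiguously.
row_store = b"".join(struct.pack(ROW_FORMAT, *r) for r in rows)

# Column-oriented layout: each column is its own contiguous array.
col_store = {
    "order_id": array("q", [r[0] for r in rows]),
    "customer_id": array("q", [r[1] for r in rows]),
    "amount": array("d", [r[2] for r in rows]),
    "qty": array("i", [r[3] for r in rows]),
}


def total_amount_row_store(buf):
    """Sum 'amount' from the row store; every full record has to be decoded."""
    total = 0.0
    bytes_read = 0
    for _order_id, _customer_id, amount, _qty in struct.iter_unpack(ROW_FORMAT, buf):
        total += amount
        bytes_read += ROW_SIZE
    return total, bytes_read


def total_amount_col_store(cols):
    """Sum 'amount' from the column store; only that one column is touched."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts) * amounts.itemsize


row_total, row_bytes = total_amount_row_store(row_store)
col_total, col_bytes = total_amount_col_store(col_store)
print(row_total == col_total, row_bytes, col_bytes)  # True 28000 8000
```

Both layouts produce the same answer; the row store simply has to move three and a half times as many bytes to get it.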

Hence, a number of appliances for big data use a database management system with an alternative, columnar data layout that can help reduce the negative performance impact of the data latency that plagues databases with a row-oriented layout. Because the values for each column can be stored separately, the system can, for any query, selectively access only the specific column values requested to evaluate the join conditions. Instead of requiring separate indexes to tune queries, the data values within each column themselves form the index. This speeds up data access and reduces the overall database footprint while dramatically improving query performance. The simplicity of the columnar approach provides many benefits, especially for those seeking a high-performance environment to meet the growing needs of extremely large analytic datasets.
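Selective column access during a join can be sketched with two tables held as dictionaries of equal-length Python lists, a hypothetical stand-in for a column store. The query totals order amounts for customers in the EU region: it filters on the `region` column, builds a hash set from the matching `id` values, probes the `customer_id` column to evaluate the join condition, and only then fetches `amount` values for the matching positions (a step often called late materialization). The `name` and `note` columns are never read. This is a simplified illustration, not how any particular columnar product implements joins.

```python
# Hypothetical column store: each table is a dict of equal-length column lists.
customers = {
    "id": [1, 2, 3, 4],
    "name": ["Ada", "Bo", "Cy", "Di"],          # never read by the query below
    "region": ["EU", "US", "EU", "APAC"],
}
orders = {
    "order_id": [100, 101, 102, 103, 104],
    "customer_id": [1, 2, 3, 1, 4],
    "amount": [20.0, 35.0, 12.5, 7.5, 50.0],
    "note": ["", "gift", "", "rush", ""],        # never read by the query below
}


def eu_revenue(customers, orders):
    # 1. Scan only the 'region' column to find qualifying row positions.
    eu_positions = [i for i, r in enumerate(customers["region"]) if r == "EU"]
    # 2. Read the 'id' column at those positions to build the hash side of the join.
    eu_ids = {customers["id"][i] for i in eu_positions}
    # 3. Evaluate the join condition using the 'customer_id' column alone.
    matches = [i for i, c in enumerate(orders["customer_id"]) if c in eu_ids]
    # 4. Late materialization: fetch 'amount' values only for matching positions.
    return sum(orders["amount"][i] for i in matches)


print(eu_revenue(customers, orders))  # 40.0
```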

14.10.3.3 NoSQL Data Management

NoSQL, or “Not only SQL,” suggests environments that combine traditional SQL (or SQL-like query languages) with alternative means of querying and access. NoSQL data systems hold out the promise of greater flexibility in database management while reducing the dependence on more formal database administration. NoSQL databases have more relaxed modeling constraints, which can benefit both application developers and end-user analysts, whose interactive analyses are no longer throttled by the need to cast each query in terms of a relational, table-based environment.
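The relaxed modeling constraints can be illustrated with a toy key/value store. The sketch assumes a plain Python dictionary that maps string keys to JSON documents; `put`, `get`, and `scan_where` are hypothetical helpers, not the API of any NoSQL product. Records under different keys carry different attributes, and adding a new attribute requires no schema change.

```python
import json
from typing import Dict, Iterator, Optional, Tuple

# Toy key/value store: keys map to JSON documents whose shapes may differ.
store: Dict[str, str] = {}


def put(key: str, value: dict) -> None:
    store[key] = json.dumps(value)


def get(key: str) -> Optional[dict]:
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


def scan_where(field: str, predicate) -> Iterator[Tuple[str, dict]]:
    # Full scan over every value; records lacking the field are simply skipped.
    for key, raw in store.items():
        doc = json.loads(raw)
        if field in doc and predicate(doc[field]):
            yield key, doc


put("cust:1", {"name": "Ada", "country": "UK"})
put("cust:2", {"name": "Bo", "country": "US", "loyalty_tier": "gold"})  # extra attribute, no ALTER TABLE
put("cust:3", {"name": "Cy", "phones": ["555-0100", "555-0101"]})      # nested list value

print(get("cust:2")["loyalty_tier"])                               # gold
print([k for k, _ in scan_where("country", lambda c: c == "US")])  # ['cust:2']
```

In this toy, `get` is a direct lookup by key, whereas `scan_where` must inspect every stored value.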

Different NoSQL frameworks are optimized for different types of analyses. For example, some are implemented as key/value stores, which align well with certain big data programming models, while another emerging model is the graph database, in which a graph abstraction is implemented to embed both semantics and connectivity within its structure. In fact, the general concepts for
