allocation and distribution of the data; it enables more effective parallelization and consequently
does not introduce the same kind of bus bottlenecks from which the SMP/shared-memory and
shared-disk approaches suffer. Most big data appliances use a collection of computing resources,
typically a combination of processing nodes and storage nodes.
14.10.3.2 Row- versus Column-Oriented Data Layouts
Most traditional database systems employ a row-oriented layout, in which all the values associated
with a specific row are laid out consecutively in memory. That layout may work well for transaction
processing applications that focus on updating specific records associated with a limited number
of transactions (or transaction steps) at a time. These are manifested as algorithmic scans that are
performed using multiway joins. When only the values of a smaller set of columns are needed,
however, accessing whole rows at a time may flood the network with extraneous data and
ultimately increase the execution time.
Big data analytics applications scan, aggregate, and summarize over massive datasets. Analytical
applications and queries will only need to access the data elements needed to satisfy join conditions.
With row-oriented layouts, the entire record must be read in order to access the required
attributes, with significantly more data read than is needed to satisfy the request. Also, the
row-oriented layout is often misaligned with the characteristics of the different types of memory
systems (core, cache, disk, etc.), leading to increased access latencies. Consequently, row-oriented
data layouts will not enable the types of joins or aggregations typical of analytic queries to execute
with the anticipated level of performance.
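The cost described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record shape (id, region, quantity, price), the sample values, and the function name are hypothetical and do not come from any particular database product. It packs fixed-width records consecutively, as a row-oriented layout would, and counts how many bytes are read to total a single field.

```python
import struct

# Hypothetical fixed-width record: id, region, quantity (int32 each), price (float64).
# "<" selects standard sizes with no padding, so each record is 4 + 4 + 4 + 8 = 20 bytes.
RECORD = struct.Struct("<iiid")

# Illustrative sample data only.
rows = [(1, 10, 3, 9.5), (2, 20, 1, 4.0), (3, 10, 7, 2.5)]

# Row-oriented layout: all values of a row are stored consecutively.
row_store = b"".join(RECORD.pack(*r) for r in rows)


def total_quantity_row_store(buf):
    """Sum the quantity field; every full record is read to reach it."""
    total = 0
    bytes_read = 0
    for offset in range(0, len(buf), RECORD.size):
        _id, _region, quantity, _price = RECORD.unpack_from(buf, offset)
        total += quantity
        bytes_read += RECORD.size
    return total, bytes_read


total, bytes_read = total_quantity_row_store(row_store)

# Bytes that actually hold the requested attribute (one int32 per row).
bytes_needed = len(rows) * struct.calcsize("<i")

print(total, bytes_read, bytes_needed)
```

In this toy case the scan reads five times as many bytes as the query needs, and the ratio grows as records get wider.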
Hence, a number of big data appliances rely on a database management system with an
alternate, columnar layout for data that can help to reduce the negative performance impacts of
data latency that plague databases with a row-oriented data layout. The values for each column
can be stored separately, so that for any query the system is able to selectively access only
the specific column values requested to evaluate the join conditions. Instead of requiring separate
indexes to tune queries, the data values themselves within each column form the index. This
speeds up data access and reduces the overall database footprint, and it can dramatically improve
query performance. The simplicity of the columnar approach provides many benefits, especially
for those seeking a high-performance environment to meet the growing needs of extremely large
analytic datasets.
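A minimal sketch of the columnar idea follows, assuming a hypothetical four-column table (id, region, quantity, price) with made-up values; the helper name sum_where is likewise an illustration, not part of any real system. Each column is held in its own array, so a filter-and-aggregate query touches only the two columns it names and never reads the others.

```python
from array import array

# Illustrative sample data only: (id, region, quantity, price).
rows = [(1, 10, 3, 9.5), (2, 20, 1, 4.0), (3, 10, 7, 2.5)]

# Column-oriented layout: the values of each column are stored separately.
columns = {
    "id": array("i", (r[0] for r in rows)),
    "region": array("i", (r[1] for r in rows)),
    "quantity": array("i", (r[2] for r in rows)),
    "price": array("d", (r[3] for r in rows)),
}


def sum_where(cols, filter_col, value, agg_col):
    """Sum agg_col over the positions where filter_col equals value.

    Only the two named columns are read; all other columns are untouched.
    """
    matches = (i for i, v in enumerate(cols[filter_col]) if v == value)
    return sum(cols[agg_col][i] for i in matches)


quantity_for_region_10 = sum_where(columns, "region", 10, "quantity")
print(quantity_for_region_10)
```

Real columnar engines add compression and encoding on top of this arrangement; the sketch shows only the selective access that the separate per-column storage makes possible.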
14.10.3.3 NoSQL Data Management
NoSQL, or “Not only SQL,” suggests environments that combine traditional SQL (or SQL-like
query languages) with alternative means of querying and access. NoSQL data systems hold
out the promise of greater flexibility in database management while reducing the dependence
on more formal database administration. NoSQL databases have more relaxed modeling
constraints, which may benefit both the application developer and the end-user analysts when their
interactive analyses are not throttled by the need to cast each query in terms of a relational
table-based environment.
Different NoSQL frameworks are optimized for different types of analyses. For example, some
are implemented as key/value stores, which nicely align to certain big data programming models,
while another emerging model is a graph database, in which a graph abstraction is implemented
to embed both semantics and connectivity within its structure. In fact, the general concepts for