The second approach is to group data by the first item in each pair and maintain an
associative array (“stripe”) in which counters for all adjacent items are accumulated.
The reducer receives all stripes for a leading item i, merges them, and emits the same
result as in the pairs approach.
Generates fewer intermediate keys, so the framework has less
sorting to do.
Benefits greatly from combiners.
Performs in-memory accumulation. This can cause memory problems
if a stripe grows too large or the accumulation is not implemented carefully.
Is more complex to implement.
In general, “stripes” is faster than “pairs”.
Applications:
Text Analysis, Market Analysis
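The stripes approach described above can be sketched in a few lines of Python. This is only an illustration that simulates the map, shuffle, and reduce steps in a single process. The function names (map_stripes, reduce_stripes, run) are hypothetical, and "adjacent items" is assumed here to mean all other items in the same input record.

```python
from collections import Counter, defaultdict


def map_stripes(record):
    """Mapper: emit one (item, stripe) pair per distinct leading item.

    Assumption: every other item in the same record counts as adjacent.
    """
    stripes = defaultdict(Counter)
    for i, left in enumerate(record):
        for j, right in enumerate(record):
            if i != j:
                stripes[left][right] += 1
    return list(stripes.items())


def reduce_stripes(item, stripes):
    """Reducer: merge all stripes for one leading item and emit pair counts."""
    merged = Counter()
    for stripe in stripes:
        merged.update(stripe)  # element-wise sum of counters
    return [((item, other), count) for other, count in sorted(merged.items())]


def run(records):
    """Simulate the shuffle step by grouping stripes by their leading item."""
    shuffled = defaultdict(list)
    for record in records:
        for item, stripe in map_stripes(record):
            shuffled[item].append(stripe)
    result = {}
    for item, stripes in shuffled.items():
        result.update(reduce_stripes(item, stripes))
    return result


counts = run([["a", "b", "c"], ["a", "b"]])
```

Each mapper call emits at most one key per distinct item in a record, rather than one key per pair, which is where the reduction in intermediate keys comes from. The per-item Counter is also the in-memory structure that can grow large when an item has many distinct neighbors.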
NoSQL Data Modeling Techniques
SQL and the relational model in general were designed to store and manage data
originating from enterprise systems, with ACID compliance as the main focus.
While SQL and relational models ensured data integrity and consistency, they also
introduced abstractions that model end-user interactions. This user-oriented nature had
a few implications:
End users mostly wanted to see data at an aggregated level for
reporting and analysis purposes. Contextualizing the data and
its linkages was not possible through standard SQL functions,
so complex applications had to be built to bring out the
semantic meaning of the data.
The distributed nature of data management operations was never
considered. While SQL and RDBMS platforms provide excellent
features to manage concurrency, integrity, consistency, and data
type validity on a single system, they fall short on the consistency,
availability, and fault tolerance needed in a distributed setting,
which were largely left to programmers to develop themselves.
To overcome these shortcomings and, most importantly, to develop solutions that
manage data at big data scale and across a variety of data types, a new set of “NoSQL”
(read as “Not Only SQL”) data models began to emerge: key-value storage, document
databases, and graph databases.
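As a rough illustration of how these three models differ, the sketch below represents the same small piece of data in each style using plain Python structures. All names and values (kv_store, doc_store, nodes, edges, bought_by, the sample user and item) are hypothetical and do not correspond to any particular database product.

```python
import json

# Key-value: the store sees only an opaque value addressed by a single key.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document: the value is a nested structure whose fields can be addressed.
doc_store = {
    42: {
        "name": "Ada",
        "city": "London",
        "orders": [{"sku": "book-1", "qty": 2}],
    }
}

# Graph: entities are nodes and relationships are explicit, typed edges.
nodes = {"user:42": {"name": "Ada"}, "item:book-1": {"title": "A Book"}}
edges = [("user:42", "BOUGHT", "item:book-1")]


def bought_by(user):
    """Follow BOUGHT edges that start at the given user node."""
    return [dst for src, rel, dst in edges if src == user and rel == "BOUGHT"]


city_from_kv = json.loads(kv_store["user:42"])["city"]
first_order_qty = doc_store[42]["orders"][0]["qty"]
items = bought_by("user:42")
```

In the key-value form the application must decode the whole value to read one field. The document form exposes nested fields directly, and the graph form makes the relationship itself the thing that is queried.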
 