Data Modeling Approaches for Big Data and Analytics Solutions - Big Data Imperatives


Solution: The solution is straightforward: the mapper takes records one by one and emits the accepted items or their transformed versions.
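
The mapper described above can be sketched in plain Python as a map-only job with no reducer. This is a minimal illustration rather than Hadoop code: the log format ("LEVEL message"), the helper `parse_log_line`, and the rule of keeping only ERROR records are all assumptions made for the example.

```python
from typing import Iterable, Iterator, Optional


def parse_log_line(line: str) -> Optional[dict]:
    """Hypothetical parser: 'LEVEL message' -> dict, or None if malformed."""
    parts = line.strip().split(" ", 1)
    if len(parts) != 2:
        return None  # validation failed, reject the record
    level, message = parts
    return {"level": level.upper(), "message": message}


def filter_mapper(records: Iterable[str]) -> Iterator[dict]:
    """Take records one by one; emit accepted ones in transformed form."""
    for line in records:
        record = parse_log_line(line)
        # Assumed acceptance rule: keep only ERROR entries.
        if record is not None and record["level"] == "ERROR":
            yield record


lines = ["error disk full", "info started", "garbage", "ERROR timeout"]
errors = list(filter_mapper(lines))
```

Because each record is handled independently, the input can be split across any number of mappers without coordination.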

Applications: Log Analysis, Data Querying, ETL, Data Validation

Distributed Task Execution

Problem Statement: There is a large computational problem that can be divided into multiple parts, and the results from all parts can be combined to obtain a final result.

Solution: The problem description is split into a set of specifications, and the specifications are stored as input data for the mappers. Each mapper takes a specification, performs the corresponding computations, and emits the results. The reducer combines all emitted parts into the final result.
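
As one possible illustration of this pattern, the sketch below splits a numerical integration into sub-interval specifications, lets each "mapper" integrate its own sub-interval, and has the "reducer" add up the partial results. The integrand `f`, the midpoint rule, and the function names are assumptions chosen for the example; the mappers run sequentially here, whereas a real job would distribute them.

```python
from typing import List, Tuple

Spec = Tuple[float, float]  # one specification: a sub-interval (lo, hi)


def f(x: float) -> float:
    """Assumed integrand for the example."""
    return x * x


def make_specs(a: float, b: float, parts: int) -> List[Spec]:
    """Split the problem [a, b] into independent specifications."""
    width = (b - a) / parts
    return [(a + i * width, a + (i + 1) * width) for i in range(parts)]


def mapper(spec: Spec, steps: int = 1000) -> float:
    """Integrate f over one sub-interval using the midpoint rule."""
    lo, hi = spec
    h = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * h) for k in range(steps)) * h


def reducer(partials: List[float]) -> float:
    """Combine all emitted parts into the final result."""
    return sum(partials)


specs = make_specs(0.0, 3.0, parts=4)
total = reducer([mapper(spec) for spec in specs])  # approximates 9.0
```

The only requirement the pattern places on the computation is that the partial results can be combined by an operation that does not depend on the order in which mappers finish.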

Applications: Physical and Engineering Simulations, Numerical Analysis, Performance Testing

Sorting

Problem Statement: There is a set of records, and it is required to sort these records by some rule or to process them in a certain order.

Solution: Simple sorting is straightforward: mappers emit all items as values associated with sorting keys that are computed as a function of the items. In practice, however, sorting is often used in less obvious ways, which is why it is said to be the heart of map-reduce (and Hadoop). In particular, it is very common to use composite keys to achieve secondary sorting and grouping. Sorting in map-reduce was originally intended for sorting the emitted key-value pairs by key, but there are techniques that leverage Hadoop implementation specifics to achieve sorting by value.
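
The composite-key idea can be sketched in plain Python by simulating the shuffle phase. The mapper emits a key made of a natural key (a user name) and a secondary field (a timestamp); the framework sorts by the full composite key, and the reducer groups by the natural key only, so it receives the values already ordered by timestamp. The event data and function names are hypothetical. In Hadoop itself this effect is typically obtained with a custom partitioner and grouping comparator, which this sketch does not attempt to reproduce.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input records: (user, timestamp, action).
events = [
    ("bob", 3, "logout"),
    ("alice", 2, "click"),
    ("bob", 1, "login"),
    ("alice", 1, "login"),
]


def mapper(event):
    """Emit a composite key (user, timestamp) with the action as value."""
    user, timestamp, action = event
    return ((user, timestamp), action)


def shuffle_and_sort(pairs):
    """Stand-in for the framework's sort: order by the full composite key."""
    return sorted(pairs, key=itemgetter(0))


def reduce_by_natural_key(sorted_pairs):
    """Group by user only; values arrive already sorted by timestamp."""
    for user, group in groupby(sorted_pairs, key=lambda kv: kv[0][0]):
        yield user, [action for _, action in group]


result = dict(reduce_by_natural_key(shuffle_and_sort(map(mapper, events))))
```

The sort is done once by the framework on the keys, so the reducer never has to buffer and sort the values of a group in memory.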

It is worth noting that if map-reduce is used to sort the original (not intermediate) data, it is often a good idea to maintain the data in a sorted state continuously, using BigTable concepts. In other words, it can be more efficient to sort the data once during insertion than to sort it for each map-reduce query.
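
The trade-off can be illustrated with a small in-memory stand-in for a store that keeps rows ordered by key. This is only a sketch of the sort-on-insert idea using Python's `bisect` module; the `SortedStore` class is hypothetical and is not how BigTable or any particular database is implemented.

```python
import bisect


class SortedStore:
    """Hypothetical store that keeps (key, value) rows sorted at insert time."""

    def __init__(self):
        self._rows = []

    def insert(self, key, value):
        # Place the row at its sorted position instead of appending.
        bisect.insort(self._rows, (key, value))

    def scan(self, start, end):
        """Return rows with start <= key < end without any re-sorting."""
        # A 1-tuple (key,) sorts before every (key, value) row with that key.
        lo = bisect.bisect_left(self._rows, (start,))
        hi = bisect.bisect_left(self._rows, (end,))
        return self._rows[lo:hi]


store = SortedStore()
store.insert("c", "3")
store.insert("a", "1")
store.insert("b", "2")
rows = store.scan("a", "c")
```

Each query then reads an already ordered range, so the sorting cost is paid once per inserted row rather than once per query.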

Applications: ETL, Data Analysis

Advanced Map-Reduce Patterns

Iterative Message Passing (Graph Processing)

Problem Statement: There is a network of entities and relationships between them. It is required to calculate the state of each entity on the basis of the properties of the other entities in its neighborhood. This state can represent a distance to other nodes, an indication that there is a neighbor with certain properties, a characteristic of neighborhood density, and so on.

Solution: A network is stored as a set of nodes, and each node contains a list of adjacent node IDs. Conceptually, map-reduce jobs are performed iteratively, and at each iteration each node sends messages to its neighbors. Each neighbor updates its
