BIGS adopts a data-partition iterative computing model (such as the one described in
Section 2.1), through which workers can exploit locality-aware computation if the
user so desires. Under this model, distributed data analysis jobs are struc-
tured into a repeatable sequence of INIT-STATE, MAP, and AGGREGATE-
STATE operations, as shown in Figure 1.
Fig. 1. BIGS algorithm model
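The repeatable sequence in Figure 1 can be viewed as a dependency graph of operations in which an operation becomes runnable only once its predecessors have finished. The following is a minimal sketch of that scheduling idea; all class, method, and operation names here are illustrative assumptions, not BIGS internals:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy job dependency graph: an operation may start only when all of
// its predecessors have finished. Names are illustrative assumptions.
public class JobGraph {

    // operation -> the operations it depends on
    static final Map<String, List<String>> DEPS = Map.of(
            "INIT-STATE", List.of(),
            "MAP-0", List.of("INIT-STATE"),
            "MAP-1", List.of("INIT-STATE"),
            "AGGREGATE-STATE", List.of("MAP-0", "MAP-1"));

    // A worker scans the graph (stored in the reference NoSQL database
    // in BIGS) and takes over any operation whose predecessors are done.
    static String takeOver(Set<String> finished) {
        for (Map.Entry<String, List<String>> e : DEPS.entrySet()) {
            if (!finished.contains(e.getKey())
                    && finished.containsAll(e.getValue())) {
                return e.getKey();
            }
        }
        return null; // nothing is ready to run
    }

    public static void main(String[] args) {
        Set<String> finished = new HashSet<>();
        String op;
        while ((op = takeOver(finished)) != null) {
            System.out.println("running " + op);
            finished.add(op);
        }
    }
}
```

With an empty `finished` set only INIT-STATE is runnable; the two MAPs become runnable next, and AGGREGATE-STATE last, mirroring the ordering constraints of Figure 1.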
All operations receive a State Object as input and may produce another
State Object as output. No operation can start until the preceding ones have
finished, according to the job dependency graph in Figure 1, which is stored in the
reference NoSQL database. Input datasets to be processed are split into a user-
definable number of partitions, and there is one MAP operation per partition.
Each MAP operation can loop over the elements of the partition it is processing
and may produce elements for an output dataset. For instance, an image feature
extraction MAP operation would produce one or more feature vectors for each
input image. Workers take over operations by inspecting the job dependency
graph stored in the reference database. Developers program their algorithms by
providing implementations for the Java Process API methods. When a BIGS
worker takes over an operation, it creates the appropriate programming context
and makes the corresponding data available (through data caches) to the
implementation being invoked. As can be seen in Figure 2, AGGREGATE-STATE
operations use all the output states of the preceding MAP operations to create the
resulting state of the iteration or of the whole process.
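The cycle above can be sketched in Java using the image feature extraction example. The `State`, `initState`, `map`, and `aggregate` names are assumptions made for illustration; they are not the actual BIGS Process API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the INIT-STATE / MAP / AGGREGATE-STATE cycle.
// All type and method names are assumptions, not the BIGS Process API.
public class BigsSketch {

    // A State Object carries data between operations; here it simply
    // accumulates feature vectors.
    static class State {
        final List<double[]> featureVectors = new ArrayList<>();
    }

    // INIT-STATE: produce the initial (empty) state of the job.
    static State initState() {
        return new State();
    }

    // MAP: one operation per partition; it loops over the partition's
    // elements and emits output elements (a toy "feature extraction"
    // that returns the length of each image identifier).
    static State map(List<String> partition) {
        State out = new State();
        for (String image : partition) {
            out.featureVectors.add(new double[] { image.length() });
        }
        return out;
    }

    // AGGREGATE-STATE: merge the output states of all preceding MAP
    // operations into the resulting state of the iteration.
    static State aggregate(State initial, List<State> mapOutputs) {
        for (State s : mapOutputs) {
            initial.featureVectors.addAll(s.featureVectors);
        }
        return initial;
    }

    public static void main(String[] args) {
        // The input dataset, split into two partitions.
        List<List<String>> partitions = List.of(
                List.of("img-a.png", "img-bb.png"),
                List.of("img-ccc.png"));

        List<State> mapped = new ArrayList<>();
        for (List<String> p : partitions) {
            mapped.add(map(p)); // in BIGS, workers run these concurrently
        }
        State result = aggregate(initState(), mapped);
        System.out.println(result.featureVectors.size()); // 3 feature vectors
    }
}
```

Because each MAP touches only its own partition, the MAP calls are independent and can be assigned to different workers; only the final aggregation needs all of their outputs.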
2.2 Data Access and Caching Strategies
BIGS takes advantage of a data-parallel approach. Given a large amount of
data whose elements can be processed independently, BIGS can distribute them in data