partitions across workers in order to process them separately and then aggregate
their results. A critical step in this parallelization strategy is how each
worker accesses the data partitions. The most common way to deal with
this issue is to implement a caching system.
BIGS implements two levels of caching, both built on the
Java Caching System library. The memory cache stores Java objects in the main
memory of the Java Virtual Machine, indexed by a unique ID. The local cache
stores information in a raw database on the worker's local hard drive. Both
caches allow faster access than requesting the data from the worker
directly to the HBASE server. Note that the memory cache provides the fastest
access, since it avoids both reading from the hard drive and parsing raw
data into Java objects; the local cache, on the other hand, offers greater storage
capacity. This architecture allowed us to implement two strategies for deciding which partition
a worker takes: any-partition and only-local-partitions . The combination
of a two-level caching architecture (disk and memory) with these caching
strategies constitutes the main contribution of this work.
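As an illustration of this two-level read path, the following is a minimal Java sketch. It is not the actual BIGS code: the plain HashMap stands in for the JCS memory region, and LocalDiskCache, HBaseClient, and the parse step are hypothetical names for components the paper does not show.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interfaces standing in for the actual BIGS components.
interface LocalDiskCache {
    byte[] get(String partitionId);
    void put(String partitionId, byte[] raw);
}

interface HBaseClient {
    byte[] fetchPartition(String partitionId);
}

public class PartitionCache {
    // Level 1: Java objects kept in JVM main memory, indexed by a unique ID.
    private final Map<String, Object> memoryCache = new HashMap<>();
    // Level 2: raw bytes kept in a database on the worker's local hard drive.
    private final LocalDiskCache localCache;
    private final HBaseClient hbase;

    public PartitionCache(LocalDiskCache localCache, HBaseClient hbase) {
        this.localCache = localCache;
        this.hbase = hbase;
    }

    public Object get(String partitionId) {
        // Fastest path: no disk read and no parsing of raw data.
        Object partition = memoryCache.get(partitionId);
        if (partition != null) return partition;

        // Local disk avoids the network round trip to HBASE, but the raw
        // bytes still have to be parsed into a Java object.
        byte[] raw = localCache.get(partitionId);
        if (raw == null) {
            // Slowest path: request the partition from the HBASE server
            // and keep a local copy for later reads.
            raw = hbase.fetchPartition(partitionId);
            localCache.put(partitionId, raw);
        }
        partition = parse(raw);
        memoryCache.put(partitionId, partition);
        return partition;
    }

    // Placeholder for deserializing the raw partition into a Java object.
    private Object parse(byte[] raw) {
        return raw;
    }
}
```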
In the any-partition strategy, the worker reads the next execution unit in the
schedule. If the data partition required by that execution unit is in the memory
cache, the worker processes it at once; otherwise, the worker requests the data
partition from the HBASE server, stores it in the memory cache, and then processes it.
The only-local-partitions strategy requires that one or more copies of the
dataset be distributed, as data partitions, over all the active workers before
the job is submitted. These partitions are stored in the local cache. A worker
will process only those execution units whose data partitions were previously
loaded into its local cache. The worker also keeps its memory cache enabled.
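The difference between the two strategies is essentially the eligibility test a worker applies when picking its next execution unit. The sketch below is illustrative only, assuming hypothetical ExecutionUnit and Worker types rather than the actual BIGS scheduler.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of how a worker picks its next execution unit
// under each strategy; all names are illustrative.
enum Strategy { ANY_PARTITION, ONLY_LOCAL_PARTITIONS }

interface ExecutionUnit {
    String partitionId();
}

class Worker {
    private final Strategy strategy;
    // IDs of the partitions pre-loaded into this worker's local cache
    // before the job was submitted.
    private final Set<String> localPartitionIds;

    Worker(Strategy strategy, Set<String> localPartitionIds) {
        this.strategy = strategy;
        this.localPartitionIds = localPartitionIds;
    }

    ExecutionUnit next(List<ExecutionUnit> schedule) {
        for (ExecutionUnit unit : schedule) {
            if (strategy == Strategy.ANY_PARTITION) {
                // Take the next unit unconditionally: if its partition is not
                // cached, it will be fetched from HBASE on demand.
                return unit;
            }
            // ONLY_LOCAL_PARTITIONS: take a unit only if its partition was
            // distributed to this worker before job submission.
            if (localPartitionIds.contains(unit.partitionId())) {
                return unit;
            }
        }
        return null; // no eligible execution unit for this worker
    }
}
```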
2.3 Multiclass Logistic Regression
Logistic regression (LR) is a popular supervised machine learning algorithm
which, despite its name, is applied to learn classification models, not
regression models. LR is probabilistic in nature: it learns a model that estimates
the class posterior probability $P(C \mid x)$, where $x$ is a vector of input features and
$C$ is a random variable representing the class to be predicted. If $C$ is assumed
to have a multinomial distribution, the posterior probability is calculated as:
$$
y_i = P(C_i \mid x) = \frac{\exp\left[w_i x + w_{i0}\right]}{\sum_{j=1}^{K} \exp\left[w_j x + w_{j0}\right]}, \qquad i = 1, \ldots, K,
$$

where $K$ is the number of classes, $x \in \mathbb{R}^{n}$ is a vector with the input features,
and $W \in \mathbb{R}^{K \times (n+1)}$ is the set of parameters to be learned. Observe that $W$
is a matrix that contains, for each class $i$, a row vector $w_i \in \mathbb{R}^{n}$ and a bias
parameter $w_{i0}$. The learning algorithm works by finding the set of parameters,
$W$, that minimizes a loss function $L(W, X)$, equivalent to the negative likelihood
over the training data set ($X$). This is usually done by using a gradient descent
strategy, which iteratively updates $W$ by adding a vector, $\Delta W$, that points in
the direction opposite to the gradient of the loss function, $\nabla L(W, X)$.
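To make the update rule concrete, the following is a minimal, self-contained Java sketch of the posterior computation and a single gradient descent step. It assumes the loss is the usual negative log-likelihood of the multinomial model, whose per-example gradient with respect to $w_i$ is $(y_i - \mathbb{1}\{i = c\})\, x$; it is an illustration, not the BIGS implementation.

```java
public class SoftmaxRegression {
    private final double[][] w;   // K x n weight matrix, one row per class
    private final double[] w0;    // K bias parameters

    public SoftmaxRegression(int numClasses, int numFeatures) {
        this.w = new double[numClasses][numFeatures];
        this.w0 = new double[numClasses];
    }

    /** y_i = exp(w_i x + w_i0) / sum_j exp(w_j x + w_j0), i = 1..K. */
    public double[] posterior(double[] x) {
        int k = w.length;
        double[] y = new double[k];
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            double z = w0[i];
            for (int d = 0; d < x.length; d++) z += w[i][d] * x[d];
            y[i] = Math.exp(z);
            sum += y[i];
        }
        for (int i = 0; i < k; i++) y[i] /= sum;
        return y;
    }

    /**
     * One gradient descent update on a single example (x, label):
     * W <- W + dW, where dW = -rate * gradient of the negative
     * log-likelihood, i.e. dL/dw_i = (y_i - 1{i == label}) * x.
     */
    public void update(double[] x, int label, double rate) {
        double[] y = posterior(x);
        for (int i = 0; i < w.length; i++) {
            double err = y[i] - (i == label ? 1.0 : 0.0);
            for (int d = 0; d < x.length; d++) w[i][d] -= rate * err * x[d];
            w0[i] -= rate * err;
        }
    }
}
```

In practice one would subtract the largest logit before exponentiating to avoid overflow in Math.exp; the sketch omits this for clarity.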