Fig. 1. Directed acyclic graph (DAG) workflow example with 5 tasks. Edges represent
control-flow dependencies.
price. Providers offer several types of instances, which present different
cost-performance ratios (i.e. computing power/price).
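As a minimal sketch of the cost-performance ratio mentioned above (computing
power divided by price), the following compares a few instance types. The
catalog, the `compute_units` measure and the prices are hypothetical values
chosen for illustration; they are not taken from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    compute_units: float   # hypothetical measure of computing power
    price_per_hour: float  # hypothetical fixed price, in USD per hour

    @property
    def cost_performance(self) -> float:
        """Computing power obtained per unit of price."""
        return self.compute_units / self.price_per_hour


def best_value(types):
    """Return the instance type with the highest cost-performance ratio."""
    return max(types, key=lambda t: t.cost_performance)


# Hypothetical catalog, for illustration only.
CATALOG = [
    InstanceType("small", 1.0, 0.05),
    InstanceType("medium", 2.0, 0.12),
    InstanceType("large", 8.0, 0.50),
]

if __name__ == "__main__":
    for t in CATALOG:
        print(f"{t.name}: {t.cost_performance:.2f} units per USD")
    print("best value:", best_value(CATALOG).name)
```

Note that the type with the best ratio is not necessarily the best choice for
a given workload, since the ratio ignores how well the tasks can use the
available computing power.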
Some providers also permit the acquisition of instances under a different scheme
in which prices change over time. Amazon's Elastic Compute Cloud (EC2) spot
instances [3] are a strategy for selling idle computing capacity using a dynamic,
market-driven pricing scheme, in other words one based on the law of supply and
demand. These dynamically changing prices are generally significantly lower than
the fixed price of on-demand instances.
To acquire a set of spot instances, the user must bid the price that he or she
is willing to pay. If the user's bid is greater than the current spot price, the
requested instances are provided. If at any moment the spot price exceeds the
user's bid, the instances are terminated without prior notice. This situation is
called an out-of-bid error. As can be seen, this scheme of computation entails
a trade-off between the cost of each instance and its reliability. To address
this issue, many strategies for selecting a proper bid have been proposed. Most
of them rely on the use of historical spot prices [1,15]. This kind of strategy
for price prediction has been applied in many contexts [14,16].
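The bidding rule described above can be sketched as follows. The function
`first_out_of_bid` replays a price series against a bid and reports when an
out-of-bid error would occur. The function `quantile_bid` is only an assumed,
simple history-based rule (a nearest-rank quantile of past prices); it is not
the method of the cited works. The price series is hypothetical.

```python
import math
from typing import Optional, Sequence


def first_out_of_bid(prices: Sequence[float], bid: float) -> Optional[int]:
    """Index of the first period whose spot price exceeds the bid, else None."""
    for i, price in enumerate(prices):
        if price > bid:
            return i
    return None


def quantile_bid(history: Sequence[float], q: float) -> float:
    """Assumed bidding rule: the q-quantile (nearest rank) of past prices."""
    if not history:
        raise ValueError("history must not be empty")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(history)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Hypothetical hourly spot prices in USD, for illustration only.
HISTORY = [0.030, 0.031, 0.029, 0.045, 0.032, 0.030, 0.080, 0.031]

if __name__ == "__main__":
    for q in (0.5, 0.75, 1.0):
        bid = quantile_bid(HISTORY, q)
        print(f"q={q}: bid={bid}, first failure at {first_out_of_bid(HISTORY, bid)}")
```

A higher quantile gives a higher bid, which lowers the chance of an
out-of-bid error but raises the price the user may end up paying; this is the
cost/reliability trade-off in its simplest form. Replaying the bid on the same
series it was derived from is done here only to keep the sketch short.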
When running a scientific workflow application, deciding the number and type
of instances to acquire becomes a particularly complex problem. First, the
imbalance of task durations and the existing dependencies generate variable
computation requirements during the execution of the application [7]. Second,
it may be difficult to accurately predict the performance of tasks. On the one
hand, experimental applications usually explore different sets of data and
parameters, which may hinder the proper performance modeling of tasks. On the
other hand, performance variability in the cloud is inevitable [12,6].
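To illustrate how dependencies alone produce variable computation
requirements, the sketch below counts how many tasks sit at each dependency
level of a workflow DAG, using Kahn's topological ordering. The five-task
graph and its edges are hypothetical, and the count ignores task durations,
so it only indicates how the available parallelism can change over an
execution.

```python
from collections import defaultdict, deque


def level_widths(tasks, edges):
    """Number of tasks at each dependency level of a DAG (level 0 = no parents)."""
    if not tasks:
        return []
    indegree = {t: 0 for t in tasks}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    level = {t: 0 for t in tasks}
    queue = deque(t for t in tasks if indegree[t] == 0)
    processed = 0
    while queue:
        task = queue.popleft()
        processed += 1
        for child in children[task]:
            # A task's level is one more than its deepest parent.
            level[child] = max(level[child], level[task] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if processed != len(tasks):
        raise ValueError("graph contains a cycle")

    widths = [0] * (max(level.values()) + 1)
    for lvl in level.values():
        widths[lvl] += 1
    return widths


# Hypothetical workflow: t1 fans out to t2, t3, t4, which all feed t5.
TASKS = ["t1", "t2", "t3", "t4", "t5"]
EDGES = [("t1", "t2"), ("t1", "t3"), ("t1", "t4"),
         ("t2", "t5"), ("t3", "t5"), ("t4", "t5")]

if __name__ == "__main__":
    print(level_widths(TASKS, EDGES))
```

For this hypothetical graph, one instance suffices at the first and last
levels while up to three tasks could run in parallel in between, so a fixed
number of instances is either idle or insufficient at some point.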
These two factors make it very hard to know in advance the necessary number
of instances. For this reason, autoscaling mechanisms [9] emerged to (i)
automatically determine the number and type of instances to acquire, while (ii)
scheduling the workflow tasks onto the acquired instances. As autoscaling is a
two-fold problem with circular dependencies, the mechanisms operate during the
entire execution of an application, dynamically resizing the computing
infrastructure and scheduling and executing the tasks.
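The interleaving of scaling, scheduling and execution can be sketched as a
simple loop. This is a deliberately naive illustration and not the mechanism
of [9]: it assumes a single instance type, discrete time steps, exactly known
integer task durations, one task per instance, and a hypothetical cap
`max_instances`. All names and values are assumptions.

```python
def autoscale_run(durations, edges, max_instances):
    """Simulate a naive scale/schedule/execute loop.

    Returns (makespan in steps, number of instances held at each step).
    """
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    parents = {t: set() for t in durations}
    for parent, child in edges:
        parents[child].add(parent)

    remaining = dict(durations)  # time steps still needed by each task
    done, running = set(), set()
    trace = []
    while len(done) < len(durations):
        ready = [t for t in durations
                 if t not in done and t not in running and parents[t] <= done]
        # (i) Scaling: one instance per running or ready task, up to the cap.
        target = min(max_instances, len(running) + len(ready))
        # (ii) Scheduling: assign ready tasks to the free instances.
        free = target - len(running)
        running.update(sorted(ready)[:free])
        if not running:
            raise ValueError("no runnable task; the graph may contain a cycle")
        trace.append(target)
        # Execution: advance every running task by one time step.
        for task in list(running):
            remaining[task] -= 1
            if remaining[task] == 0:
                running.remove(task)
                done.add(task)
    return len(trace), trace


# Hypothetical workflow with one long task (t2), for illustration only.
DURATIONS = {"t1": 1, "t2": 3, "t3": 1, "t4": 1, "t5": 1}
EDGES = [("t1", "t2"), ("t1", "t3"), ("t1", "t4"),
         ("t2", "t5"), ("t3", "t5"), ("t4", "t5")]

if __name__ == "__main__":
    print(autoscale_run(DURATIONS, EDGES, max_instances=2))
    print(autoscale_run(DURATIONS, EDGES, max_instances=3))
```

In this hypothetical run the long task t2 dominates the makespan, so raising
the cap from two to three instances changes the number of instances held per
step but not the total number of steps. This hints at why scaling and
scheduling decisions cannot be taken independently.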