Literature review - Reliability Assurance of Big Data in the Cloud

hence data transfer is considered to be the main procedure during the data recovery process. In order to recover Cloud data in a cost-effective fashion, in this section we focus on data transfer approaches for distributed systems. In addition to data recovery, data transfer is also intensively involved in creating replicas in the Cloud. Therefore, reviews conducted in this section could also benefit our research for creating replicas in the data creation stage.

Data transfer has long been considered a very important research issue in the field of high-performance networks and distributed storage systems [69,70]. In recent years, the ever-developing Cloud and large-scale distributed storage technologies have placed higher demands on data transfer in terms of both transfer speed and energy consumption. Balancing the trade-off between data transfer speed and energy consumption is a significant challenge.

On one hand, to meet the requirements of large-scale data-intensive applications, the need for high-speed yet predictable data transfer is increasing, which requires networks with effective bandwidth controls. Because they are fully controlled, dedicated networks with bandwidth reservation have drawn more and more attention. Typical examples of dedicated networks include research networks such as the National LambdaRail [71] and the Internet2 Network [72]. In one study [73], a bandwidth reservation approach via a centralized resource management platform was proposed for providing predictable performance in research networks. The centralized management pattern has, however, limited scalability and hence constrains the applicability of this approach. In another study [74], a distributed bandwidth reservation approach for reducing energy consumption in dedicated networks was proposed, which could greatly improve scalability compared to the former proposal [73].
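As a rough illustration of the general idea behind centralized bandwidth reservation (not the platform described in [73]), the following minimal Python sketch assumes a single dedicated link of fixed capacity. The `LinkReservationManager` class, its method names, the units, and all numbers are hypothetical and chosen only for illustration. A request is admitted only if the total reserved rate never exceeds the link capacity at any instant of the requested time window.

```python
from dataclasses import dataclass, field


@dataclass
class Reservation:
    start: float  # start time in seconds
    end: float    # end time in seconds (exclusive)
    rate: float   # reserved bandwidth in Gbit/s


@dataclass
class LinkReservationManager:
    """Hypothetical centralized manager for one dedicated link (illustrative only)."""
    capacity: float  # link capacity in Gbit/s
    reservations: list = field(default_factory=list)

    def peak_reserved(self, start: float, end: float) -> float:
        """Return the highest total reserved rate at any instant in [start, end)."""
        # The total can only rise where a reservation starts, so it is enough
        # to check the window start and every reservation start inside the window.
        points = [start] + [
            r.start for r in self.reservations if start < r.start < end
        ]
        return max(
            sum(r.rate for r in self.reservations if r.start <= t < r.end)
            for t in points
        )

    def reserve(self, start: float, end: float, rate: float) -> bool:
        """Admit the request only if capacity is never exceeded in the window."""
        if end <= start or rate <= 0:
            raise ValueError("need end > start and rate > 0")
        if self.peak_reserved(start, end) + rate > self.capacity:
            return False  # rejected: would oversubscribe the link
        self.reservations.append(Reservation(start, end, rate))
        return True


# Example with assumed numbers: a 10 Gbit/s link.
manager = LinkReservationManager(capacity=10.0)
first = manager.reserve(0, 100, 6.0)     # admitted, link is empty
second = manager.reserve(50, 150, 6.0)   # rejected, 6 + 6 exceeds 10 during [50, 100)
third = manager.reserve(100, 200, 6.0)   # admitted, the first reservation has ended
```

Because every request passes through one manager, an admitted transfer gets a predictable rate for its whole window, but that single decision point is also where the scalability limit of the centralized pattern comes from.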

On the other hand, the energy consumption of high-speed large-scale data transfer is high. This has become one of the major factors that need to be considered in large-scale storage systems. In recent years, many efforts have been made to reduce the energy consumption incurred in large-scale data transfer. For example, a standard was developed for defining management parameters and protocols in energy-efficient Ethernet networks [75]. Energy consumption models [74,76] have been proposed for switches and general network devices, respectively. To reduce the energy consumption over network links, several approaches have been proposed. In one approach [4], a replica creation and recovery strategy was proposed in which data transfer is conducted at a constant minimum speed to maintain a certain number of replicas. In other proposals [77,78], energy management approaches referred to as “shutdown approaches” are offered. In these approaches, devices on the link are shut down when network traffic is too low, so that the energy consumption of routers and network links can be reduced. Specifically, in one model [78], the shutdown approach is conducted in such a way that data are transmitted as fast as possible and the data transfer link is “idled” after data transfer is finished. However, such approaches can be problematic, because other tasks might also be using the same data transfer link, in which case it cannot be shut down. In contrast to the shutdown approaches, which shut down devices to save power, another line of work observed that network devices consume less energy when operating at lower link rates [79]. One study [74] found that the power of a network
