ChunkSim - Data Warehousing Design and Advanced Engineering Applications - page 144

Table 1. Extract of analysis report for replication degree

Replication Degree   Time PL-H (LP)   Time FM (LP)   Time (LP)    Time Query
10%                  240              153.552276     239.607907   247.232275
25%                  240              153.552276     238.044278   247.362589
50%                  240              153.552276     232.592335   247.362589
75%                  240              153.552276     219.842612   247.232275
100%                 240              153.552276     207.945652   244.492903
150%                 240              153.552276     198.712255   233.492903
200%                 240              153.552276     185.513946   223.753531
300%                 240              153.552276     179.151584   212.753531
400%                 240              153.552276     175.329786   207.623217
500%                 240              153.552276     173.505943   207.623217

that every chunk has a copy in the system. These results allow the user to see what performance benefit there may be from adding redundancy and also where the choice lies in the interval between the worst and best PL-H and FM times.
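One way to read "where the choice lies in the interval" is to normalize each Time (LP) value against the two bounds. The sketch below uses the values reported in Table 1 and assumes that the PL-H time (240) is the slower bound and the FM time (153.552276) the faster one, which is how the reported values are ordered. The helper name position_in_interval and the 0-to-1 scale are illustrative assumptions, not part of ChunkSim.

```python
# Bounds and Time (LP) values copied from Table 1.
# Assumption: PL-H is the slower (worst) bound and FM the faster (best) bound.
PLH_TIME = 240.0
FM_TIME = 153.552276

# Replication degree (%) -> Time (LP)
LP_TIMES = {
    10: 239.607907,
    25: 238.044278,
    50: 232.592335,
    75: 219.842612,
    100: 207.945652,
    150: 198.712255,
    200: 185.513946,
    300: 179.151584,
    400: 175.329786,
    500: 173.505943,
}


def position_in_interval(lp_time, worst=PLH_TIME, best=FM_TIME):
    """Return 0.0 when lp_time equals the worst bound and 1.0 at the best bound."""
    return (worst - lp_time) / (worst - best)


if __name__ == "__main__":
    for degree, lp_time in LP_TIMES.items():
        print(f"{degree:>4}%  {position_in_interval(lp_time):.3f}")
```

Values near 0 indicate that the replication degree buys almost nothing over the PL-H time, while values approaching 1 indicate that Time (LP) is close to the FM time.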

System Size Planning (SSZP) - this experiment answers the question of how performance varies with different system sizes under a specific placement and replication scenario, which can be used to determine the number of nodes for a dataset.

Additional inputs: Number of Nodes array;

Outputs: a set of tuples (Number of Nodes, Time (LP), Time Query). The “Time (LP)” and “Time Query” fields are the average expected runtime of the Local Processing part and of the whole Query, respectively. The Number of Nodes array specifies which alternatives should be tested.

Availability Analysis (AA) - this experiment answers the question of whether and how fast the system is able to provide answers for varying replication degrees when different sets of nodes are offline or fail.

Additional inputs: Replication Degrees array (optionally, also an array listing the set of nodes that are offline, in case AA is used to characterize the behavior in a particular scenario in terms of which nodes are offline or fail);

Outputs: a set of tuples (Number Of Nodes Offline, Replication Degree, FailureRate, Time (LP), Time Query). The “Time (LP)” and “Time Query” fields are the average expected runtime of the Local Processing part and of the whole Query, respectively. The “Number Of Nodes Offline” and “Replication Degree” fields are self-explanatory. The “FailureRate” parameter is a number between 0 and 1 and represents the fraction of runs that could not complete because a chunk was not present in any of the active nodes (the set of nodes remaining after taking nodes offline). Given that nodes are heterogeneous, may have heterogeneous data loads, and different nodes may fail, it is possible for one run to succeed (find all necessary chunks) while another fails (unable to run for lack of at least one necessary chunk).
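The FailureRate output of the Availability Analysis can be illustrated with a small Monte Carlo sketch: a run fails when at least one chunk has no copy on any active node. Everything below is a hypothetical simplification, not ChunkSim's implementation. The function names, the uniform random placement, and the use of a fixed number of copies per chunk in place of a percentage replication degree are all assumptions made for illustration.

```python
import random


def place_chunks(num_chunks, num_nodes, copies, rng):
    """Assign each chunk to `copies` distinct nodes chosen uniformly at random.

    Assumption: uniform placement; the actual placement policy may differ.
    """
    return [set(rng.sample(range(num_nodes), copies)) for _ in range(num_chunks)]


def run_succeeds(placement, offline):
    """A run can complete only if every chunk has a copy on some active node."""
    return all(holders - offline for holders in placement)


def failure_rate(num_chunks, num_nodes, copies, num_offline, runs=1000, seed=0):
    """Fraction of runs (between 0 and 1) that lack at least one necessary chunk."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(runs):
        placement = place_chunks(num_chunks, num_nodes, copies, rng)
        # Take a random set of nodes offline for this run.
        offline = set(rng.sample(range(num_nodes), num_offline))
        if not run_succeeds(placement, offline):
            failures += 1
    return failures / runs


if __name__ == "__main__":
    for copies in (1, 2, 3):
        rate = failure_rate(num_chunks=20, num_nodes=5, copies=copies, num_offline=2)
        print(f"copies={copies}  failure rate={rate:.3f}")
```

Because both the placement and the set of offline nodes change from run to run, some runs find every chunk while others do not, which is why the result is reported as a rate rather than a yes/no answer.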
