STATISTICAL TECHNIQUES IN SOFTWARE SIX SIGMA AND DESIGN FOR SIX SIGMA (DFSS) - Software Design for Six-Sigma: A Roadmap for Excellence


TABLE 6.3

Examples of Parameters and Statistics

Measure              Parameter   Statistic

Mean                 µ           X̄

Standard deviation   σ           s

Proportion           π           p

Correlation          ρ           r

interpreting data, displaying data, and making decisions based on data. The term “statistic” refers to the numerical quantity calculated from a sample of size n. Such statistics are used for parameter estimation.

In analyzing outputs, it also is essential to distinguish between statistics and parameters. Whereas a statistic is measured from a data sample of limited size (n), a parameter is a numerical quantity that measures some aspect of the data population. A population consists of an entire set of objects, observations, or scores that have something in common. The distribution of a population can be described by several parameters, such as the mean and the standard deviation. Estimates of these parameters taken from a sample are called statistics. A sample is, therefore, a subset of a population. As it usually is impractical to test every member of a population (e.g., 100% execution of all feasible verification test scenarios), a sample from the population is typically the best approach available. For example, the mean time between failures (MTBF) in 10 months of run time is a statistic, whereas the mean MTBF over the software life cycle is a parameter. Population parameters rarely are known and usually are estimated by statistics computed from samples. Certain statistical requirements are, however, necessary to estimate the population parameters using computed statistics. Table 6.3 shows examples of selected parameters and statistics.
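The distinction between a parameter and a statistic can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the simulated population of times between failures, the exponential distribution, the sample size of 50, and the 100-hour limit used for the proportion are not taken from the text. In practice the population is rarely observable; it is simulated here only so that each parameter can be placed next to its estimate.

```python
import random
import statistics

random.seed(7)  # fixed seed so repeated runs draw the same sample

# ASSUMPTION: a simulated "population" of 10,000 times between failures (hours).
population = [random.expovariate(1 / 200.0) for _ in range(10_000)]
LIMIT = 100.0  # hypothetical threshold used for the proportion


def summarize(values, is_sample):
    """Return the mean, standard deviation, and proportion below LIMIT."""
    # stdev divides by n - 1 (sample); pstdev divides by N (population).
    spread = statistics.stdev(values) if is_sample else statistics.pstdev(values)
    return {
        "mean": statistics.fmean(values),
        "std": spread,
        "proportion": sum(v < LIMIT for v in values) / len(values),
    }


# Parameters: computed over the entire population.
parameters = summarize(population, is_sample=False)

# Statistics: computed from a sample of limited size n.
n = 50
sample = random.sample(population, n)
estimates = summarize(sample, is_sample=True)

for name in parameters:
    print(f"{name:>10}: parameter={parameters[name]:.3f}  "
          f"statistic={estimates[name]:.3f}")
```

Drawing a different sample changes the statistics but not the parameters, which is why a statistic is treated as an estimate rather than as the true value.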

6.3.1 Descriptive Statistics

One important use of statistics is to summarize a collection of data in a clear and understandable way. Data can be summarized numerically and graphically. In the numerical approach, a set of descriptive statistics is computed using a set of formulas. These statistics convey information about the data's central tendency (mean, median, and mode) and dispersion (range, interquartile range, variance, and standard deviation). In the graphical approach, the central tendency and dispersion of the data are represented with displays such as dot plots, histograms, probability density functions, stem-and-leaf plots, and box plots.
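The numerical measures listed above can be computed with Python's standard `statistics` module, as in the following sketch. The `describe` helper and the seven readings are hypothetical, invented for illustration. The quartiles use the module's default "exclusive" method, which is only one of several conventions, so other tools may report a slightly different interquartile range.

```python
import statistics


def describe(data):
    """Return central-tendency and dispersion measures for a list of numbers."""
    ordered = sorted(data)
    # Three cut points splitting the data into four equal-probability groups.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    return {
        # Central tendency
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.multimode(ordered),  # a list, since ties are possible
        # Dispersion
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "variance": statistics.variance(ordered),  # sample variance (n - 1)
        "stdev": statistics.stdev(ordered),        # sample standard deviation
    }


# Hypothetical readings, invented for illustration only.
readings = [2, 4, 4, 5, 7, 9, 11]
summary = describe(readings)

for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

For these readings the mean is 6.0, the median is 5, the range is 9, and the sample variance is 10, so the standard deviation is the square root of 10.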

For example, a sample of an operating system's CPU usage (in %) over a period of time is shown in Table 6.4. The changing usage reflects the variability of this variable, which typically is caused by elements of randomness in the currently running processes, services, and background code of the operating system.

The graphical representations of usage as an output help to understand the distribution and the behavior of such a variable. For example, a histogram representation can be established by drawing the intervals of data points versus each interval's frequency
