Biomedical Engineering Reference
In-Depth Information
a static property of the scale. For example, length might be measured with a granularity of either
nanometers or millimeters. Accuracy is a measure of how close measurements come to actual values,
and precision is a measure of the repeatability of the measurements.
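The difference between accuracy and precision can be illustrated with a short sketch. The function names, the reference value of 10.0 mm, and the sample readings below are hypothetical, chosen only for illustration. Accuracy is approximated here by the mean offset from the actual value, and precision by the sample standard deviation of repeated readings.

```python
from statistics import mean, stdev

def accuracy_error(measurements, true_value):
    """Mean offset from the actual value; closer to 0 means more accurate."""
    return mean(measurements) - true_value

def precision_spread(measurements):
    """Sample standard deviation; smaller means more repeatable (more precise)."""
    return stdev(measurements)

# Hypothetical repeated length measurements of a 10.0 mm reference object.
true_length_mm = 10.0
tight_but_biased = [10.4, 10.5, 10.4, 10.5]       # repeatable, but offset from the true value
scattered_but_centered = [9.0, 11.0, 9.5, 10.5]   # centered on the true value, but not repeatable

for label, readings in [("tight_but_biased", tight_but_biased),
                        ("scattered_but_centered", scattered_but_centered)]:
    print(label, accuracy_error(readings, true_length_mm), precision_spread(readings))
```

In this sketch the first set of readings is precise but not accurate, and the second is accurate on average but not precise.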
The most common scales used in the normalization process are listed in Table 7-2. Absolute scales
are based on counted quantities, such as the number of amino acids in a protein. Nominal scales are based on
unique identifiers, such as names and descriptions. Categorical scales assign data to numerical or
textual categories. Ordinal scales put things in order according to some organizational theme. For
example, proteins can be ordered according to molecular weight. Rank scales are like ordinal scales
with the addition of a natural ranking, such as "more stable" and "less stable" protein configurations.
Interval scales have a natural ordering and equal-sized steps between values, such as time. Ratio scales
are expressed as a multiple or a fraction of a unit or interval, such as micrometers and milligrams.
Table 7-2. Scales Used in Normalization.

Scale         Example
Absolute      Count (3 amino acids)
Nominal       List of Amino Acid Names (Lysine, Arginine, Tyrosine)
Ordinal       Process Phase (first, second, third)
Categorical   Types of Amino Acids (essential, non-essential)
Rank          Protein folding (primary, secondary, tertiary)
Interval      Time (seconds)
Ratio         Weight (micrograms)

With the exception of absolute scales, data on one scale can be converted to another scale if the two
scales are the same type and measure the same attribute. When data are defined with the same type of
scale, the normalization process depends on the type of data. For example, nominal scales are converted
to other nominal scales by a mapping function. However, mapping can introduce errors when there is a
one-to-many or many-to-one mapping between the two nominal scales. For example, the
name of an amino acid can be mapped to a codon (a triplet of bases), but because several different
codons can code for a given amino acid, the alternative codon sequences are lost in the
translation.
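The loss described above can be sketched with a small lookup table. The dictionary below covers only three amino acids and uses RNA codons. The names `CODONS_FOR`, `to_single_codon`, and `dropped_codons` are hypothetical helpers, and the rule of keeping the first listed codon is an arbitrary assumption made for illustration.

```python
# Partial codon table (RNA alphabet) for three amino acids only.
CODONS_FOR = {
    "Lysine": ["AAA", "AAG"],
    "Arginine": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Tyrosine": ["UAU", "UAC"],
}

def to_single_codon(name):
    """Map an amino acid name to one codon (assumption: keep the first listed)."""
    return CODONS_FOR[name][0]

def dropped_codons(name):
    """Codons that a single-valued mapping silently discards."""
    return CODONS_FOR[name][1:]

# The reverse direction is many-to-one: several codons map to the same name.
AMINO_ACID_FOR = {
    codon: name for name, codons in CODONS_FOR.items() for codon in codons
}

original = "AAG"
round_trip = to_single_codon(AMINO_ACID_FOR[original])
print(original, "->", AMINO_ACID_FOR[original], "->", round_trip)
print("Discarded for Arginine:", dropped_codons("Arginine"))
```

Mapping a codon to a name and back does not necessarily return the original codon, which is the error of omission that a one-to-many or many-to-one mapping introduces.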
Both ordinal and rank order scales are translated by a function that maintains their relative order. As
in the mapping of nominal scales, errors of omission are introduced by the conversion process when
there isn't a one-to-one mapping between the two scales. Interval scales are converted to other
interval scales through linear functions that preserve the ordering but shift and rescale the values, as in
the conversion of degrees Fahrenheit to degrees Celsius. Ratio scales are converted to another ratio
scale by a constant multiplier. For example, a ratio scale of 0 to 2 meters could be multiplied by a
factor of 100 to provide a scale of 0 to 200 centimeters.
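The interval and ratio conversions described above can be written as two short functions. This is a minimal sketch; the function names are hypothetical, and the formulas are the standard Fahrenheit-to-Celsius and meter-to-centimeter conversions.

```python
def fahrenheit_to_celsius(deg_f):
    """Interval-scale conversion: a linear function with an offset and a scale factor."""
    return (deg_f - 32.0) * 5.0 / 9.0

def meters_to_centimeters(length_m):
    """Ratio-scale conversion: a constant multiplier, so zero maps to zero."""
    return length_m * 100.0

print(fahrenheit_to_celsius(32.0))    # 0.0
print(fahrenheit_to_celsius(212.0))   # 100.0
print(meters_to_centimeters(2.0))     # 200.0

# Ordering is preserved by both conversions.
temps_f = [32.0, 98.6, 212.0]
temps_c = [fahrenheit_to_celsius(t) for t in temps_f]
print(temps_c == sorted(temps_c))     # True
```

The offset in the interval conversion means that ratios of values are not preserved (64 degrees Fahrenheit is not "twice as hot" as 32), whereas the ratio conversion preserves them.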
The units used in the process of normalization may be primary, such as seconds of time or
micrograms of mass, or derived, such as density (grams per cubic centimeter) or volume (cubic
millimeters). The standard Système International (SI) primary units include the
meter for length, kilogram for mass, second for time, ampere for electrical current, kelvin for
temperature, and mole for amount of substance.
The final preprocessing and cleaning activity, missing-value analysis, involves detecting,
characterizing, and dealing with missing data values. One way of dealing with missing data values is
to substitute the mean, mode, or median value of the relevant data that are available.
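The substitution approach can be sketched as follows. This sketch assumes that missing values are marked with `None`; the function name `impute` and its `strategy` parameter are hypothetical and not taken from any particular library.

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a substitute from")
    summarize = {"mean": mean, "median": median, "mode": mode}[strategy]
    fill = summarize(observed)
    return [fill if v is None else v for v in values]

# Hypothetical measurements with one missing entry.
data = [2.0, None, 4.0, 4.0, 10.0]
print(impute(data, "mean"))    # [2.0, 5.0, 4.0, 4.0, 10.0]
print(impute(data, "median"))  # [2.0, 4.0, 4.0, 4.0, 10.0]
print(impute(data, "mode"))    # [2.0, 4.0, 4.0, 4.0, 10.0]
```

The three statistics can give different substitutes for the same data, so the choice depends on how the available values are distributed. The median and mode are less affected by extreme values such as the 10.0 above.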