2. After running the image analysis software, manually check the
matching and boundaries of all the spots to ensure correctness.
The definition of a spot (an intensity change over the background)
is flexible, but it must be applied consistently across all the
experimental gels.
3. Normalize spot abundance (SA) values according to the total
spot abundance in each gel.
\[
\mathrm{NSA}_k^{(i)} = \frac{\mathrm{SA}_k^{(i)}}{\sum_{j=1}^{N} \mathrm{SA}_j^{(i)}}
\]
The Normalized Spot Abundance (NSA) of spot k in gel i is obtained
by dividing its spot abundance by the sum of the SA of all N spots
present in gel i. This procedure requires a complete match of the
gels (see Note).
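The normalization in step 3 can be illustrated with a minimal sketch in Python. The function name normalize_gel and the abundance values are hypothetical, chosen only for illustration, and the sketch assumes that every gel lists the same fully matched spots.

```python
def normalize_gel(spot_abundances):
    """Return NSA values for one gel: each SA divided by the gel's total SA."""
    total = sum(spot_abundances)
    if total <= 0:
        raise ValueError("total spot abundance must be positive")
    return [sa / total for sa in spot_abundances]


# Hypothetical raw abundances for the same four matched spots in two gels.
# gel_2 was loaded with twice as much protein as gel_1.
gels = {
    "gel_1": [120.0, 30.0, 45.0, 5.0],
    "gel_2": [240.0, 60.0, 90.0, 10.0],
}

# Normalize each gel independently against its own total abundance.
normalized = {name: normalize_gel(values) for name, values in gels.items()}
```

In this example both gels yield the same NSA profile, because dividing by the total abundance removes the difference in overall loading. The NSA values of each gel sum to 1.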
3.3 Missing Value Estimation
Proteomic datasets, especially those based on 2-DE, tend to contain
missing values. One of the better approaches for estimating them is
an imputation algorithm such as k-nearest neighbors. Briefly, this
method finds the k nearest neighbors of the variable with missing
data and fills each gap with the mean value among those k neighbors.
It is not advisable to use this method when more than 20% of the
values are missing (i.e., the limit is one missing value in a set of
five replicates). It is also recommended to use the whole dataset,
including all the treatments, to perform the missing value
imputation. In R, some packages implement this algorithm, such as
{impute}, which also provides two further imputation algorithms,
Singular Value Decomposition (SVD) and Singular Value Thresholding
(SVT), and tools for determining the optimal k.
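The k-nearest neighbors idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used by any R package: rows are taken to be spots (variables), columns are samples, missing values are encoded as None, and distances are computed only over the samples observed in both rows. The function name knn_impute and the data are hypothetical.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries in a spots-by-samples matrix.

    For each missing entry, the k nearest spots (rows) that have a value
    in that sample are found, and the entry is set to the mean of their
    values in that sample.
    """
    def distance(row_a, row_b):
        # Root mean squared difference over samples observed in both rows.
        shared = [(a, b) for a, b in zip(row_a, row_b)
                  if a is not None and b is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))

    imputed = [list(row) for row in matrix]
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows with an observed value in column c.
            # Distances always use the original matrix, never imputed values.
            candidates = sorted(
                (distance(row, other), other[c])
                for i, other in enumerate(matrix)
                if i != r and other[c] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            if nearest:
                imputed[r][c] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical normalized abundances: four spots (rows) in four samples (columns).
data = [
    [1.0, 1.1, 0.9, 1.0],
    [1.0, None, 1.0, 1.0],
    [1.1, 1.3, 1.0, 1.1],
    [5.0, 5.2, 4.9, 5.1],
]
filled = knn_impute(data, k=2)
```

Here the gap in the second spot is filled with the mean of the first and third spots in that sample (1.1 and 1.3), since their profiles are the closest; the fourth spot is far more abundant and is not used. Passing the whole dataset, with all treatments as columns, follows the recommendation above.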
3.4 Data Filtering
It is advisable to spend some time analyzing the data quality,
finding and filtering (or reanalyzing) the outlier values. This
removes artifacts and technical biases, and it also increases the
statistical power of the subsequent analyses.
In most cases, the removal of outliers or single hits
(proteins/spots that are detected in only one sample) reduces the
noise of the dataset, allowing better understanding and more
complete analyses. A good practice is to define consistency rules
that spots/proteins (variables from here on) must meet for further
consideration. To be considered, a variable has to be present in at
least 75% of the replicates of at least one treatment. This discards
mostly noise artifacts and proteins/peptides near the detection
limit. Methods to determine outlier samples are described below.
In this sense, variables with lower values usually show the higher
coefficient of variation (CV) due to the presence of outliers. It