eliminates defective samples and standardizes the data. This phase is normally
divided into three sub-phases: background correction, standardization, and
summarization. Investigators currently rely on a limited group of algorithms for
these steps. The most common are MAS5.0 [30] (Microarray Affymetrix Suite 5.0),
PLIER [31] (Probe Logarithmic Intensity Error), and RMA [32] (Robust Multi-array
Average).
The RMA [32] algorithm is a method for normalizing and summarizing probe-level
intensity measurements. It works on the PM (Perfect-Match) values in three steps:
first, a background correction removes the noise from the averages of the PM;
second, the data are quantile normalized so that data from different microarrays
can be compared; finally, a summarization generates the values for each probe-set.
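As an illustration of the second RMA step only, the following is a minimal sketch of quantile normalization in plain Python. It is not the RMA implementation itself: background correction and summarization are left out, the function name `quantile_normalize` and the sample intensities are hypothetical, and ties are broken by row order, which is a simplification.

```python
def quantile_normalize(matrix):
    """Quantile-normalize the columns of a probes x arrays matrix.

    matrix[p][a] is the intensity of probe p on array a.
    Ties are broken by row order (a simplification).
    """
    n_probes = len(matrix)
    n_arrays = len(matrix[0])

    # For each array (column), the probe indices ordered by increasing intensity.
    order = [
        sorted(range(n_probes), key=lambda p, a=a: matrix[p][a])
        for a in range(n_arrays)
    ]

    # Reference distribution: mean of the k-th smallest value across all arrays.
    reference = [
        sum(matrix[order[a][k]][a] for a in range(n_arrays)) / n_arrays
        for k in range(n_probes)
    ]

    # Replace every value by the reference value that corresponds to its rank.
    result = [[0.0] * n_arrays for _ in range(n_probes)]
    for a in range(n_arrays):
        for k, p in enumerate(order[a]):
            result[p][a] = reference[k]
    return result


# Hypothetical intensities: 4 probes (rows) measured on 3 arrays (columns).
intensities = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(intensities)
```

After this step every array shares the same set of values, so differences between arrays reflect only the ranking of the probes within each array.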
4.1.2 Irrelevant Probes
Once the control and the erroneous probes have been eliminated, the filtering process
begins. The first step consists of eliminating the probes marked as irrelevant in
previous executions of the CBR cycle. This removes all probes that could pass the
filtering phase but are prone to cause erroneous results during the reuse phase.
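A minimal sketch of this step, assuming the expression data is held as a mapping from probe identifier to its values and that the probes flagged in earlier CBR cycles are available as a collection of identifiers. The function name, the data layout, and the probe identifiers are all hypothetical.

```python
def drop_irrelevant(expression, irrelevant_ids):
    """Return a copy of `expression` without the probes in `irrelevant_ids`."""
    blocked = set(irrelevant_ids)  # set membership keeps the lookup cheap
    return {pid: values for pid, values in expression.items() if pid not in blocked}


# Hypothetical probe identifiers and values for two cases.
expression = {
    "probe_a": [7.1, 6.9],
    "probe_b": [3.2, 8.4],
    "probe_c": [5.0, 5.1],
}
irrelevant = {"probe_b"}  # assumed to have been flagged in a previous CBR cycle
filtered = drop_irrelevant(expression, irrelevant)
```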
4.1.3 Variability
The second stage removes the probes that have low variability, according to the
following steps:
1. Calculate the standard deviation for each of the probes j:

\[ \sigma_{\cdot j} = +\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mu_{\cdot j} - x_{ij}\right)^{2}} \qquad (1) \]

where n is the total number of cases, $\mu_{\cdot j}$ is the population average for
the variable j, and $x_{ij}$ is the value of the probe j for the individual i.
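2. Standardize the above values:

\[ z_i = \frac{\sigma_{\cdot j} - \mu}{\sigma} \qquad (2) \]

where

\[ \mu = \frac{1}{n}\sum_{j=1}^{n}\sigma_{\cdot j} \quad\text{and}\quad \sigma = +\sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(\mu - \sigma_{\cdot j}\right)^{2}} \]

and where $z_i \equiv N(0, 1)$.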
3. Discard probes for which the value of z meets the following condition:
$z < -1.0$. This will remove about 16% of the probes if the variable follows a
normal distribution.
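The three steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data layout (a mapping from probe identifier to its values over the n cases), the function names, and the sample values are hypothetical, and the discard condition is taken to be z < -1.0, which is the threshold that the stated removal of about 16% of the probes implies for a normal distribution.

```python
from statistics import fmean, pstdev


def probe_z_scores(expression):
    """Map each probe id to the standardized value of its standard deviation."""
    # Equation (1): population standard deviation of each probe over the n cases.
    sigma = {pid: pstdev(values) for pid, values in expression.items()}

    # Equation (2): standardize the per-probe deviations across all probes.
    sigmas = list(sigma.values())
    mu = fmean(sigmas)
    spread = pstdev(sigmas)
    if spread == 0:
        # All probes are equally variable, so none stands out.
        return {pid: 0.0 for pid in sigma}
    return {pid: (s - mu) / spread for pid, s in sigma.items()}


def drop_low_variability(expression, threshold=-1.0):
    """Keep only the probes whose z value is not below `threshold`."""
    z = probe_z_scores(expression)
    return {pid: values for pid, values in expression.items() if z[pid] >= threshold}


# Hypothetical probe ids with values for four cases each.
expression = {
    "p1": [1.0, 1.0, 1.0, 1.0],  # constant probe, standard deviation 0
    "p2": [0.0, 2.0, 0.0, 2.0],
    "p3": [0.0, 4.0, 0.0, 4.0],
    "p4": [1.0, 3.0, 1.0, 3.0],
}
kept = drop_low_variability(expression)
print(sorted(kept))  # prints ['p2', 'p3', 'p4']
```

In this toy data the constant probe `p1` has a z value of about -1.41 and is the only one discarded.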
4.1.4 Uniform Distribution
Finally, all remaining variables that follow a uniform distribution are eliminated.
The variables that follow a uniform distribution will not allow the separation of
individuals. Therefore, the variables that do not follow this distribution will be really useful
 