in a more accurate classification, since the results suggested by the ESOINN network
are compared to those obtained using the PAM technique. The revise phase applies the
RIPPER [43] algorithm to extract knowledge about the classification process. The
revise stage also includes an MDS (Multidimensional Scaling) technique [18] [19] [20]
that presents the information in a low-dimensional space. A human expert then
analyzes this information and evaluates both the proposed classification and the
validity of the generated rules. Finally, in the retain stage, if the human expert
considers the proposed solution valid, the system stores the case information and the
rules that have been obtained.
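The low-dimensional presentation step of the revise stage can be illustrated with classical (Torgerson) MDS, which embeds cases so that their pairwise Euclidean distances are preserved as closely as possible. This is a minimal sketch under assumptions: the text does not say which MDS variant, distance measure, or library the system uses, so Euclidean distances, NumPy, and the function name `classical_mds` are all hypothetical choices made here for illustration.

```python
import numpy as np


def classical_mds(X, n_components=2):
    """Hypothetical helper: project the rows of X (cases) into n_components
    dimensions with classical (Torgerson) MDS on Euclidean distances."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distance matrix between all pairs of cases.
    sq_norms = np.sum(X ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    D2 = np.maximum(D2, 0.0)  # clip tiny negatives caused by rounding
    n = D2.shape[0]
    # Double centering: B = -1/2 * J D^2 J, with J = I - (1/n) * 11^T.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Eigendecomposition of the symmetric matrix B; keep the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    # Coordinates = eigenvectors scaled by the square roots of their eigenvalues.
    return eigvecs[:, order] * scale


rng = np.random.default_rng(0)
expression = rng.normal(size=(10, 50))  # hypothetical: 10 cases x 50 genes
coords = classical_mds(expression)
print(coords.shape)  # (10, 2)
```

The resulting two-dimensional coordinates are what a human expert could inspect in a scatter plot; when the data already lie in a plane, classical MDS reproduces their pairwise distances exactly.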
The chapter is structured as follows: the next section briefly introduces the problem
that motivates this research. Section 3 presents the proposed approach and describes
the novel strategies incorporated into the stages of the CBR cycle. Section 4 details
the computational intelligence techniques proposed in this work. Section 5 describes a
case study developed to evaluate the CBR system, which consists of classifying
patients with chronic lymphocytic leukemia (CLL). Finally, Section 6 presents the
results and conclusions obtained after testing the model.
2 Related Work
Microarray technology has become an essential tool in genomic research, making it
possible to investigate global gene expression in all aspects of human disease [21].
It is based on a database of gene fragments called ESTs (Expressed Sequence Tags),
which are used to measure target abundance from the scanned fluorescence intensities
of tagged molecules hybridized to the ESTs [22]. Specifically, the HG U133 plus 2.0 [5]
chips are used for expression analysis. These chips analyze the expression level of
over 47,000 transcripts and variants, including 38,500 well-characterized human genes.
Each chip comprises more than 54,000 probe sets and 1,300,000 distinct oligonucleotide
features. The HG U133 plus 2.0 provides multiple, independent measurements for each
transcript, and the use of multiple probes yields a complete data set with accurate,
reliable, and reproducible results from every experiment.
Microarray technology is a critical element of genomic analysis and allows an
in-depth study of the molecular characterization of RNA expression, genomic changes,
epigenetic modifications, and protein-DNA binding.
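Because the chip supplies several probes per transcript, their intensities have to be combined into a single expression value per probe set before any downstream analysis. The sketch below assumes a deliberately simple scheme that is not taken from this chapter: each probe's scanned intensity is log2-transformed and the median is taken across the probes of a probe set. Established pipelines such as RMA add background correction, normalization, and median polish; the function name, probe-set IDs, and intensities here are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def summarize_probe_sets(probe_intensities):
    """Hypothetical helper: collapse probe-level intensities into one log2
    value per probe set.

    probe_intensities: iterable of (probe_set_id, raw_intensity) pairs,
    where raw_intensity is a positive scanned fluorescence value.
    """
    groups = defaultdict(list)
    for probe_set_id, intensity in probe_intensities:
        # The log2 scale compresses the wide dynamic range of fluorescence signals.
        groups[probe_set_id].append(math.log2(intensity))
    # The median is robust to a single outlying probe within a set.
    return {ps: median(values) for ps, values in groups.items()}


# Hypothetical probe sets with three probes each.
probes = [("ps_A", 256.0), ("ps_A", 512.0), ("ps_A", 1024.0),
          ("ps_B", 64.0), ("ps_B", 128.0), ("ps_B", 8192.0)]
print(summarize_probe_sets(probes))  # {'ps_A': 9.0, 'ps_B': 7.0}
```

In `ps_B` the third probe is far brighter than the other two, yet the median keeps the summary at 7.0, which is why redundant probes per transcript make the measurement more reliable.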
Expression arrays [5] are a type of microarray that has been used in different
approaches to identify the genes that characterize certain diseases [23] [24] [25]. In
all cases, the data analysis process consists essentially of three stages:
normalization and filtering; clustering; and classification. The first stage is
critical both for normalizing the data properly and for an initial filtering that
reduces the dimensionality of the data set to be analyzed [26]. Because the problem
involves high-dimensional arrays, a good pre-processing technique is needed to support
automatic decisions about which variables are vital for the classification process;
these decisions make it possible to reduce the original dataset. Next, a clustering
technique groups the data according to the variables that dominate the behaviour of
each group. Once the data are organized into groups, it is possible to extract
knowledge and assign each patient to the group with which it shares the most
similarities.
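The three-stage workflow (normalization and filtering, clustering, classification) can be sketched end to end on a small patients-by-genes matrix. This is a minimal illustration under stated assumptions, not the method of this chapter: filtering keeps the highest-variance genes, normalization is a per-gene z-score, and plain k-means with a deterministic farthest-first start stands in for the clustering techniques used in this work (the ESOINN network and PAM). All function names and the toy data are hypothetical.

```python
import numpy as np


def preprocess(X, n_genes):
    """Filter, then normalize, a patients-by-genes expression matrix X."""
    X = np.asarray(X, dtype=float)
    # Filtering: keep the n_genes genes with the highest variance across patients.
    keep = np.argsort(X.var(axis=0))[::-1][:n_genes]
    Xf = X[:, keep]
    # Normalization: z-score each retained gene (guarding against zero variance).
    mu, sd = Xf.mean(axis=0), Xf.std(axis=0)
    sd[sd == 0] = 1.0
    return (Xf - mu) / sd, keep, mu, sd


def kmeans(Z, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    idx = [0]
    for _ in range(1, k):
        # Next seed: the patient farthest from all seeds chosen so far.
        d = ((Z[:, None, :] - Z[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = Z[idx].copy()
    for _ in range(n_iter):
        # Assign each patient to its nearest centroid (squared Euclidean distance).
        d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids, keeping the old one if a cluster becomes empty.
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def classify(x_new, keep, mu, sd, centroids):
    """Assign a new patient to the group it most closely resembles."""
    z = (np.asarray(x_new, dtype=float)[keep] - mu) / sd
    return int(((centroids - z) ** 2).sum(axis=1).argmin())


# Hypothetical data: 6 patients x 5 genes; gene 4 is constant and uninformative.
X = np.array([[9.0, 8.0, 2.0, 1.0, 5.0],
              [8.5, 8.2, 2.1, 1.2, 5.0],
              [9.2, 7.9, 1.9, 0.9, 5.0],
              [2.0, 1.5, 8.8, 9.1, 5.0],
              [2.2, 1.7, 9.0, 8.7, 5.0],
              [1.8, 1.4, 9.1, 9.0, 5.0]])
Z, keep, mu, sd = preprocess(X, n_genes=4)
labels, centroids = kmeans(Z, k=2)
print(labels)
print(classify([8.8, 8.1, 2.0, 1.1, 5.0], keep, mu, sd, centroids))
```

Filtering is done before the z-score on purpose: after standardization every gene has unit variance, so a variance filter applied afterwards could no longer distinguish informative genes from flat ones.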