3 Experiment and Results
3.1 Datasets
Four widely used, publicly available gene expression datasets were used in our
experimental evaluation of the proposed method. They were obtained from
the Kent Ridge Biomedical Data Set Repository, which is described in [20].
Leukemia1 Dataset (amlall)
The original data come from the research on acute leukemia by Golub
et al. [21]. The dataset consists of 38 bone marrow samples, of which 27
belong to acute lymphoblastic leukemia (ALL) and 11 to acute myeloid leukemia
(AML). Each sample contains probes for 6,817 human genes. Golub et al. used
this dataset for training and a further 34 samples, consisting of
20 ALL and 14 AML samples, for testing. Because we used leave-one-out
cross-validation, we were able to test on all 72 samples together.
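The leave-one-out protocol mentioned above can be sketched as follows. This is a minimal illustration, not the evaluation code used in this work: the helper names (`leave_one_out`, `nearest_neighbour`, `loocv_accuracy`) are hypothetical, and the 1-nearest-neighbour rule is only a placeholder standing in for whatever classifier is being evaluated.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_out(n_samples: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (training indices, held-out index) once for every sample."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out


def nearest_neighbour(train_x: Sequence[Sequence[float]],
                      train_y: Sequence[str],
                      query: Sequence[float]) -> str:
    """Placeholder classifier (assumption): 1-NN with squared Euclidean distance."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[best]


def loocv_accuracy(x: Sequence[Sequence[float]],
                   y: Sequence[str],
                   classify=nearest_neighbour) -> float:
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for train, held_out in leave_one_out(len(x)):
        predicted = classify([x[i] for i in train],
                             [y[i] for i in train],
                             x[held_out])
        correct += predicted == y[held_out]
    return correct / len(x)


# Merging the original training and test partitions of Leukemia1:
n_train, n_test = 38, 34
n_total = n_train + n_test  # every one of these samples is held out exactly once
```

With this scheme the original train/test split is no longer needed: each sample serves as the test case once while the remaining samples form the training set.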
Breast Cancer Dataset (Breast)
This dataset was published in [22] and contains an extremely large number of
gene expression measurements. It includes data on 24,481 genes for 78 patients:
34 samples are from patients who developed distant metastases within
5 years, and the remaining 44 samples are from patients who remained free of
the disease for at least 5 years after their initial diagnosis.
Lung Cancer Dataset (Lung)
The lung cancer dataset includes the largest number of samples in our
experiment. It includes 12,533 gene expression measurements for each of 181
tissue samples. The initial research was done by Gordon et al. [23], who
sought to distinguish malignant pleural mesothelioma (MPM) from adenocarcinoma
(ADCA) of the lung.
Leukemia2 Dataset (mll)
This leukemia dataset is used to discriminate between three types of leukemia
(ALL, MLL, AML). The dataset contains 72 patient samples, each with
12,582 gene expression measurements. The data were collected by Armstrong
et al., and the results were published in [24].
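For quick reference, the dimensions quoted in this section can be collected in one structure. The sketch below only restates the numbers given above. The dictionary layout and helper name are illustrative assumptions, and the Leukemia1 entry uses the merged 72 samples.

```python
# (samples, genes, classes) as stated in the dataset descriptions above
DATASETS = {
    "amlall": (72, 6817, 2),   # ALL vs. AML, training and test sets merged
    "Breast": (78, 24481, 2),  # metastases within 5 years vs. disease-free
    "Lung": (181, 12533, 2),   # MPM vs. ADCA
    "mll": (72, 12582, 3),     # ALL, MLL, AML
}


def genes_per_sample(name: str) -> float:
    """Ratio of measured genes to samples for the named dataset."""
    samples, genes, _ = DATASETS[name]
    return genes / samples


most_samples = max(DATASETS, key=lambda k: DATASETS[k][0])
most_genes = max(DATASETS, key=lambda k: DATASETS[k][1])

for name in DATASETS:
    print(f"{name}: {genes_per_sample(name):.1f} genes per sample")
```

In every dataset the number of genes exceeds the number of samples by well over an order of magnitude.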